Retryable vs Non-Retryable Errors: API Design & Client Generation Workflows
Defining Error Classifications in API Contracts
Establishing deterministic boundaries between transient infrastructure failures and permanent client-side violations is foundational to platform reliability. As part of a broader Error Contracts & Resilience Mapping strategy, API designers must explicitly classify error states to prevent cascading failures, optimize compute budgets, and standardize platform-wide handling policies.
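OpenAPI 3.1 permits specification extensions (fields prefixed with x-) on Response Objects, which makes them a natural place to record retry semantics. Apply a custom x-retryable flag directly to each response object so downstream code generators can read it.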
OpenAPI 3.1 Specification Snippet
```yaml
paths:
  /v1/transactions:
    post:
      responses:
        '503':
          description: Service temporarily unavailable
          headers:
            Retry-After:
              schema:
                type: integer
          content:
            application/problem+json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'
          x-retryable: true
        '400':
          description: Invalid request payload
          x-retryable: false
```
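As an illustration of the code-generation side, the sketch below walks an already-parsed spec (a plain dict, as a YAML loader would return) and records the x-retryable flag for each operation and status code, roughly what a generator template would consume. The function name collect_retry_flags is hypothetical, and treating an undeclared flag as non-retryable is an assumed default rather than OpenAPI behavior.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def collect_retry_flags(spec: dict[str, Any]) -> dict[tuple[str, str, str], bool]:
    """Map (METHOD, path, status) -> x-retryable, defaulting to False when undeclared."""
    flags: dict[tuple[str, str, str], bool] = {}
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            for status, response in operation.get("responses", {}).items():
                flags[(method.upper(), path, str(status))] = bool(response.get("x-retryable", False))
    return flags


# The snippet above, as a parsed dict.
spec = {
    "paths": {
        "/v1/transactions": {
            "post": {
                "responses": {
                    "503": {"description": "Service temporarily unavailable", "x-retryable": True},
                    "400": {"description": "Invalid request payload", "x-retryable": False},
                }
            }
        }
    }
}
print(collect_retry_flags(spec))
```

A generator could emit this mapping as a constant in the SDK so interceptors never need the spec at runtime.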
CI Linting Enforcement
Enforce consistent tagging across all endpoint definitions using Spectral. Add the following rule to your .spectral.yaml:
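```yaml
rules:
  enforce-retryable-classification:
    description: All 5xx responses must explicitly declare x-retryable status
    severity: error
    # JSONPath has no '5*' key wildcard; filter response keys starting with "5" (covers "503" and "5XX").
    given: "$.paths[*][*].responses[?(@property.match(/^5/))]"
    then:
      field: "x-retryable"
      # 'defined' accepts an explicit false (e.g. on a 501); 'truthy' would reject it.
      function: defined
```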
Run validation in CI:
```bash
npm install -g @stoplight/spectral-cli
spectral lint openapi.yaml --ruleset .spectral.yaml
```
HTTP Status Code Mapping & Semantic Boundaries
Deterministic retry policies rely on strict adherence to the HTTP Status Code Mapping framework. Clients must distinguish between safe-to-retry states and terminal responses to avoid credential lockouts, wasted cycles, and duplicate resource creation.
| Status Code | Classification | Retry Policy | Client Action |
|---|---|---|---|
| 408, 429 | Transient | Conditional | Backoff + Retry-After parsing |
| 502, 503, 504 | Transient | Automatic | Jittered exponential backoff |
| 400, 401, 403, 404, 501 | Terminal | None | Fail fast, surface to user |
| 409 | State Conflict | Manual | Reconcile state before retry |
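The table translates directly into a lookup that client code can share. A minimal Python sketch follows; the names RetryPolicy and retry_policy are hypothetical, and status codes the table does not list (such as 500) default to the terminal policy here, which is an assumption you may want to revisit.

```python
from enum import Enum


class RetryPolicy(Enum):
    CONDITIONAL = "backoff + Retry-After parsing"
    AUTOMATIC = "jittered exponential backoff"
    MANUAL = "reconcile state before retry"
    NONE = "fail fast, surface to user"


# One entry per non-terminal row of the table; unlisted codes fall through to NONE by assumption.
POLICY_BY_STATUS = {
    408: RetryPolicy.CONDITIONAL,
    429: RetryPolicy.CONDITIONAL,
    502: RetryPolicy.AUTOMATIC,
    503: RetryPolicy.AUTOMATIC,
    504: RetryPolicy.AUTOMATIC,
    409: RetryPolicy.MANUAL,
}


def retry_policy(status: int) -> RetryPolicy:
    """Look up the client action for a status code per the mapping table."""
    return POLICY_BY_STATUS.get(status, RetryPolicy.NONE)


print(retry_policy(503).value)
```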
Automated Matrix Validation
Prevent ambiguous responses, where one status code carries both a success schema and an error schema, by enforcing response schema boundaries in CI. Use Redocly CLI with a custom configuration file that holds the matrix rules:
```bash
# Validate that no endpoint mixes success and error schemas under the same status code
npx @redocly/cli lint openapi.yaml --extends recommended --config ./matrix-rules.yaml
```
Ensure your OpenAPI responses block explicitly defines content types per status code so these rules can fail the build when schemas overlap.
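The contents of matrix-rules.yaml are not shown here. As a tool-agnostic illustration of the boundary being enforced, here is a Python sketch that assumes every error body uses application/problem+json: it flags success responses that declare the problem media type and error responses that declare any other media type. The function name find_mixed_responses is hypothetical.

```python
from typing import Any

PROBLEM_JSON = "application/problem+json"


def find_mixed_responses(spec: dict[str, Any]) -> list[str]:
    """Flag 2xx responses declaring problem+json and 4xx/5xx responses declaring other media types."""
    issues: list[str] = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if not isinstance(operation, dict):
                continue  # path-level "parameters" lists, "summary" strings, etc.
            for status, response in operation.get("responses", {}).items():
                code = str(status)
                media_types = set(response.get("content", {}))
                if code.startswith("2") and PROBLEM_JSON in media_types:
                    issues.append(f"{method.upper()} {path} {code}: success response declares {PROBLEM_JSON}")
                elif code[:1] in ("4", "5") and media_types - {PROBLEM_JSON}:
                    extra = sorted(media_types - {PROBLEM_JSON})
                    issues.append(f"{method.upper()} {path} {code}: error response declares {extra}")
    return issues


spec = {
    "paths": {
        "/v1/transactions": {
            "post": {
                "responses": {
                    "201": {"content": {"application/json": {}}},
                    "200": {"content": {"application/json": {}, PROBLEM_JSON: {}}},
                    "503": {"content": {PROBLEM_JSON: {}}},
                    "400": {"content": {"application/json": {}}},
                }
            }
        }
    }
}
for issue in find_mixed_responses(spec):
    print(issue)
```

Exiting non-zero when the list is non-empty is enough to turn this into a CI gate.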
Structuring Error Payloads for Machine Readability
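Machine-readable error payloads eliminate fragile client-side string parsing. RFC 7807 Problem+JSON (since superseded by RFC 9457) defines the standard members type, title, status, detail, and instance and allows extension members; adding extensions such as retryable, retry_after_ms, and trace_id gives clients a stable contract for automated retry decisions and observability correlation.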
JSON Schema for Extended Problem Details
```yaml
components:
  schemas:
    ProblemDetails:
      type: object
      required: [type, title, status]
      properties:
        type: { type: string, format: uri-reference }
        title: { type: string }
        status: { type: integer }
        detail: { type: string }
        instance: { type: string, format: uri-reference }
        trace_id: { type: string, format: uuid }
        retryable: { type: boolean }
        backoff_strategy:
          type: string
          enum: [exponential, linear, fixed, none]
        retry_after_ms: { type: integer, minimum: 0 }
```
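On the client side, a payload matching this schema can be decoded into a typed object instead of being string-parsed. A minimal Python sketch using a dataclass: the class mirrors the schema fields, and defaulting an absent retryable to False (and backoff_strategy to "none") is an assumption, since the schema does not mark those fields as required.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ProblemDetails:
    # Required members per the schema.
    type: str
    title: str
    status: int
    # Optional standard and extension members.
    detail: Optional[str] = None
    instance: Optional[str] = None
    trace_id: Optional[str] = None
    retryable: bool = False  # assumed default when the server omits it
    backoff_strategy: str = "none"
    retry_after_ms: Optional[int] = None

    @classmethod
    def from_json(cls, body: str) -> "ProblemDetails":
        data = json.loads(body)
        names = {f.name for f in fields(cls)}
        # Unknown members are ignored; a missing required member raises TypeError.
        return cls(**{k: v for k, v in data.items() if k in names})


problem = ProblemDetails.from_json(
    '{"type": "https://example.com/problems/unavailable", "title": "Service Unavailable",'
    ' "status": 503, "retryable": true, "backoff_strategy": "exponential", "retry_after_ms": 2000}'
)
if problem.retryable:
    print(f"retry in {problem.retry_after_ms} ms using {problem.backoff_strategy} backoff")
```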
Contract Testing in CI
Validate that actual API responses match the generated SDK types before merging:
```bash
# Run contract tests against the staging environment.
# The openapi-test CLI and its flags are illustrative; adapt them to your contract-testing tool.
npx openapi-test validate \
  --spec openapi.yaml \
  --endpoint https://api.staging.internal/v1/transactions \
  --method POST \
  --expect-status 503 \
  --assert-json-path "$.retryable == true" \
  --assert-json-path "$.backoff_strategy == 'exponential'"
```
Automating Client Generation & Retry Workflows
Validated OpenAPI specs should drive type-safe SDK generation with built-in retry interceptors. By baking the policies from Configuring exponential backoff for 5xx errors into generated clients and running generation in CI/CD pipelines, teams can ship production-grade resilience without manual boilerplate.
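As a sketch of the delay schedule such an interceptor might compute, the function below uses "full jitter" (a uniformly random wait between zero and a capped exponential ceiling) and treats a server-supplied hint, such as retry_after_ms from the Problem Details payload, as a minimum. The function name backoff_delay_ms and the 500 ms base and 30 s cap are illustrative assumptions, not values taken from any generator.

```python
import random
from typing import Optional


def backoff_delay_ms(
    attempt: int,
    base_ms: float = 500,
    cap_ms: float = 30_000,
    retry_after_ms: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter exponential backoff for retry number `attempt` (0-based)."""
    # Exponential growth, capped; the exponent is clamped so huge attempt counts cannot overflow.
    ceiling = min(cap_ms, base_ms * 2 ** min(attempt, 30))
    delay = (rng or random).uniform(0, ceiling)  # full jitter spreads competing clients apart
    if retry_after_ms is not None:
        delay = max(delay, retry_after_ms)  # a server hint is treated as a minimum wait
    return delay


rng = random.Random(42)  # seeded only to make the example reproducible
print([round(backoff_delay_ms(n, rng=rng)) for n in range(5)])
```

Full jitter trades a slightly longer average wait for much less synchronized retry traffic when many clients fail at the same moment.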
OpenAPI Generator Configuration
Create openapi-generator-config.yaml to inject custom retry middleware during codegen:
```yaml
generatorName: typescript-axios
outputDir: ./clients/ts-sdk
additionalProperties:
  supportsES6: true
  withSeparateModelsAndApi: true
  apiPackage: api
  modelPackage: models
  npmName: "@platform/transaction-client"
templateDir: ./custom-templates
```
Language-Specific Retry Implementations
TypeScript (Axios Interceptor)
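```typescript
import axios, { AxiosError } from 'axios';
import retry from 'async-retry';

const RETRYABLE_STATUS = [429, 502, 503, 504];
// Interceptor-free client, so retried requests do not re-enter this handler recursively.
const rawClient = axios.create();
// Register with: client.interceptors.response.use(undefined, retryInterceptor)
export const retryInterceptor = async (error: AxiosError) => {
  const config = error.config;
  const status = error.response?.status ?? 0;
  // X-Retryable is assumed to be a server-emitted header mirroring the spec's x-retryable flag.
  const isRetryable = error.response?.headers['x-retryable'] === 'true' || RETRYABLE_STATUS.includes(status);
  if (!isRetryable || !config) throw error;
  // Retry-After is in seconds (HTTP-date values are not handled here); default to 1 s.
  const retryAfterSec = Number(error.response?.headers['retry-after']);
  const minTimeout = retryAfterSec > 0 ? retryAfterSec * 1000 : 1000;
  return retry(async (bail) => {
    try {
      return await rawClient.request(config);
    } catch (e) {
      const s = (e as AxiosError).response?.status ?? 0;
      if (!RETRYABLE_STATUS.includes(s)) { bail(e as Error); return; } // stop on terminal errors
      throw e;
    }
  }, { retries: 3, minTimeout, factor: 2, randomize: true });
};
```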
Python (httpx Custom Transport)
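```python
import random

import anyio
import httpx

RETRYABLE_STATUS = {429, 502, 503, 504}


class RetryableTransport(httpx.AsyncBaseTransport):
    def __init__(self, base_transport: httpx.AsyncBaseTransport, max_retries: int = 3, base_delay: float = 1.0):
        self.base = base_transport
        self.max_retries = max_retries
        self.base_delay = base_delay  # seconds

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        attempt = 0
        while True:
            response = await self.base.handle_async_request(request)
            # X-Retryable is assumed to be a server-emitted hint mirroring the spec's x-retryable flag.
            retryable = response.status_code in RETRYABLE_STATUS or response.headers.get("X-Retryable") == "true"
            if not retryable or attempt >= self.max_retries:
                return response  # success, terminal error, or retries exhausted
            # Retry-After is delay-seconds and acts as a floor; HTTP-date values fall back to backoff.
            retry_after = response.headers.get("Retry-After", "")
            floor = float(retry_after) if retry_after.isdigit() else 0.0
            delay = max(floor, self.base_delay * 2 ** attempt) + random.uniform(0, self.base_delay)
            await response.aclose()  # release the connection before retrying
            await anyio.sleep(delay)
            attempt += 1
```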
Go (http.RoundTripper Middleware)
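```go
import (
	"net/http"
	"strconv"
	"time"
)
// RetryRoundTripper retries 429/502/503/504, X-Retryable: true, and network errors with exponential backoff.
type RetryRoundTripper struct {
	Transport  http.RoundTripper
	MaxRetries int
}
func (rt *RetryRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	for i := 0; ; i++ {
		resp, err := rt.Transport.RoundTrip(req)
		retry := err != nil // resp is nil when err != nil, so only inspect it on success
		if err == nil {
			hinted, _ := strconv.ParseBool(resp.Header.Get("X-Retryable"))
			retry = hinted || resp.StatusCode == 429 || (resp.StatusCode >= 502 && resp.StatusCode <= 504)
		}
		// Stop on success, terminal codes (incl. 501), exhausted retries, or a body that cannot be replayed.
		if !retry || i >= rt.MaxRetries || (req.Body != nil && req.GetBody == nil) {
			return resp, err
		}
		if resp != nil {
			resp.Body.Close() // release the connection before retrying
		}
		if req.GetBody != nil {
			body, gerr := req.GetBody()
			if gerr != nil {
				return nil, gerr
			}
			req = req.Clone(req.Context()) // RoundTrippers must not mutate the caller's request
			req.Body = body
		}
		time.Sleep(time.Second << i) // 1s, 2s, 4s, ...; does not observe req.Context() cancellation
	}
}
```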
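All three implementations above read Retry-After. RFC 9110 allows that header to carry either delay-seconds or an HTTP-date, so a client that only parses integers will silently ignore the date form. A minimal Python helper covering both forms follows; the name parse_retry_after is hypothetical, and clamping past dates to zero is a design choice of this sketch.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds encoded by a Retry-After header, or None if absent or invalid."""
    if not value:
        return None
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    except (TypeError, ValueError):
        return None  # older Pythons raise TypeError, newer ones ValueError, for unparseable dates
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a date in the past means "retry now"
```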
CI/CD Pipeline Workflow (GitHub Actions)
```yaml
name: Generate & Publish SDKs
on:
  push:
    paths: ['openapi.yaml']
jobs:
  build-sdks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          registry-url: 'https://registry.npmjs.org'
      - name: Validate OpenAPI Spec
        run: npx @redocly/cli lint openapi.yaml
      - name: Generate TypeScript Client
        run: npx @openapitools/openapi-generator-cli generate -i openapi.yaml -g typescript-axios -o ./clients/ts-sdk -c openapi-generator-config.yaml
      - name: Run Contract Tests
        run: npm run test:contracts
      - name: Publish to Registry
        working-directory: ./clients/ts-sdk
        run: npm install && npm publish --access public
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```
Common Implementation Pitfalls
- Blindly retrying 4xx errors: Automatically retrying 401 Unauthorized or 403 Forbidden triggers credential lockouts and wastes compute cycles.
- Missing idempotency keys: Retryable POST endpoints (and other non-idempotent mutations) without Idempotency-Key headers cause duplicate resource creation during network partitions.
- Ignoring Retry-After headers: Fixed-delay retry loops violate rate limits and accelerate service degradation.
- Untyped error discrimination: Generating SDKs without structured error enums forces developers to parse raw strings, breaking compile-time safety.
- Unbounded retry loops: Omitting retry caps and circuit breakers during partial outages triggers thundering herd effects, overwhelming recovering services.
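The last pitfall is usually addressed with a circuit breaker. A minimal Python sketch, assuming a simple consecutive-failure policy: after failure_threshold failures the breaker opens and rejects calls until reset_timeout_s elapses, then lets traffic through again and closes on the next success. The class and parameter names are hypothetical; the injectable clock keeps it testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal consecutive-failure breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed
        # Open: reject until the cool-down elapses. After that this sketch admits every caller
        # (half-open); production breakers typically admit a single probe request.
        return self.clock() - self.opened_at >= self.reset_timeout_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down
```

Callers check allow_request() before each attempt, including retries, and report the outcome with record_success() or record_failure(), so retries stop entirely while the downstream service is recovering.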
Frequently Asked Questions
How do I enforce retryable vs non-retryable classifications in CI/CD?
Use OpenAPI linting rules (e.g., Spectral) to validate that every 5xx response declares an explicit x-retryable value and that Problem+JSON payloads contain machine-readable retry metadata. Fail builds on missing idempotency headers for retryable mutations.
Should 408 Request Timeout and 409 Conflict be retried automatically?
Not with the same policy. 408 is transient and can be retried automatically with backoff. 409 signals a state conflict that requires resolution or reconciliation before any retry, so it should not be retried automatically; SDKs should expose these as distinct error types rather than applying a blanket retry.
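One way to expose that distinction, sketched in Python with hypothetical class names: map status codes to separate exception types so callers can catch RetryableError for automatic backoff and ConflictError for reconciliation, instead of inspecting raw status codes. The status sets follow the mapping table earlier in this section.

```python
class ApiError(Exception):
    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}" if message else f"HTTP {status}")
        self.status = status


class RetryableError(ApiError):
    """Transient failure: safe to retry with backoff (408, 429, 502-504)."""


class ConflictError(ApiError):
    """409: the caller must reconcile state before deciding whether to retry."""


class TerminalError(ApiError):
    """Everything else: fail fast and surface to the user."""


def error_for_status(status: int, message: str = "") -> ApiError:
    if status == 409:
        return ConflictError(status, message)
    if status in (408, 429, 502, 503, 504):
        return RetryableError(status, message)
    return TerminalError(status, message)
```

Because all three share the ApiError base, callers that do not care about the distinction can still catch a single type.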
How do I generate type-safe clients that automatically handle retry policies?
Extend OpenAPI Generator templates to parse x-retryable and Retry-After headers. Map them to typed retry interceptors (e.g., TypeScript RetryConfig, Go backoff.Backoff) during codegen, ensuring compile-time safety for error handling.
What is the safest way to test retry logic before production deployment?
Implement contract testing with mock servers that simulate transient failures (503, 429) and inject Retry-After delays. Validate SDK behavior in CI using deterministic backoff assertions and idempotency verification.
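Deterministic backoff assertions are easiest when the sleep function is injected and jitter is disabled (or seeded). A minimal Python sketch: call_with_retries and its parameters are hypothetical, send stands in for a mock server returning status codes, and the recorded delays can then be asserted exactly.

```python
from typing import Callable

TRANSIENT = {429, 502, 503, 504}


def call_with_retries(
    send: Callable[[], int],
    sleep: Callable[[float], None],
    max_retries: int = 3,
    base_s: float = 1.0,
) -> int:
    """Retry transient statuses with deterministic (unjittered) exponential backoff."""
    attempt = 0
    while True:
        status = send()
        if status not in TRANSIENT or attempt >= max_retries:
            return status
        sleep(base_s * 2 ** attempt)
        attempt += 1


# Simulated server: two 503s, then success. Delays are recorded instead of slept.
responses = iter([503, 503, 200])
recorded: list[float] = []
final = call_with_retries(lambda: next(responses), recorded.append)
print(final, recorded)  # 200 [1.0, 2.0]
```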