Retryable vs Non-Retryable Errors: API Design & Client Generation Workflows

Defining Error Classifications in API Contracts

Establishing deterministic boundaries between transient infrastructure failures and permanent client-side violations is foundational to platform reliability. As part of a broader Error Contracts & Resilience Mapping strategy, API designers must explicitly classify error states to prevent cascading failures, optimize compute budgets, and standardize platform-wide handling policies.

OpenAPI 3.1 supports specification extensions (fields prefixed with x-), which suit this use case. Define a custom x-retryable extension on response objects to drive downstream code generation.

OpenAPI 3.1 Specification Snippet

paths:
  /v1/transactions:
    post:
      responses:
        '503':
          description: Service temporarily unavailable
          headers:
            Retry-After:
              schema:
                type: integer
          content:
            application/problem+json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'
          x-retryable: true
        '400':
          description: Invalid request payload
          x-retryable: false
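
To illustrate how a generator or client might consume this extension, here is a minimal Python sketch that walks an already-parsed OpenAPI document (a plain dict, such as the result of loading the YAML above) and collects the x-retryable flag per operation and status code. The function name retryable_map and the shape of its return value are assumptions made for this example, not part of any OpenAPI tooling.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def retryable_map(spec: dict) -> dict:
    """Map (METHOD, path, status) -> bool for every response that declares x-retryable.

    `spec` is assumed to be an OpenAPI document already parsed into a dict.
    """
    result = {}
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            for status, response in operation.get("responses", {}).items():
                if "x-retryable" in response:
                    result[(method.upper(), path, str(status))] = bool(response["x-retryable"])
    return result


# Hypothetical document mirroring the snippet above.
spec = {
    "paths": {
        "/v1/transactions": {
            "post": {
                "responses": {
                    "503": {"description": "Service temporarily unavailable", "x-retryable": True},
                    "400": {"description": "Invalid request payload", "x-retryable": False},
                }
            }
        }
    }
}
flags = retryable_map(spec)
```

Responses that omit the extension are simply absent from the map, so a caller can tell "declared non-retryable" apart from "not classified".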

CI Linting Enforcement

Enforce consistent tagging across all endpoint definitions using Spectral. Add the following rule to your .spectral.yaml:

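rules:
  enforce-retryable-classification:
    description: All 5xx responses must explicitly declare x-retryable status
    severity: error
    # Match every response whose status key starts with "5"
    given: "$.paths[*][*].responses[?(@property.match(/^5/))]"
    then:
      field: "x-retryable"
      # "defined" accepts both true and false; "truthy" would reject x-retryable: false
      function: defined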

Run validation in CI:

npm install -g @stoplight/spectral-cli
spectral lint openapi.yaml --ruleset .spectral.yaml

HTTP Status Code Mapping & Semantic Boundaries

Deterministic retry policies rely on strict adherence to the HTTP Status Code Mapping framework. Clients must distinguish between safe-to-retry states and terminal responses to avoid credential lockouts, wasted cycles, and duplicate resource creation.

Status Code	Classification	Retry Policy	Client Action
`408`, `429`	Transient	Conditional	Backoff + `Retry-After` parsing
`502`, `503`, `504`	Transient	Automatic	Jittered exponential backoff
`400`, `401`, `403`, `404`, `501`	Terminal	None	Fail fast, surface to user
`409`	State Conflict	Manual	Reconcile state before retry
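
The table above can be encoded as a small lookup so that client code does not scatter status-code checks. This is a minimal sketch: the names RetryPolicy and policy_for are hypothetical, and treating every unlisted status code as non-retryable is an assumption, not something the table states.

```python
from enum import Enum


class RetryPolicy(Enum):
    CONDITIONAL = "conditional"  # 408, 429: back off and honor Retry-After
    AUTOMATIC = "automatic"      # 502, 503, 504: jittered exponential backoff
    MANUAL = "manual"            # 409: reconcile state before retrying
    NONE = "none"                # terminal: fail fast, surface to user


_POLICY_BY_STATUS = {
    408: RetryPolicy.CONDITIONAL,
    429: RetryPolicy.CONDITIONAL,
    502: RetryPolicy.AUTOMATIC,
    503: RetryPolicy.AUTOMATIC,
    504: RetryPolicy.AUTOMATIC,
    409: RetryPolicy.MANUAL,
}


def policy_for(status: int) -> RetryPolicy:
    """Return the policy from the table; unlisted codes default to NONE (an assumption)."""
    return _POLICY_BY_STATUS.get(status, RetryPolicy.NONE)
```

Defaulting to NONE keeps the client conservative: a status code nobody classified is never retried by accident.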

Automated Matrix Validation

Prevent ambiguous 2xx/5xx overlaps by enforcing response schema boundaries in CI. Use Redocly CLI with a custom set of rules:

# Validate that no endpoint mixes success and error schemas under the same status code
npx @redocly/cli lint openapi.yaml --extends recommended --config ./matrix-rules.yaml

Ensure your OpenAPI responses block explicitly defines content types per status code. Overlapping schemas trigger build failures.

Structuring Error Payloads for Machine Readability

Machine-readable error payloads eliminate fragile client-side string parsing. Aligning with RFC 7807 Problem+JSON (superseded by RFC 9457, which keeps the application/problem+json media type) lets you standardize extension members such as retryable, retry_after_ms, and trace_id for automated client parsing and observability correlation.

JSON Schema for Extended Problem Details

components:
  schemas:
    ProblemDetails:
      type: object
      required: [type, title, status]
      properties:
        type: { type: string, format: uri }
        title: { type: string }
        status: { type: integer }
        detail: { type: string }
        instance: { type: string, format: uri }
        trace_id: { type: string, format: uuid }
        retryable: { type: boolean }
        backoff_strategy:
          type: string
          enum: [exponential, linear, fixed, none]
        retry_after_ms: { type: integer, minimum: 0 }
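
On the client side, a payload following this schema can be parsed into a typed object instead of being inspected field by field. The sketch below is illustrative only: the Problem dataclass and parse_problem function are hypothetical names, the example URI is made up, and defaulting missing extension members to "not retryable" is an assumption chosen to be conservative.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Problem:
    type: str
    title: str
    status: int
    retryable: bool = False
    backoff_strategy: str = "none"
    retry_after_ms: Optional[int] = None
    trace_id: Optional[str] = None


def parse_problem(body: str) -> Problem:
    """Parse an application/problem+json body; absent extension members get conservative defaults."""
    data = json.loads(body)
    return Problem(
        type=data["type"],
        title=data["title"],
        status=int(data["status"]),
        retryable=bool(data.get("retryable", False)),
        backoff_strategy=data.get("backoff_strategy", "none"),
        retry_after_ms=data.get("retry_after_ms"),
        trace_id=data.get("trace_id"),
    )


# Hypothetical payload; the URI and values are invented for illustration.
example = parse_problem(
    '{"type": "https://example.com/problems/unavailable", "title": "Service Unavailable",'
    ' "status": 503, "retryable": true, "backoff_strategy": "exponential", "retry_after_ms": 2000}'
)
```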

Contract Testing in CI

Validate that actual API responses match the generated SDK types before merging:

# Run contract tests against staging environment
npx openapi-test validate \
 --spec openapi.yaml \
 --endpoint https://api.staging.internal/v1/transactions \
 --method POST \
 --expect-status 503 \
 --assert-json-path "$.retryable == true" \
 --assert-json-path "$.backoff_strategy == 'exponential'"

Automating Client Generation & Retry Workflows

Validated OpenAPI specs should drive type-safe SDK generation with built-in retry interceptors. By building exponential backoff for 5xx errors into the generated clients and regenerating them in CI/CD pipelines, teams can ship production-grade resilience without hand-written boilerplate.

OpenAPI Generator Configuration

Create openapi-generator-config.yaml to inject custom retry middleware during codegen:

generatorName: typescript-axios
outputDir: ./clients/ts-sdk
additionalProperties:
 supportsES6: true
 withSeparateModelsAndApi: true
 apiPackage: api
 modelPackage: models
 npmName: "@platform/transaction-client"
templateDir: ./custom-templates

Language-Specific Retry Implementations

TypeScript (Axios Interceptor)

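import axios, { AxiosError } from 'axios';
import retry from 'async-retry'; // async-retry uses a default export

const RETRYABLE_STATUSES = [429, 502, 503, 504];

export const retryInterceptor = async (error: AxiosError) => {
  const config = error.config;
  const response = error.response; // undefined on network errors
  // Retry when the server opts in via header or the status is known to be transient.
  const isRetryable =
    response?.headers?.['x-retryable'] === 'true' ||
    RETRYABLE_STATUSES.includes(response?.status ?? 0);

  if (!isRetryable || !config) throw error;

  // Retry-After is expressed in seconds; the HTTP-date form is not handled here.
  const retryAfterSec = Number.parseInt(String(response?.headers?.['retry-after'] ?? ''), 10);
  const minTimeout = Number.isNaN(retryAfterSec) ? 1000 : retryAfterSec * 1000;

  // Wait out the requested delay, then retry with jittered exponential backoff.
  await new Promise((resolve) => setTimeout(resolve, minTimeout));
  return retry(() => axios(config), { retries: 3, minTimeout, factor: 2, randomize: true });
};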

Python (httpx Custom Transport)

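import asyncio
import random

import httpx

RETRYABLE_STATUSES = {429, 502, 503, 504}


class RetryableTransport(httpx.AsyncBaseTransport):
    def __init__(self, base_transport, max_retries=3):
        self.base = base_transport
        self.max_retries = max_retries

    async def handle_async_request(self, request):
        for attempt in range(self.max_retries + 1):
            response = await self.base.handle_async_request(request)
            retryable = (
                response.status_code in RETRYABLE_STATUSES
                or response.headers.get("X-Retryable") == "true"
            )
            # Hand successes, terminal errors, and the final attempt back to the caller.
            if not retryable or attempt == self.max_retries:
                return response
            # Retry-After is in seconds; fall back to 1 s if absent or not an integer.
            try:
                base_ms = int(response.headers.get("Retry-After", "1")) * 1000
            except ValueError:
                base_ms = 1000
            delay_ms = base_ms * (2 ** attempt) + random.uniform(0, 1000)
            await response.aclose()  # release the connection before retrying
            await asyncio.sleep(delay_ms / 1000)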

Go (http.RoundTripper Middleware)

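import (
	"fmt"
	"math"
	"net/http"
	"strconv"
	"time"
)

type RetryRoundTripper struct {
	Transport  http.RoundTripper
	MaxRetries int
}

func (rt *RetryRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	for i := 0; i <= rt.MaxRetries; i++ {
		resp, err := rt.Transport.RoundTrip(req)
		retryable := err != nil // resp is nil on transport errors, so check err first
		if err == nil {
			flagged, _ := strconv.ParseBool(resp.Header.Get("X-Retryable"))
			retryable = flagged || resp.StatusCode == 429 ||
				(resp.StatusCode >= 502 && resp.StatusCode <= 504)
		}
		if !retryable || i == rt.MaxRetries {
			return resp, err
		}
		if resp != nil {
			resp.Body.Close() // release the connection before retrying
		}
		// Requests with a body need req.GetBody to be replayed; that is not handled here.
		delay := time.Duration(1000*math.Pow(2, float64(i))) * time.Millisecond
		time.Sleep(delay)
	}
	return nil, fmt.Errorf("retry limit exceeded")
}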
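
The three interceptors above compute their delays slightly differently. As a point of comparison, the following Python sketch isolates one common variant, capped exponential backoff with "full jitter", where each delay is drawn uniformly between zero and the exponential ceiling. The function name, the 30-second cap, and the other default values are assumptions chosen for illustration; they are not taken from any of the SDKs above.

```python
import random
from typing import Optional


def backoff_delay_ms(
    attempt: int,
    base_ms: float = 1000,
    factor: float = 2.0,
    cap_ms: float = 30_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter delay for a zero-based attempt.

    The result is uniform in [0, min(cap_ms, base_ms * factor ** attempt)].
    Pass a seeded random.Random as `rng` to make tests reproducible.
    """
    source = rng if rng is not None else random
    ceiling = min(cap_ms, base_ms * factor ** attempt)
    return source.uniform(0, ceiling)


# Example: ceilings for the first five attempts are 1 s, 2 s, 4 s, 8 s, 16 s.
delays = [backoff_delay_ms(attempt, rng=random.Random(attempt)) for attempt in range(5)]
```

Randomizing over the whole interval, rather than adding a small jitter on top of a fixed delay, spreads retries from many clients more evenly after an outage.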

CI/CD Pipeline Workflow (GitHub Actions)

name: Generate & Publish SDKs
on:
  push:
    paths: ['openapi.yaml']
jobs:
  build-sdks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # setup-node writes the .npmrc that makes npm read NODE_AUTH_TOKEN
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          registry-url: https://registry.npmjs.org
      - name: Validate OpenAPI Spec
        run: npx @redocly/cli lint openapi.yaml
      - name: Generate TypeScript Client
        run: npx @openapitools/openapi-generator-cli generate -i openapi.yaml -g typescript-axios -c openapi-generator-config.yaml
      - name: Run Contract Tests
        run: npm run test:contracts
      - name: Publish to Registry
        run: npm publish --access public
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

Common Implementation Pitfalls

Blindly retrying 4xx errors: Automatically retrying 401 Unauthorized or 403 Forbidden triggers credential lockouts and wastes compute cycles.
Missing idempotency keys: Retryable POST endpoints (and any PUT or PATCH handlers that are not idempotent in practice) without Idempotency-Key headers cause duplicate resource creation when a response is lost to a timeout or network partition.
Ignoring Retry-After headers: Implementing fixed-delay loops violates rate limits and accelerates service degradation.
Untyped error discrimination: Generating SDKs without structured error enums forces developers to parse raw strings, breaking compile-time safety.
Unbounded retry loops: Omitting circuit breakers during partial outages triggers thundering herd effects, overwhelming recovering services.
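
For the Retry-After pitfall in particular, note that the header can carry either a number of seconds or an HTTP date. A minimal Python sketch that handles both forms using only the standard library follows; the function name retry_after_seconds and the choice to return None for unusable values are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header, or None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # neither form; the caller should fall back to its own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0.0, (when - current).total_seconds())
```

The optional now parameter exists only so that the date branch can be tested deterministically.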

Frequently Asked Questions

How do I enforce retryable vs non-retryable classifications in CI/CD?

Use OpenAPI linting rules (e.g., Spectral) to validate that all 5xx responses explicitly declare x-retryable (true or false) and that Problem+JSON payloads contain machine-readable retry metadata. Fail builds on missing idempotency headers for retryable mutations.

Should 408 Request Timeout and 409 Conflict be retried automatically?

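Not in the same way. 408 is generally safe to retry automatically with backoff, provided the request is idempotent or carries an idempotency key. 409 should not be retried automatically: it requires conflict resolution or state reconciliation first. SDKs should expose the two as distinct error types rather than applying a blanket retry policy.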

How do I generate type-safe clients that automatically handle retry policies?

Extend OpenAPI Generator templates to parse x-retryable and Retry-After headers. Map them to typed retry interceptors (e.g., TypeScript RetryConfig, Go backoff.Backoff) during codegen, ensuring compile-time safety for error handling.

What is the safest way to test retry logic before production deployment?

Implement contract testing with mock servers that simulate transient failures (503, 429) and inject Retry-After delays. Validate SDK behavior in CI using deterministic backoff assertions and idempotency verification.
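
As one way to make such assertions deterministic, the retry loop can take its sleep function as a parameter so that a test records delays instead of waiting. The sketch below uses no HTTP library at all: call_with_retry, scripted, and the (status, headers) tuple standing in for a response are all hypothetical names for illustration, and jitter is deliberately left out so the expected delays are exact.

```python
from typing import Callable, List, Tuple

Response = Tuple[int, dict]  # (status code, headers); a stand-in for a real HTTP response


def call_with_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None],
    max_retries: int = 3,
    base_s: float = 1.0,
) -> Response:
    """Retry 429/502/503/504, preferring Retry-After (seconds) over exponential backoff."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status not in (429, 502, 503, 504) or attempt == max_retries:
            return status, headers
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after is not None else base_s * 2 ** attempt
        sleep(delay)
    raise ValueError("max_retries must be non-negative")


def scripted(responses: List[Response]) -> Callable[[], Response]:
    """Fake server: returns the scripted responses in order."""
    iterator = iter(responses)
    return lambda: next(iterator)


# Deterministic test: two transient failures, then success; sleeps are recorded, not executed.
slept: List[float] = []
result = call_with_retry(
    scripted([(503, {"Retry-After": "5"}), (503, {}), (200, {})]),
    sleep=slept.append,
)
```

Here the first delay comes from the injected Retry-After value and the second from the backoff formula, so a test can assert on both code paths in a single run.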