Query Patterns & Data Shaping Strategies

Unbounded queries, implicit field expansion, and unstable sort keys are the leading causes of latency regressions and breaking contract changes in production REST APIs. This reference covers contract-first design for data retrieval — pagination strategies, filter validation, multi-field ordering, and response projection — targeting backend engineers, platform architects, and developer advocacy teams who own the API contract.

Architecture Overview

Every data retrieval pattern in this section starts from a single principle: query parameters are contract elements, not implementation details. When pagination tokens, filter operators, and sort keys are defined in the OpenAPI spec rather than inferred from implementation, platform teams can enforce them at the gateway, generate type-safe SDKs, and gate merges in CI before a single byte reaches a database.

The four topics covered here map directly to four decision points in a query contract:

Offset vs Cursor Pagination — choosing the pagination strategy based on data volatility and index topology, and encoding that choice in spec rather than documentation.
Advanced Filtering Operators — defining operator allowlists, depth limits, and index coverage requirements as JSON Schema constraints.
Sorting & Multi-Field Ordering — enforcing composite sort keys and stable defaults so pagination tokens remain valid across concurrent writes.
Sparse Fieldsets & Projection — exposing client-controlled field selection as an explicit allowlist, not an open pass-through to internal schema.

The diagram below shows how these four concerns compose into a single request lifecycle — from spec validation at the gateway through query planning to serialized response.

This lifecycle makes the contract the enforcement point, not the application code. A malformed filter or an undeclared sort field is rejected at the gateway with a 400 before reaching a database index.
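As a rough illustration of that rejection step, the sketch below checks incoming query parameters against the names declared in the contract before anything else runs. The function name `gateway_check` and the problem `type` URIs are hypothetical, and a real gateway would load the declared names from the OpenAPI document rather than hard-code them.

```python
from typing import Optional

# Parameter names declared in the OpenAPI fragment for GET /v1/resources.
DECLARED_PARAMS = {"limit", "next_cursor", "filter", "sort", "fields"}


def gateway_check(query: dict) -> Optional[tuple]:
    """Return (status, problem) for a rejected request, or None to pass it on."""
    undeclared = sorted(set(query) - DECLARED_PARAMS)
    if undeclared:
        return 400, {
            "type": "/errors/undeclared-parameter",  # hypothetical problem type
            "detail": f"Undeclared query parameters: {', '.join(undeclared)}",
        }
    limit = query.get("limit", "20")
    if not (limit.isascii() and limit.isdigit() and 1 <= int(limit) <= 100):
        return 400, {
            "type": "/errors/invalid-limit",  # hypothetical problem type
            "detail": "limit must be an integer between 1 and 100",
        }
    return None
```

A request such as `?page=3` is rejected here because `page` is not part of the contract, without the application or the database being involved.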

Canonical OpenAPI Spec Definition

The following OpenAPI 3.1 fragment is the authoritative starting point for a query contract. Each field is annotated with the decision it enforces.

# openapi.yaml — query parameter contract for a collection endpoint
paths:
  /v1/resources:
    get:
      operationId: listResources
      parameters:
        - name: limit
          in: query
          description: Page size — capped server-side regardless of client value.
          schema:
            type: integer
            minimum: 1
            maximum: 100
            default: 20
          # x-pagination tells SDK generators which style to wire up.
          x-pagination:
            type: cursor
            token_param: next_cursor

        - name: next_cursor
          in: query
          description: >
            Opaque base64url-encoded composite cursor. Treat as a black box —
            do not parse or construct manually.
          schema:
            type: string
            format: byte

        - name: filter
          in: query
          description: >
            JSON-encoded filter object. Max depth: 2. Allowed operators:
            $eq, $gt, $lt, $in. Max top-level keys: 8.
          schema:
            type: string
            format: json
          x-filter:
            max_depth: 2
            allowed_operators: ["$eq", "$gt", "$lt", "$in"]
            max_properties: 8

        - name: sort
          in: query
          description: >
            Comma-separated sort fields. Always include a unique tiebreaker
            (id) to guarantee deterministic ordering for cursor stability.
          schema:
            type: string
            pattern: '^[a-z_]+:(asc|desc)(,[a-z_]+:(asc|desc))*$'
            default: "created_at:desc,id:asc"

        - name: fields
          in: query
          description: >
            Comma-separated projection allowlist. Omitting this parameter
            returns only the default field set, not all fields.
          schema:
            type: string
            example: "id,name,status,created_at"

      responses:
        "200":
          description: Paginated resource collection
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/CollectionResponse"
        "400":
          description: Invalid query parameter (malformed filter, unknown field, depth exceeded)
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ProblemDetails"

components:
  schemas:
    CollectionResponse:
      type: object
      required: [data, meta]
      properties:
        data:
          type: array
          items:
            $ref: "#/components/schemas/Resource"
        meta:
          type: object
          required: [next_cursor, total_count_approximate]
          properties:
            next_cursor:
              type: [string, "null"]
              format: byte
              description: Null when no further pages exist.
            total_count_approximate:
              type: integer
              description: >
                Approximate count for UI display. Do not use for
                offset arithmetic — use next_cursor for traversal.

This spec anchors every section that follows. The x-pagination and x-filter vendor extensions are consumed by SDK generators and Spectral rules; they are not runtime-only annotations.

Core Pattern 1: Pagination Strategy Selection

The choice between offset-based and cursor-based pagination is a contract decision, not an implementation preference. Offset pagination (page=3&per_page=20) is easy to implement but produces skipped records and duplicate rows when concurrent inserts occur during traversal. Cursor pagination encodes a position in the index, making each page fetch independent of concurrent mutations.

The decision matrix:

Scenario	Offset	Cursor
Total page count displayed in UI	Yes	No — use approximate count only
Dataset mutates during traversal	Unsafe — gaps/dupes	Safe — position-stable
Random page access required	Yes (`page=42`)	No — sequential only
Dataset > 10 k rows	Cost grows with the offset (skipped rows are still scanned)	Index seek; cost independent of page depth
Sort key is unique	Works	Required
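The "gaps/dupes" row of the matrix can be reproduced without a database. The sketch below simulates both strategies over an in-memory list ordered newest-first and inserts a row between two page fetches. The helper names and the `(sort_key, name)` row shape are assumptions made for the illustration.

```python
from typing import Optional

Row = tuple  # (sort_key, name); a higher sort_key means a newer row


def offset_page(rows: list, page: int, per_page: int) -> list:
    """Offset pagination over rows ordered newest-first."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


def keyset_page(rows: list, after: Optional[int], per_page: int) -> list:
    """Keyset pagination: rows strictly older than the `after` key."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    if after is not None:
        ordered = [r for r in ordered if r[0] < after]
    return ordered[:per_page]


table = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# Page 1 is identical under both strategies: the two newest rows.
offset_first = offset_page(table, page=1, per_page=2)
keyset_first = keyset_page(table, after=None, per_page=2)

# A concurrent insert lands at the head of the ordering between fetches.
table.append((5, "e"))

# Offset page 2 now starts one row too early and repeats (3, "c").
offset_second = offset_page(table, page=2, per_page=2)
# Keyset page 2 resumes after the last key seen and is unaffected.
keyset_second = keyset_page(table, after=keyset_first[-1][0], per_page=2)
```

With offset paging the client sees row `(3, "c")` twice; with keyset paging the second page continues exactly where the first one ended.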

Node.js — cursor generation and decoding:

import { createHmac, timingSafeEqual } from "node:crypto";

interface CursorPayload {
  created_at: string; // ISO-8601
  id: string;         // UUID, unique tiebreaker
}

const CURSOR_SECRET = process.env.CURSOR_HMAC_SECRET!;

// Truncated HMAC-SHA256 tag (16 hex chars) over the serialized payload.
function sign(json: string): string {
  return createHmac("sha256", CURSOR_SECRET).update(json).digest("hex").slice(0, 16);
}

export function encodeCursor(payload: CursorPayload): string {
  const json = JSON.stringify(payload);
  return Buffer.from(`${json}.${sign(json)}`).toString("base64url");
}

export function decodeCursor(token: string): CursorPayload {
  const raw = Buffer.from(token, "base64url").toString("utf8");
  const lastDot = raw.lastIndexOf(".");
  if (lastDot === -1) {
    throw new Error("Invalid cursor token");
  }
  const json = raw.slice(0, lastDot);
  const provided = Buffer.from(raw.slice(lastDot + 1));
  const expected = Buffer.from(sign(json));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (provided.length !== expected.length || !timingSafeEqual(provided, expected)) {
    throw new Error("Invalid cursor token");
  }
  return JSON.parse(json) as CursorPayload;
}

Python — SQLAlchemy query with cursor seek:

from sqlalchemy import select, and_, or_
from datetime import datetime
from uuid import UUID

def list_resources(
    db,
    limit: int = 20,
    cursor: dict | None = None,
) -> tuple[list, str | None]:
    stmt = select(Resource).order_by(
        Resource.created_at.desc(),
        Resource.id.asc(),   # stable tiebreaker
    ).limit(limit + 1)

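    if cursor:
        # Keyset seek matching ORDER BY created_at DESC, id ASC:
        # rows strictly older than the cursor, or the same timestamp with a larger id.
        cursor_created_at = datetime.fromisoformat(cursor["created_at"])
        cursor_id = UUID(cursor["id"])
        stmt = stmt.where(
            or_(
                Resource.created_at < cursor_created_at,
                and_(
                    Resource.created_at == cursor_created_at,
                    Resource.id > cursor_id,
                ),
            )
        )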

    rows = db.execute(stmt).scalars().all()
    next_cursor = None
    if len(rows) > limit:
        rows = rows[:limit]
        last = rows[-1]
        next_cursor = encode_cursor({
            "created_at": last.created_at.isoformat(),
            "id": str(last.id),
        })

    return rows, next_cursor

The limit + 1 fetch pattern determines whether a next page exists without a separate COUNT(*) query.
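The Python query above calls `encode_cursor`, which is not shown. A minimal sketch of a Python counterpart to the Node.js helpers follows, assuming the same token layout (JSON payload, a dot, a truncated HMAC-SHA256 tag, all base64url-encoded without padding). The literal secret is a placeholder for illustration; a real service would read it from configuration.

```python
import base64
import hashlib
import hmac
import json

# Assumption: placeholder secret; load the real one from configuration.
CURSOR_SECRET = b"example-secret-for-illustration"


def _sign(body: bytes) -> bytes:
    """Truncated HMAC-SHA256 tag (16 hex chars) over the serialized payload."""
    tag = hmac.new(CURSOR_SECRET, body, hashlib.sha256).hexdigest()[:16]
    return tag.encode("ascii")


def encode_cursor(payload: dict) -> str:
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    token = body + b"." + _sign(body)
    # Strip padding so the token matches Node's unpadded base64url output.
    return base64.urlsafe_b64encode(token).rstrip(b"=").decode("ascii")


def decode_cursor(token: str) -> dict:
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded.encode("ascii"))
    except (ValueError, UnicodeEncodeError) as exc:
        raise ValueError("Invalid cursor token") from exc
    body, sep, signature = raw.rpartition(b".")
    # compare_digest avoids leaking how many leading characters matched.
    if not sep or not hmac.compare_digest(signature, _sign(body)):
        raise ValueError("Invalid cursor token")
    return json.loads(body)
```

Because the tag is computed over the exact bytes carried in the token, the verifier never has to re-serialize the payload, so key order and whitespace differences between producers do not affect verification.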

Core Pattern 2: Filter Validation and Constraint Boundaries

Unbounded filter combinations trigger full table scans, exhaust connection pools, and introduce N+1 execution risks. Advanced filtering operators must be validated against a JSON Schema allowlist before the query reaches an ORM or query builder.

JSON Schema filter validator:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "filter": {
      "type": "object",
      "maxProperties": 8,
      "additionalProperties": {
        "anyOf": [
          { "type": "string" },
          { "type": "number" },
          { "type": "boolean" },
          {
            "type": "object",
            "properties": {
              "$eq":  { "type": "string" },
              "$gt":  { "type": "number" },
              "$lt":  { "type": "number" },
              "$in":  {
                "type": "array",
                "items": { "type": "string" },
                "minItems": 1,
                "maxItems": 50
              }
            },
            "additionalProperties": false
          }
        ]
      }
    }
  },
  "required": ["filter"]
}

Node.js middleware — depth enforcement and index coverage check:

import Ajv2020 from "ajv/dist/2020"; // the schema declares draft 2020-12
import { filterSchema } from "./filter-schema";

const ajv = new Ajv2020({ allErrors: true });
const validateFilter = ajv.compile(filterSchema);

// Fields backed by a database index. Anything else is rejected at middleware, not at query time.
const INDEXED_FIELDS = new Set(["status", "region", "created_at", "id", "owner_id"]);

export function filterMiddleware(req, res, next) {
  let parsed: unknown;
  try {
    parsed = JSON.parse(req.query.filter ?? "{}");
  } catch {
    return res.status(400).json({ type: "/errors/invalid-filter-json" });
  }

  if (!validateFilter({ filter: parsed })) {
    return res.status(400).json({
      type: "/errors/filter-validation-failed",
      errors: ajv.errorsText(validateFilter.errors),
    });
  }

  const unknownFields = Object.keys(parsed as object).filter(
    (k) => !INDEXED_FIELDS.has(k)
  );
  if (unknownFields.length > 0) {
    return res.status(400).json({
      type: "/errors/unindexed-filter-field",
      detail: `Filter fields not indexed: ${unknownFields.join(", ")}`,
    });
  }

  req.validatedFilter = parsed;
  next();
}

Rejecting non-indexed fields at the middleware layer prevents accidental sequential scans on large tables that would otherwise surface as P99 latency spikes rather than validation errors.
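For services written in Python, the same three checks (key count, operator allowlist with a depth limit of 2, and index coverage) can be sketched without a schema library. This is an illustrative approximation of the JSON Schema above, not a replacement for it; `FilterError` and `validate_filter` are hypothetical names.

```python
import json

ALLOWED_OPERATORS = {"$eq", "$gt", "$lt", "$in"}
INDEXED_FIELDS = {"status", "region", "created_at", "id", "owner_id"}
MAX_KEYS = 8
MAX_IN_ITEMS = 50


class FilterError(ValueError):
    """Raised when a filter violates the contract; maps to an HTTP 400."""


def validate_filter(raw: str) -> dict:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise FilterError("filter is not valid JSON") from exc
    if not isinstance(parsed, dict):
        raise FilterError("filter must be a JSON object")
    if len(parsed) > MAX_KEYS:
        raise FilterError(f"filter has more than {MAX_KEYS} top-level keys")

    for field, condition in parsed.items():
        if field not in INDEXED_FIELDS:
            raise FilterError(f"filter field not indexed: {field}")
        if isinstance(condition, dict):
            # Depth 2: an operator object whose values may not nest further.
            for op, value in condition.items():
                if op not in ALLOWED_OPERATORS:
                    raise FilterError(f"operator not allowed: {op}")
                if isinstance(value, dict):
                    raise FilterError("filter exceeds max depth of 2")
                if op == "$in" and not (
                    isinstance(value, list) and 1 <= len(value) <= MAX_IN_ITEMS
                ):
                    raise FilterError("$in requires a list of 1 to 50 items")
        elif isinstance(condition, list) or condition is None:
            raise FilterError(f"unsupported value for field: {field}")
    return parsed
```

Each failure raises before any query is built, so the caller can translate `FilterError` into a 400 response in one place.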

Core Pattern 3: Deterministic Ordering and Sort Contracts

Sorting & multi-field ordering is the prerequisite for any cursor-based pagination scheme. A sort contract has two requirements: the sort expression must be parseable from the OpenAPI spec (enabling SDK generation), and the default must include a unique tiebreaker so pagination tokens remain stable during concurrent inserts.

Node.js — parse and validate the sort query parameter:

interface SortField {
  field: string;
  direction: "asc" | "desc";
}

const SORTABLE_FIELDS = new Set([
  "created_at", "updated_at", "name", "status", "id",
]);

export function parseSortParam(raw: string): SortField[] {
  const parts = raw.split(",").map((s) => s.trim());
  return parts.map((part) => {
    const [field, dir] = part.split(":");
    if (!SORTABLE_FIELDS.has(field)) {
      throw new Error(`Unknown sort field: ${field}`);
    }
    if (dir !== "asc" && dir !== "desc") {
      throw new Error(`Invalid sort direction: ${dir}`);
    }
    return { field, direction: dir as "asc" | "desc" };
  });
}

// Ensure the sort includes the unique tiebreaker. If the caller already sorts
// by id anywhere in the list, the ordering is deterministic and id is not repeated.
export function enforceTiebreaker(fields: SortField[]): SortField[] {
  if (fields.some((f) => f.field === "id")) {
    return fields;
  }
  return [...fields, { field: "id", direction: "asc" }];
}

Python — build a SQLAlchemy ORDER BY from parsed sort fields:

from sqlalchemy import asc, desc

COLUMN_MAP = {
    "created_at": Resource.created_at,
    "updated_at": Resource.updated_at,
    "name": Resource.name,
    "status": Resource.status,
    "id": Resource.id,
}

def apply_sort(stmt, sort_fields: list[dict]) -> object:
    order_clauses = []
    for sf in sort_fields:
        col = COLUMN_MAP[sf["field"]]
        clause = asc(col) if sf["direction"] == "asc" else desc(col)
        order_clauses.append(clause)
    return stmt.order_by(*order_clauses)

Never allow arbitrary column names through without allowlist validation. Unvalidated sort parameters enable column enumeration and, in some ORMs, injection via identifier quoting.
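The Python `apply_sort` helper expects a list of dicts with `field` and `direction` keys but the parsing step is only shown for Node.js. A possible Python counterpart is sketched below, assuming the same allowlist; `parse_sort_param` is a hypothetical name, and rejecting duplicate fields is an added assumption rather than something the contract above states.

```python
SORTABLE_FIELDS = {"created_at", "updated_at", "name", "status", "id"}
TIEBREAKER = "id"


def parse_sort_param(raw: str) -> list:
    """Parse 'field:dir,field:dir' into dicts and guarantee a unique tiebreaker."""
    fields = []
    for part in raw.split(","):
        field, sep, direction = part.strip().partition(":")
        if not sep or direction not in ("asc", "desc"):
            raise ValueError(f"Invalid sort term: {part!r}")
        if field not in SORTABLE_FIELDS:
            raise ValueError(f"Unknown sort field: {field}")
        if any(f["field"] == field for f in fields):
            raise ValueError(f"Duplicate sort field: {field}")
        fields.append({"field": field, "direction": direction})
    # Append the tiebreaker only when the caller did not already sort by it.
    if all(f["field"] != TIEBREAKER for f in fields):
        fields.append({"field": TIEBREAKER, "direction": "asc"})
    return fields
```

The returned list can be passed directly to `apply_sort`, since every `field` value is already known to be a key of the column map.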

Core Pattern 4: Sparse Fieldsets and Response Projection

Over-fetching degrades serialization performance and increases payload size. Sparse fieldsets and projection expose client-controlled field selection as an explicit allowlist, not an open pass-through to internal schema columns.

Allowlist validation and ORM push-down (Node.js / Prisma):

const PROJECTABLE_FIELDS = new Set([
  "id", "name", "status", "region", "owner_id", "created_at", "updated_at",
]);

// Internal fields that must never be projected to clients.
const INTERNAL_FIELDS = new Set([
  "password_hash", "internal_score", "billing_state",
]);

export function parseFieldsParam(raw: string): string[] {
  const requested = raw.split(",").map((f) => f.trim());
  const unknown = requested.filter(
    (f) => !PROJECTABLE_FIELDS.has(f) && !INTERNAL_FIELDS.has(f)
  );
  if (unknown.length > 0) {
    throw new ValidationError(`Unknown projection fields: ${unknown.join(", ")}`);
  }
  const forbidden = requested.filter((f) => INTERNAL_FIELDS.has(f));
  if (forbidden.length > 0) {
    throw new ForbiddenError(`Fields not available for projection: ${forbidden.join(", ")}`);
  }
  return requested;
}

// Push projection down into Prisma select — avoids fetching unused columns.
export function buildPrismaSelect(fields: string[]): Record<string, true> {
  return Object.fromEntries(fields.map((f) => [f, true]));
}

// Usage:
const fields = parseFieldsParam(req.query.fields ?? "id,name,status");
const resources = await prisma.resource.findMany({
  select: buildPrismaSelect(fields),
  where: req.validatedFilter,
  orderBy: req.validatedSort,
  take: req.validatedLimit + 1,
});

Zod schema for tRPC or typed Express routes:

import { z } from "zod";

const FIELD_ENUM = z.enum([
  "id", "name", "status", "region", "owner_id", "created_at", "updated_at",
]);

export const listQuerySchema = z.object({
  limit: z.coerce.number().int().min(1).max(100).default(20),
  next_cursor: z.string().optional(),
  filter: z.string().optional(),  // validated separately via AJV
  sort: z
    .string()
    .regex(/^[a-z_]+:(asc|desc)(,[a-z_]+:(asc|desc))*$/)
    .default("created_at:desc,id:asc"),
  fields: z
    .string()
    .transform((val) => val.split(",").map((f) => f.trim()))
    .pipe(z.array(FIELD_ENUM).min(1).max(10))
    .default("id,name,status"),
});

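Within a TypeScript codebase, the Zod schema acts as the single source of truth for server validation and for the types inferred from it (via z.infer) and shared with consuming code, so changes to the allowed field enum surface as compile-time errors. Keep it aligned with the OpenAPI spec, which remains the contract of record.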
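The projection allowlist in this section can also be sketched in Python for services that serialize plain dicts. The names `parse_fields_param` and `project` are hypothetical. One deliberate difference from the Prisma example is labeled here as a design assumption: internal field names receive the same error as unknown names, so the response does not confirm that such a column exists.

```python
from typing import Optional

PROJECTABLE_FIELDS = {
    "id", "name", "status", "region", "owner_id", "created_at", "updated_at",
}
DEFAULT_FIELDS = ["id", "name", "status"]


def parse_fields_param(raw: Optional[str]) -> list:
    """Return the validated projection, or the default set when omitted."""
    if raw is None or raw.strip() == "":
        return list(DEFAULT_FIELDS)
    requested = []
    for name in (part.strip() for part in raw.split(",")):
        if name not in PROJECTABLE_FIELDS:
            raise ValueError(f"Unknown projection field: {name}")
        if name not in requested:  # drop duplicates, keep first-seen order
            requested.append(name)
    return requested


def project(record: dict, fields: list) -> dict:
    """Keep only the requested keys; everything else is dropped."""
    return {name: record[name] for name in fields if name in record}
```

Applying `project` after the query is a fallback for data sources that cannot push the selection down; where the ORM supports it, selecting only the requested columns avoids fetching the rest at all.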

CI/CD Enforcement

Query contract regressions — removed sort defaults, widened filter operators, changed cursor serialization — must be caught in CI before reaching staging. The following GitHub Actions workflow enforces the spec-level constraints using Spectral and validates the pagination contract.

# .github/workflows/query-contract-ci.yml
name: Query Contract CI
on:
  pull_request:
    paths:
      - "openapi/**"
      - "src/middleware/filter*"
      - "src/middleware/sort*"
      - "src/middleware/fields*"

jobs:
  spectral-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Spectral
        run: npm install -g @stoplight/spectral-cli

      - name: Run Spectral with query ruleset
        run: spectral lint openapi/api.yaml --ruleset .spectral.query.yaml

  pagination-contract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Verify stable default sort key
        run: |
          yq eval \
            '.paths.*.get.parameters[] | select(.name == "sort") | .schema.default' \
            openapi/api.yaml | \
          grep -qE 'created_at:(asc|desc),id:(asc|desc)' || \
          (echo "ERROR: Default sort key missing unique tiebreaker" && exit 1)

      - name: Verify cursor parameter is present
        run: |
          yq eval \
            '.paths.*.get.parameters[] | select(.name == "next_cursor") | .name' \
            openapi/api.yaml | \
          grep -q 'next_cursor' || \
          (echo "ERROR: next_cursor parameter missing from collection endpoint" && exit 1)

      - name: Validate OpenAPI spec
        run: |
          npx @redocly/cli lint openapi/api.yaml --extends recommended

Spectral ruleset for query constraints:

# .spectral.query.yaml
extends: ["spectral:oas"]
rules:
  filter-operator-allowlist:
    description: "x-filter vendor extension must declare allowed_operators"
    given: "$.paths.*.get.parameters[?(@.name == 'filter')]"
    severity: error
    then:
      field: "x-filter.allowed_operators"
      function: truthy

  sort-default-tiebreaker:
    description: "Default sort must end with id:(asc|desc)"
    given: "$.paths.*.get.parameters[?(@.name == 'sort')].schema"
    severity: error
    then:
      field: "default"
      function: pattern
      functionOptions:
        match: ".*,id:(asc|desc)$"

  cursor-format-byte:
    description: "Cursor token must use format: byte"
    given: "$.paths.*.get.parameters[?(@.name == 'next_cursor')].schema"
    severity: error
    then:
      field: "format"
      function: enumeration
      functionOptions:
        values: ["byte"]
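Teams that prefer a scripted check over shell pipelines could express the two pagination assertions from the workflow as a small function over the parsed spec. The sketch below takes the spec as an already-loaded dict (for example from a YAML parser) and treats any GET operation that declares `limit` as a collection endpoint; both the function name and that heuristic are assumptions.

```python
import re

# The default sort must end with the id tiebreaker, in either direction.
TIEBREAKER_RE = re.compile(r"(^|,)id:(asc|desc)$")


def check_query_contract(spec: dict) -> list:
    """Return human-readable violations for collection GET operations."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        params = {
            p.get("name"): p
            for p in item.get("get", {}).get("parameters", [])
        }
        # Assumption: an operation that declares `limit` is a collection endpoint.
        if "limit" not in params:
            continue
        if "next_cursor" not in params:
            problems.append(f"{path}: next_cursor parameter missing")
        default_sort = params.get("sort", {}).get("schema", {}).get("default", "")
        if not TIEBREAKER_RE.search(default_sort):
            problems.append(f"{path}: default sort lacks a trailing id tiebreaker")
    return problems
```

A CI step would load the spec, call the function, print any returned messages, and exit non-zero when the list is not empty.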

SDK and Client Impact

Query contract decisions made in the OpenAPI spec surface directly in generated client code. Because x-pagination is a custom vendor extension, generators such as openapi-typescript-codegen and openapi-generator typically do not interpret it out of the box; with custom templates or a post-processing step that reads x-pagination: {type: cursor}, they can emit typed cursor iterators rather than raw parameter objects.

TypeScript cursor iterator that a customized generator might emit (illustrative):

// Illustrative generated output; in a real pipeline this file is emitted from the spec above.
// Do not edit manually: changes must go through the OpenAPI spec.
export async function* listResourcesPages(
  client: ApiClient,
  query: Omit<ListResourcesQuery, "next_cursor">
): AsyncGenerator<Resource[]> {
  let cursor: string | undefined;
  do {
    const response = await client.resources.list({ ...query, next_cursor: cursor });
    yield response.data;
    cursor = response.meta.next_cursor ?? undefined;
  } while (cursor);
}

Python — Pydantic-validated query model (generated from spec):

from pydantic import BaseModel, Field, field_validator
from typing import Optional

class ListResourcesQuery(BaseModel):
    limit: int = Field(20, ge=1, le=100)
    next_cursor: Optional[str] = None
    filter: Optional[str] = None  # JSON-encoded, validated server-side
    sort: str = Field("created_at:desc,id:asc", pattern=r"^[a-z_]+:(asc|desc)(,[a-z_]+:(asc|desc))*$")
    fields: str = "id,name,status"

    @field_validator("fields")
    @classmethod
    def validate_fields(cls, v: str) -> str:
        allowed = {"id", "name", "status", "region", "owner_id", "created_at", "updated_at"}
        requested = set(v.split(","))
        unknown = requested - allowed
        if unknown:
            raise ValueError(f"Unknown projection fields: {unknown}")
        return v
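A Python client can offer the same page-by-page traversal as the TypeScript iterator. The sketch below is a hand-written illustration, not generator output: `iter_pages` is a hypothetical helper, and `fetch_page` stands in for whatever call the SDK exposes, assumed to return a dict shaped like `CollectionResponse`.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield each page's data until the server stops returning a cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield page["data"]
        cursor = page["meta"].get("next_cursor")
        if not cursor:
            break


# Stand-in for a real HTTP call: three pages keyed by the cursor that requests them.
_FAKE_PAGES = {
    None: {"data": [1, 2], "meta": {"next_cursor": "c1"}},
    "c1": {"data": [3, 4], "meta": {"next_cursor": "c2"}},
    "c2": {"data": [5], "meta": {"next_cursor": None}},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return _FAKE_PAGES[cursor]


all_items = [item for page in iter_pages(fake_fetch) for item in page]
```

The caller never inspects or constructs the cursor; it is passed back to the server exactly as received.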

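Go URL serialization driven by struct tags (reflection at runtime, not compile time):

// github.com/google/go-querystring encodes a struct into url.Values using `url` tags.
import "github.com/google/go-querystring/query"

type ListResourcesQuery struct {
    Limit      int    `url:"limit,omitempty"`
    NextCursor string `url:"next_cursor,omitempty"`
    Filter     string `url:"filter,omitempty"`
    Sort       string `url:"sort,omitempty"`
    Fields     string `url:"fields,omitempty"`
}

// encodeListQuery returns the query string with keys sorted alphabetically:
// fields=id%2Cname%2Cstatus&limit=20&sort=created_at%3Adesc%2Cid%3Aasc
func encodeListQuery() (string, error) {
    values, err := query.Values(ListResourcesQuery{
        Limit:  20,
        Sort:   "created_at:desc,id:asc",
        Fields: "id,name,status",
    })
    if err != nil {
        return "", err
    }
    return values.Encode(), nil
}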

SDK versioning must bump the minor version when new fields enum values are added, and the major version when cursor serialization format changes — since existing stored tokens become invalid.
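The versioning rule can be made mechanical in a release script. The sketch below compares two contract summaries and returns the bump level. The summary shape (`cursor_format`, `fields`) and the function name are hypothetical, and two branches go beyond what the rule above states and are assumptions: removing a field is treated as major, and no change as patch.

```python
def required_bump(old: dict, new: dict) -> str:
    """Return 'major', 'minor', or 'patch' for a contract change."""
    # A new cursor serialization invalidates tokens that clients have stored.
    if old["cursor_format"] != new["cursor_format"]:
        return "major"
    old_fields, new_fields = set(old["fields"]), set(new["fields"])
    # Assumption: removing a projectable field breaks existing callers.
    if old_fields - new_fields:
        return "major"
    # Added enum values are backward compatible for existing callers.
    if new_fields - old_fields:
        return "minor"
    # Assumption: anything else is a patch-level release.
    return "patch"
```

For example, adding `region` to the projectable fields while keeping the cursor format yields `minor`, whereas switching the cursor payload layout yields `major` regardless of any field changes.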

Edge Cases and Anti-Patterns

Anti-pattern	Recommended approach
Offset pagination on mutable, high-cardinality tables	Use keyset/cursor pagination with a stable composite sort key
Accepting arbitrary filter fields without an index check	Validate filter keys against an indexed-field allowlist at middleware
Sort parameter without a unique tiebreaker	Always append `,id:asc` or `,id:desc` as the final sort field
Returning all fields by default when `fields` is omitted	Default to a minimal safe fieldset; require explicit opt-in for sensitive fields
Parsing cursor tokens client-side	Treat cursors as opaque; sign with HMAC and reject tampered tokens
Using `GET` with a query string > 2 KB for complex filters	Accept `POST /resources/query` with a JSON body when filter complexity exceeds URL limits
Exposing total row count via `COUNT(*)` on every request	Return an approximate count from table statistics; use cursor presence to indicate more pages
Allowing `sort` on non-indexed columns	Enforce a sort-field allowlist tied to database index definitions

FAQ

How do query patterns impact OpenAPI client generation?

Standardized query contracts — especially vendor extensions like x-pagination and x-filter — give SDK generators enough semantic information to emit typed cursor iterators, validated filter builders, and projection-aware select helpers. Without them, generators emit raw string parameters and leave validation to the caller.

What is the boundary between query shaping and business logic?

Query patterns handle data retrieval constraints and response formatting: pagination strategy, filter operator allowlists, sort field restrictions, and field projection. Business rules (authorization scopes, ownership checks, state machine guards) and state mutations must remain in dedicated service layers — never embedded in query middleware.

When should sparse fieldsets be enforced at the API gateway rather than the application?

Enforce projection at the gateway when downstream services lack their own projection capability, when strict PII boundaries require field allowlisting before routing (e.g., a shared microservice that holds mixed-sensitivity data), or when payload size contracts are part of an SLA. Application-layer enforcement is sufficient for homogeneous service topologies.

How do cursor-based pagination contracts differ from offset models in CI/CD pipelines?

Cursor contracts require stable, indexed composite sort keys and HMAC-signed opaque tokens. CI must verify that: (a) the default sort key includes a unique tiebreaker, (b) the cursor format is byte in the OpenAPI spec, and (c) any cursor serialization format change is paired with a major version bump, because stored tokens from the previous format become invalid for all active client sessions.

When should a POST /query endpoint replace GET with query parameters?

Switch to POST /query when filter payloads routinely exceed roughly 2 KB (a conservative ceiling, since URL length limits vary across proxies and load balancers), when complex boolean expressions require a structured JSON body for readability, or when the filter grammar is too rich for a single query string parameter. The trade-off is losing HTTP caching on GET; mitigate with a Vary header strategy or a short-TTL cache keyed on the request body hash.
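Both halves of that trade-off can be sketched briefly: choosing the transport from the encoded filter size, and deriving a cache key from a canonical form of the POST body. The function names and the 2048-byte threshold constant are assumptions for illustration, and the key format is not tied to any particular cache product.

```python
import hashlib
import json
from urllib.parse import urlencode

MAX_QUERY_BYTES = 2048  # assumed threshold matching the 2 KB guidance


def choose_transport(filter_obj: dict) -> str:
    """Pick GET while the URL-encoded filter stays under the size threshold."""
    compact = json.dumps(filter_obj, separators=(",", ":"))
    encoded = urlencode({"filter": compact})
    return "GET" if len(encoded) <= MAX_QUERY_BYTES else "POST"


def cache_key(path: str, body: dict) -> str:
    """Key a short-TTL cache on the path plus a hash of the canonical body."""
    # sort_keys makes logically equal bodies hash to the same key.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{path}:{digest}"
```

Canonicalizing before hashing matters because two clients may serialize the same filter with different key order, and those requests should share one cache entry.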

Offset vs Cursor Pagination — decision matrix, keyset implementation, and PostgreSQL examples
Advanced Filtering Operators — operator allowlists, depth limits, and boolean expression handling
Sorting & Multi-Field Ordering — composite sort keys, tiebreaker enforcement, and index alignment
Sparse Fieldsets & Projection — field allowlists, ORM push-down, and PII boundary enforcement
Error Contracts & Resilience Mapping — how to surface filter validation errors and cursor decode failures as structured Problem Details
API Design Fundamentals & Architecture — resource modeling, idempotency, and stateless design patterns that underpin query contract decisions