Implementing cursor-based pagination with PostgreSQL

High-throughput API endpoints degrade predictably under concurrent load when pagination relies on row offsets. This guide provides a diagnostic-to-implementation workflow for backend engineers and platform teams migrating to keyset (cursor) pagination in PostgreSQL. It covers query degradation triage, index alignment enforcement, deterministic cursor encoding, and CI/CD contract validation to eliminate duplicate records, skipped pages, and timeout regressions.

Symptom Diagnosis & Query Degradation

Offset-based pagination (LIMIT x OFFSET y) forces PostgreSQL to materialize and discard y rows on every request. Under high concurrency, this manifests as:

Diagnostic Steps

  1. Capture baseline execution plans:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) 
SELECT * FROM events ORDER BY created_at DESC LIMIT 50 OFFSET 10000;
  1. Audit CI query timeout thresholds. If statement_timeout > 500ms triggers on paginated routes, initiate migration to Offset vs Cursor Pagination for high-concurrency workloads.
  2. Validate OpenAPI page_info object responses against schema. Missing has_next_page or malformed next_cursor strings indicate incomplete contract enforcement.

CI Requirement: Fail builds if pg_stat_statements reports mean execution time > 200ms for paginated endpoints, or if EXPLAIN shows Seq Scan on tables > 50k rows.

Root Cause Analysis: Index Alignment & Sort Stability

Cursor pagination fails silently when sort columns lack deterministic tiebreakers or composite indexes. PostgreSQL requires strict total ordering to guarantee cursor continuity.

Index Scan Failures

Missing composite indexes force sequential scans. Verify index utilization:

SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'events' AND indexrelname LIKE '%created_at%';

If idx_scan = 0 on a high-traffic table, the planner is ignoring the index due to type mismatches, collation differences, or missing primary key tiebreakers.

Non-Deterministic Ordering

Sorting solely on created_at (or any non-unique column) produces unstable result sets when multiple rows share identical timestamps. This breaks cursor continuity and causes duplicate/skipped records.

Architectural Enforcement: All paginated endpoints must declare a deterministic sort chain: (sort_field, primary_key). This aligns with broader architectural decisions documented in Query Patterns & Data Shaping Strategies.

CI Requirement: Integrate a schema migration linter that rejects CREATE INDEX statements lacking a UNIQUE or PRIMARY KEY column as the final sort key.

Resolution Workflow: Composite Indexes & Keyset Pagination

1. Composite Index DDL

-- Forward pagination (newest first)
CREATE INDEX idx_events_created_at_id_desc ON events (created_at DESC, id DESC);

-- Backward pagination (oldest first)
CREATE INDEX idx_events_created_at_id_asc ON events (created_at ASC, id ASC);

2. Keyset Query Pattern

Replace OFFSET with row-value comparison. PostgreSQL natively supports tuple comparison:

-- Forward (next page)
SELECT * FROM events 
WHERE (created_at, id) < (:cursor_created_at, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;

-- Backward (previous page)
SELECT * FROM events 
WHERE (created_at, id) > (:cursor_created_at, :cursor_id)
ORDER BY created_at ASC, id ASC
LIMIT 50;

3. Cursor Encoding & Boundary Handling

Cursors must be opaque, versioned, and URL-safe. Encode as Base64(JSON):

// Cursor payload structure
interface CursorPayload {
 v: 1;
 sort: 'created_at';
 id: string;
 created_at: string; // ISO 8601
}

CI Requirement: Unit tests must verify cursor serialization/deserialization round-trips and validate that WHERE (a, b) < (x, y) matches PostgreSQL tuple comparison semantics.

Client Generation & Contract Enforcement

Automate SDK generation to enforce strict cursor parsing and error handling.

OpenAPI 3.1 Contract

components:
 schemas:
 PageInfo:
 type: object
 required: [has_next_page]
 properties:
 next_cursor:
 type: string
 nullable: true
 description: Opaque Base64-encoded JSON tuple. Omitted or null on final page.
 has_next_page:
 type: boolean
 sort_fields:
 type: string
 enum: [created_at, updated_at]

Mismatch Case & Validation

If the spec defines next_cursor as integer but PostgreSQL returns Base64, generated parsers fail. Enforce contract validation via JSON Schema tests:

{
 "if": { "properties": { "has_next_page": { "const": false } } },
 "then": { "properties": { "next_cursor": { "const": null } } }
}

Client Implementations

TypeScript SDK (Zod + Encoder):

import { z } from 'zod';

const CursorSchema = z.object({ v: z.literal(1), id: z.string(), created_at: z.string() });

export function decodeCursor(raw: string) {
 return CursorSchema.parse(JSON.parse(Buffer.from(raw, 'base64').toString()));
}

export function encodeCursor(payload: z.infer<typeof CursorSchema>) {
 return Buffer.from(JSON.stringify(payload)).toString('base64');
}

Python Async Iterator (Retry on Mutation Conflict):

async def paginate_events(client, cursor=None, limit=50):
 while True:
 try:
 resp = await client.get("/events", params={"cursor": cursor, "limit": limit})
 resp.raise_for_status()
 yield resp.json()["data"]
 if not resp.json()["page_info"]["has_next_page"]:
 break
 cursor = resp.json()["page_info"]["next_cursor"]
 except HTTPStatusError as e:
 if e.response.status_code == 409:
 await asyncio.sleep(0.5 * (2 ** attempt)) # Exponential backoff
 continue
 raise

cURL/HTTPie Template:

curl -G "https://api.example.com/v1/events" \
 --data-urlencode "cursor=eyJpZCI6IjEyMyIsImNyZWF0ZWRfYXQiOiIyMDIzLTEwLTAxVDAwOjAwOjAwWiIsInYiOjF9" \
 --data-urlencode "limit=50"

CI/CD Guardrails & Regression Testing

Prevent pagination regressions by embedding query plan validation and synthetic dataset mutation tests into deployment pipelines.

GitHub Actions Pipeline Snippet

jobs:
  pagination-regression:
    runs-on: ubuntu-latest
    services:
  postgres:
    image: postgres:15
    env: { POSTGRES_DB: test, POSTGRES_PASSWORD: postgres }
    steps:
      - uses: actions/checkout@v4
      - name: Seed Synthetic Dataset
        run: |
          psql -h localhost -U postgres -d test -c "INSERT INTO events (id, created_at)
          SELECT generate_series(1, 500000), NOW() - (random() * interval '365 days');"
      - name: Query Plan Diff
        run: |
          EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM events WHERE (created_at, id) < (NOW(), '500000') ORDER BY created_at DESC, id DESC LIMIT 50;
          # Compare against baseline plan stored in .ci/pagination-baseline.json
          # Fail if Seq Scan appears or cost increases > 20%
      - name: Cursor Continuity Validation
        run: |
          # Fetch pages 1-5, verify no duplicate IDs, verify monotonic sort
          python tests/pagination_continuity.py

CI Requirement: Block merges if synthetic dataset tests detect duplicate IDs across page boundaries, or if EXPLAIN output deviates from the stored baseline plan without explicit pg_hint_plan overrides.

Common Pitfalls & Mitigation

Pitfall Symptom Mitigation
Missing composite index on sort + PK Full table scans, statement_timeout errors, CI failures Enforce (sort_col, id) composite index via schema linter
Non-deterministic ORDER BY Duplicate/skipped records across pages Always append primary key to ORDER BY and index
Cursor payload mismatch SDK parser crashes, 400 Bad Request Enforce OpenAPI string type + Base64 encoding contract
Unversioned sort field changes Existing cursors fail with 400/410 Version cursors (v: 1), reject unknown versions, deprecate gracefully
NULL values in sort columns Inconsistent comparison logic, cursor drift Use COALESCE(sort_col, '9999-12-31T23:59:59Z') or enforce NOT NULL constraint

FAQ

How do I handle NULL values in cursor columns without breaking pagination continuity?

PostgreSQL sorts NULL values first by default (NULLS FIRST). This breaks cursor comparison when NULL appears mid-stream. Enforce NOT NULL at the schema level, or normalize NULL to a sentinel value (e.g., max timestamp for descending sorts) using COALESCE in both the index and query. Ensure the sentinel is consistent across forward/backward navigation.

What is the recommended composite index structure for multi-field cursor pagination in PostgreSQL?

Use (sort_field_1, sort_field_2, ..., primary_key) with matching ASC/DESC directions. The primary key must be the final column to guarantee uniqueness. Example: CREATE INDEX idx_orders_status_created_id ON orders (status, created_at DESC, id DESC);. Query with WHERE (status, created_at, id) < (:status, :created_at, :id).

How can I enforce cursor stability when underlying data is frequently updated or deleted?

Cursor stability relies on the sort key remaining immutable after insertion. If created_at or status is mutable, pagination will drift. Use append-only timestamp columns for pagination, or implement soft deletes with a deleted_at filter. For high-churn datasets, consider snapshotting cursors at request time or using logical replication slots for consistent reads.

Should cursor values be opaque or transparent for API consumers and debugging?

Always use opaque cursors (Base64-encoded JSON) in production APIs. Transparency breaks forward compatibility when internal sort fields change. Provide a dedicated debug endpoint (GET /debug/cursor?cursor=...) that decodes and validates payloads for internal use. This prevents clients from parsing internal schema details and reduces breaking change surface area.