API Design for Scale, Governance, and Distributed Systems

Most API design advice focuses on getting started: pick REST, version your URLs, return proper status codes. That works fine until you’re handling 100,000 requests per second across five continents, managing PCI-DSS compliance for payment data, or coordinating API calls across 50 microservices where a single user action triggers a cascade of internal requests.

This is where the simple patterns break down. Not because they’re wrong, but because they weren’t designed for these constraints. When latency matters in milliseconds, when a service failure can’t take down your entire system, when regulatory requirements dictate your architecture - you need different patterns.

This guide covers the API design decisions that matter at scale, under compliance requirements, and in distributed systems. These aren’t theoretical best practices. They’re the specific patterns that prevent 3am incidents when you’re operating at enterprise scale.

When Simple REST Isn’t Enough

You’ll know you’ve outgrown basic REST patterns when you hit one of these thresholds:

Request volume exceeds 10,000 req/sec - At this scale, every millisecond of latency costs real money. The difference between a 50ms and 200ms response time is $50K/month in infrastructure. You start caring about connection pooling, HTTP/2 multiplexing, and whether your load balancer is introducing 2ms of latency.

Geographic distribution requires <100ms response times - A user in Singapore hitting a US-East API waits 250ms minimum for light to travel. You can’t fix physics, but you can cache at the edge, replicate data regionally, and route requests intelligently.

Compliance mandates data residency - GDPR requires EU customer data stays in the EU. HIPAA has strict audit requirements. PCI-DSS limits who can touch payment data. These aren’t suggestions - they’re legal requirements that dictate your API architecture.

One user action triggers 20+ internal API calls - The distributed monolith antipattern. Your microservices are making synchronous REST calls to each other, turning a simple “create order” into a cascading waterfall of requests. When service D is slow, everything backs up.

You’re paying $50K/month for API gateway costs - At some point, the convenience of a managed API gateway gets expensive. Netflix built their own (Zuul) because AWS API Gateway would have cost millions at their scale.

The Real Cost of Scale

Let’s talk specific numbers. Stripe processes billions of API requests annually. At peak, they handle 10,000+ req/sec. A 1ms improvement in p99 latency saves them real money in infrastructure costs.

Discord handles 5 million concurrent WebSocket connections. Their API gateway has to route messages, enforce rate limits, and maintain connection state across multiple regions without message loss or duplication.

GitHub’s GraphQL API reduced mobile app requests from 40+ REST calls per screen to 1-2 GraphQL queries. That didn’t just improve user experience - it reduced their server costs by millions annually.

The pattern: simple approaches work until the numbers force you to optimize. Don’t prematurely adopt these patterns, but recognize the thresholds where you’ll need them.

GraphQL and gRPC: Choosing the Right Protocol

REST works fine for most APIs. But when you’re dealing with complex client requirements, high request volumes, or internal service communication, GraphQL and gRPC solve specific problems that REST doesn’t address well.

When GraphQL Actually Helps

GraphQL shines when you have multiple client types with different data needs. Your web app needs user name and email. Your mobile app needs that plus profile picture and last login. Your admin dashboard needs everything plus audit logs and IP addresses.

With REST, you have three options:

Create separate endpoints per client - Maintenance nightmare as clients diverge
Return everything and let clients filter - Wastes bandwidth, especially on mobile
Add query parameters for field selection - Reinventing GraphQL poorly

Real example: Shopify’s mobile app was making 40+ REST API calls to load a single product page. Different endpoints for product details, inventory, pricing, reviews, recommendations. Each call added latency. Mobile networks are unreliable - more requests means more failure points.

They switched to GraphQL. One query returns exactly the data needed:

query ProductPage($id: ID!) {
  product(id: $id) {
    title
    price { amount currency }
    inventory { available warehouse }
    images(first: 5) { url alt }
    reviews(first: 10) {
      rating
      author { name verified }
      text
    }
  }
}

Result: 40 requests became 1. Mobile app latency dropped 60%. Server load decreased because they weren’t over-fetching data.

The GraphQL N+1 Problem

GraphQL solves one problem and creates another. Consider this query:

query {
  orders(first: 100) {
    id
    customer { name }
    items { product { title price } }
  }
}

Naive implementation: fetch 100 orders (1 query), then for each order fetch customer (100 queries), then for each item fetch product (500+ queries). You just turned 1 REST call into 600+ database queries.

The solution is DataLoader - batches and caches requests within a single query execution:

const customerLoader = new DataLoader(async (customerIds) => {
  // Batches all customer fetches into single query
  return db.customers.findMany({ id: { in: customerIds } });
});

// In resolver
customer: (order) => customerLoader.load(order.customerId)

Now those 100 customer fetches become 1 batched query. Same for products.

Facebook’s approach: They built DataLoader specifically for this. It’s not magic - it’s request-scoped batching. You still need to think about your data access patterns.

When gRPC Makes Sense

gRPC is for internal service-to-service communication where you control both ends and need performance.

Use gRPC when:

You have high request volumes between services (1000+ req/sec)
Latency matters (gRPC is typically 5-10x faster than REST+JSON)
You want type safety across service boundaries
You need bidirectional streaming

Don’t use gRPC when:

Building a public API (tooling requirement is too high for external developers)
You need browser support without proxies
Human readability matters for debugging

Real numbers from Uber: They switched internal service communication from REST to gRPC. Average latency dropped from 50ms to 7ms. Throughput increased 3x on the same hardware. At their scale (millions of requests per second), this saved millions in infrastructure costs.

gRPC uses Protocol Buffers - a binary format that’s smaller and faster to parse than JSON:

syntax = "proto3";

service OrderService {
  rpc CreateOrder(CreateOrderRequest) returns (Order);
  rpc GetOrder(GetOrderRequest) returns (Order);
  rpc ListOrders(ListOrdersRequest) returns (stream Order);
}

message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  int64 total_cents = 4;
}

The server and client code generate from this schema. Type mismatches are caught at compile time, not runtime.

The Hybrid Approach

Most companies don’t choose one protocol. They use:

REST for public APIs - Accessibility wins
GraphQL for complex client needs - Mobile apps, rich web UIs
gRPC for internal services - Performance and type safety

Stripe’s architecture: Public API is REST. Internal services communicate via gRPC. Some internal dashboards use GraphQL.

The cost: you now maintain three different API patterns. You need expertise in all three. Your monitoring and debugging tools need to handle all three. This is the trade-off.

Multi-Tenancy and Data Isolation Patterns

If you’re building a B2B SaaS product, you’re building a multi-tenant system. Every architectural decision you make affects security, performance, and compliance. Get it wrong and you’re either leaking data between tenants or hemorrhaging money on infrastructure.

Three Isolation Models

Shared schema with tenant_id column:

CREATE TABLE orders (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  customer_id UUID NOT NULL,
  total_cents INT NOT NULL,
  created_at TIMESTAMP
);

CREATE INDEX idx_orders_tenant ON orders(tenant_id);

Every query must include WHERE tenant_id = ?. Miss it once and you leak data.

Pros: Cheapest infrastructure. One database, simple backups, easy schema migrations.

Cons: One bad query leaks data across tenants. Database tuning helps all tenants or none. Can’t give one tenant dedicated resources.

When to use: <100 tenants, similar usage patterns, trusted development team

Shared schema with Row Level Security (RLS):

PostgreSQL RLS enforces tenant isolation at the database level:

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.tenant_id')::UUID);

Now even if your application forgets WHERE tenant_id = ?, the database enforces it. A developer can’t accidentally write SELECT * FROM orders and see all tenants.

Pros: Database enforces isolation. Shared infrastructure. One bad query returns empty results, not cross-tenant data.

Cons: Every query has RLS overhead (PostgreSQL has to check policies). Debugging is harder (why is this query returning nothing?). Still can’t isolate noisy neighbors.

When to use: 100-1,000 tenants, high security requirements, some variation in usage

Separate schemas or databases per tenant:

-- Tenant 1
CREATE SCHEMA tenant_abc;
CREATE TABLE tenant_abc.orders (...);

-- Tenant 2
CREATE SCHEMA tenant_def;
CREATE TABLE tenant_def.orders (...);

Complete isolation. You can restore one tenant’s backup without affecting others. You can give enterprise customers dedicated database instances.

Pros: True isolation. Per-tenant performance tuning. Can migrate tenants independently. Regulatory compliance is clearer.

Cons: Expensive at scale (1,000 tenants = 1,000 schemas). Schema migrations are complex. Cross-tenant analytics require federation.

When to use: Enterprise customers with compliance requirements, highly variable usage patterns, willingness to pay for infrastructure

Tenant Context Propagation

However you store tenant data, every API request needs tenant context. A request starts at the API gateway, flows through 5 microservices, each making database queries. How does each service know which tenant?

JWT claims approach:

// API gateway verifies JWT and extracts tenant
const token = jwt.verify(request.headers.authorization);
const tenantId = token.claims.tenant_id;

// Forward in request headers
const response = await fetch('http://order-service/orders', {
  headers: {
    'X-Tenant-ID': tenantId,
    'Authorization': request.headers.authorization
  }
});

Every internal service reads X-Tenant-ID header and uses it for database queries.

The security risk: A malicious service could forge the X-Tenant-ID header. If service A is compromised, it can access any tenant’s data by changing the header.

Defense in depth:

// Order service validates tenant matches JWT
const requestTenantId = request.headers['X-Tenant-ID'];
const jwtTenantId = jwt.verify(request.headers.authorization).tenant_id;

if (requestTenantId !== jwtTenantId) {
  throw new Error('Tenant ID mismatch - possible security violation');
}

Now even if a service is compromised, it can only access data for the tenant in the cryptographically signed JWT.

Data Residency and Compliance

GDPR Article 44-50 restricts data transfers outside the EU. If you have EU customers, their data must stay in EU regions. Healthcare data under HIPAA can’t leave US borders without specific agreements.

This affects API design:

// API gateway routes based on customer region
async function routeRequest(customerId) {
  const customer = await db.customers.findOne(customerId);

  if (customer.region === 'EU') {
    return 'https://api-eu.example.com';
  } else if (customer.region === 'US') {
    return 'https://api-us.example.com';
  }
}

The problem: You now have duplicate infrastructure in every region. Backups, monitoring, deployments all need to be region-aware.

Auth0’s approach: Tenant data is region-pinned. When you sign up, you choose your region. All data for that tenant stays in that region forever. API requests automatically route to the correct region based on the JWT tenant claim.

API Gateway Patterns and Service Mesh Integration

When you have 5 microservices, each one handling its own authentication, rate limiting, and logging is annoying but manageable. When you have 50 services, it’s unsustainable. When you have 500 services, it’s impossible.

API gateways and service meshes move cross-cutting concerns out of application code and into infrastructure.

API Gateway Responsibilities

An API gateway sits between external clients and your internal services:

[Mobile App] ──┐
               ├──> [API Gateway] ──┬──> [Order Service]
[Web App]    ──┘                    ├──> [Payment Service]
                                     └──> [Inventory Service]

It handles:

Authentication: Verify JWT, API keys, OAuth tokens
Rate limiting: Enforce quotas per client
Routing: Map /v1/orders to internal order-service:8080
Transformation: External API v1 format to internal v2 format
Caching: Cache responses to reduce backend load
Analytics: Log request metadata for billing/monitoring

Implementation options:

Managed (AWS API Gateway, GCP Apigee):

Pros: Zero ops burden, scales automatically
Cons: Expensive at scale ($3.50 per million requests adds up), vendor lock-in, limited customization

Self-hosted (Kong, Tyk, Ambassador):

Pros: Full control, cheaper at high volume
Cons: You manage infrastructure, scaling, high availability

Netflix Zuul pattern: Netflix built Zuul because they needed custom logic managed API gateways couldn’t provide. At their scale, AWS API Gateway would cost millions monthly.

// Zuul filter for custom logic
public class TenantRoutingFilter extends ZuulFilter {
  public Object run() {
    RequestContext ctx = RequestContext.getCurrentContext();
    String tenantId = extractTenantId(ctx.getRequest());

    // Route to tenant-specific instance
    String serviceId = tenantRegistry.getServiceFor(tenantId);
    ctx.setRouteHost(new URL(serviceId));

    return null;
  }
}

They can route specific tenants to dedicated infrastructure, implement custom retry logic, and add per-service rate limiting - all without changing service code.

When to Add a Service Mesh

Service meshes (Istio, Linkerd, Consul Connect) handle east-west traffic - service-to-service communication within your infrastructure.

[Order Service] ──────> [Payment Service]
      │                      │
      ├──> [Inventory Service]
      └──> [Notification Service]

Without a service mesh, each service needs:

mTLS: Encrypt and authenticate service-to-service calls
Retries: Handle transient failures
Circuit breakers: Prevent cascading failures
Distributed tracing: Track requests across services
Metrics: Request rates, latencies, error rates

You could implement all this in a library and include it in every service. But now you’re maintaining authentication logic in 50 places. When you find a security bug, you need 50 deployments to fix it.

Service mesh approach: Deploy a sidecar proxy with each service. The proxy handles networking:

[Order Service] <──> [Envoy Proxy] <──> [Envoy Proxy] <──> [Payment Service]

The service makes requests to localhost. The sidecar intercepts, handles mTLS, retries, tracing, then forwards to the destination sidecar.

The cost: Operational complexity. You’re now debugging service mesh configuration, not just application code. Network hops add latency (typically 1-3ms per hop). Memory overhead for sidecars adds up (100 services × 50MB = 5GB just for proxies).

When it’s worth it: 20+ services, security requirements demand mTLS everywhere, debugging distributed systems is painful without tracing, you have platform engineers who can own the mesh.

Compliance and Governance Patterns

Regulatory compliance isn’t optional. GDPR fines are up to €20M or 4% of global revenue. HIPAA violations can shut down your business. PCI-DSS requirements are mandatory if you handle payment data.

These requirements dictate API architecture. You can’t “add compliance later” - it fundamentally shapes how you design APIs, store data, and log requests.

HIPAA: Healthcare Data Requirements

HIPAA’s Technical Safeguards (45 CFR §164.312) require specific API patterns:

(c)(1) Integrity: Ensure data isn’t improperly altered or destroyed

// Every API modification includes audit trail
app.patch('/patients/:id', async (req, res) => {
  const patient = await db.patients.findOne(req.params.id);

  // Log what changed and who changed it
  await auditLog.create({
    resource_type: 'patient',
    resource_id: patient.id,
    action: 'update',
    user_id: req.user.id,
    user_role: req.user.role,
    ip_address: req.ip,
    changes: {
      before: patient,
      after: req.body
    },
    timestamp: new Date()
  });

  await patient.update(req.body);
  res.json(patient);
});

This audit log is legally required. It must be tamper-proof (append-only), retained for 6 years, and available for audits.

(d) Person or Entity Authentication: Verify identity before granting access

Multi-factor authentication isn’t optional for HIPAA. Your API must enforce it:

async function authenticateRequest(req) {
  // Step 1: Verify JWT (something they have)
  const token = jwt.verify(req.headers.authorization);

  // Step 2: For sensitive operations, require recent MFA
  if (req.path.includes('/patients') || req.method !== 'GET') {
    const mfaTimestamp = token.claims.mfa_verified_at;
    const fiveMinutesAgo = Date.now() - (5 * 60 * 1000);

    if (!mfaTimestamp || mfaTimestamp < fiveMinutesAgo) {
      throw new Error('MFA required for this operation');
    }
  }

  return token;
}

Sensitive operations (viewing patient records, modifying data) require MFA within the last 5 minutes.

(e)(1) Transmission Security: Encrypt PHI in transit

TLS 1.2+ is required. But enforcement happens at the API level:

// Reject non-HTTPS requests
app.use((req, res, next) => {
  if (!req.secure && process.env.NODE_ENV === 'production') {
    return res.status(403).json({
      error: 'HTTPS required for all requests'
    });
  }
  next();
});

When a user requests deletion, you must delete all their data within 30 days. In a microservices architecture, that data is scattered across services.

Naive approach: Each service exposes a DELETE /users/:id endpoint. Your API calls each one:

// This doesn't work at scale
async function deleteUser(userId) {
  await orderService.delete(userId);
  await paymentService.delete(userId);
  await analyticsService.delete(userId);
  // What if one fails? What about services you forgot?
}

Problems:

What if payment service is down?
What about data in backup systems?
How do you verify all data is deleted?
What about data in logs?

Event-driven approach:

// API publishes deletion event
async function deleteUser(userId) {
  await eventBus.publish('user.deletion_requested', {
    user_id: userId,
    requested_at: new Date(),
    deadline: addDays(new Date(), 30)
  });
}

// Each service subscribes and handles deletion
orderService.on('user.deletion_requested', async (event) => {
  await db.orders.deleteMany({ user_id: event.user_id });
  await eventBus.publish('user.deletion_completed', {
    service: 'order-service',
    user_id: event.user_id
  });
});

Audit trail proves compliance:

// Deletion dashboard shows progress
GET /admin/gdpr-requests/123

{
  "user_id": "user_456",
  "requested_at": "2025-11-01T10:00:00Z",
  "deadline": "2025-12-01T10:00:00Z",
  "status": "in_progress",
  "services_completed": [
    "order-service",
    "payment-service"
  ],
  "services_pending": [
    "analytics-service"
  ]
}

You can prove to regulators that you deleted all data by the deadline.

PCI-DSS: Payment Data Isolation

PCI-DSS Requirement 1.3: Prohibit direct public access between the Internet and any system component in the cardholder data environment.

Your payment API must be isolated from other APIs:

[Public API Gateway] ──┬──> [Order Service]  (PCI-out-of-scope)
                       ├──> [Product Service] (PCI-out-of-scope)
                       └──> [Payment Gateway] ──> [Payment Service] (PCI-in-scope)

The payment service runs in a separate network segment. Only the payment gateway can reach it. This reduces PCI scope - most of your infrastructure is out-of-scope.

Tokenization API pattern:

// Public API never sees card data
POST /orders
{
  "items": [...],
  "payment_token": "tok_1234..."  // From Stripe, not raw card
}

// Payment service exchanges token for charge
async function processPayment(paymentToken) {
  const charge = await stripe.charges.create({
    source: paymentToken,
    amount: orderTotal
  });
}

Your API never touches raw card data. Stripe is PCI-certified and handles the sensitive data. Your PCI compliance burden drops from “entire infrastructure” to “how we integrate with Stripe.”

Related Topics:

API Design - Surface - REST fundamentals, basic versioning
API Design - Mid-Depth - Pagination, rate limiting, idempotency
Architecture Design - Deep Water - Microservices, event-driven architecture
Secure Coding Practices - Deep Water - OAuth2, JWT, encryption
Monitoring & Logging - Deep Water - Distributed tracing, observability