- Use Cache-Control, ETag, and Last-Modified for GETs that are safe to cache at CDNs or browsers.
- Mark personalized responses private or no-store as appropriate.
- Invalidate or shorten TTL when correctness beats cache hit rate (balances, entitlements): a stale entitlement or permission check served from a cache is an authorization bug, not just a freshness issue.
- ETag lets clients send If-None-Match to skip large bodies when nothing changed (304); the 304 must repeat the ETag, Cache-Control and Vary headers the 200 would have carried so caches can refresh their copy.
- Strong vs weak ETags matter for byte-identical semantics: a strong tag changes whenever the bytes change, a weak tag (W/"...") only when the meaning changes. If-None-Match uses the weak comparison, so either works for revalidation; document which you emit.
- Combine with Range requests only when you fully understand intermediaries: If-Range requires a strong ETag, because a weak tag does not guarantee the same byte layout, and caches are allowed to stitch partial responses together only when strong validators match.
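The caching bullets above, as an added example (not code from the original page): a minimal server-side handler for conditional GET and Range/If-Range that applies the RFC 9110 comparison rules, weak comparison for If-None-Match and strong comparison for If-Range, and emits the cache directive that matches how personalized the response is. It assumes request headers arrive as a plain dict and bodies are bytes; the `Response` shape and the sample body are constructed for the example.

```python
"""Conditional GET, cache directives and If-Range with weak vs strong ETags.

Added example, not code from the page. Assumptions: headers are a plain dict,
bodies are bytes, a single byte range per request.
"""
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def strong_etag(body: bytes) -> str:
    """Strong validator: changes whenever the byte stream changes."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def weak_etag(version: int) -> str:
    """Weak validator: identical for semantically equal but not byte-identical bodies."""
    return 'W/"v%d"' % version


def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag


def weak_match(candidate: str, current: str) -> bool:
    """RFC 9110 weak comparison (If-None-Match): the W/ prefix is ignored."""
    return _opaque(candidate) == _opaque(current)


def strong_match(candidate: str, current: str) -> bool:
    """RFC 9110 strong comparison (If-Range): both tags must be strong and equal."""
    return (candidate == current
            and not candidate.startswith("W/") and not current.startswith("W/"))


def cache_control(kind: str, max_age: int = 300) -> str:
    if kind == "public":      # safe for CDNs and browsers
        return "public, max-age=%d" % max_age
    if kind == "private":     # personalized: browser only, always revalidate
        return "private, no-cache"
    return "no-store"         # secrets or one-time data: never written to a cache


_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def serve_get(req: dict, body: bytes, etag: str, kind: str = "public") -> Response:
    """Handle GET honouring If-None-Match and Range/If-Range against `body`."""
    base = {"ETag": etag, "Cache-Control": cache_control(kind), "Vary": "Accept-Encoding"}
    inm = req.get("If-None-Match")
    if inm is not None:
        tags = [t.strip() for t in inm.split(",")]
        if "*" in tags or any(weak_match(t, etag) for t in tags):
            # Nothing to send: the 304 repeats the validator and cache headers.
            return Response(304, base)
    rng = req.get("Range")
    if rng and (m := _RANGE.match(rng)):
        if_range = req.get("If-Range")
        # Serve a partial body only when the client's validator is a strong
        # match; a weak tag may describe a different byte layout, so send 200.
        if if_range is None or strong_match(if_range, etag):
            total = len(body)
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else total - 1
            if start >= total:
                return Response(416, {**base, "Content-Range": "bytes */%d" % total})
            end = min(end, total - 1)
            headers = {**base, "Accept-Ranges": "bytes",
                       "Content-Range": "bytes %d-%d/%d" % (start, end, total)}
            return Response(206, headers, body[start:end + 1])
    return Response(200, {**base, "Accept-Ranges": "bytes"}, body)


if __name__ == "__main__":
    sample = b"0123456789"                      # constructed sample body
    tag = strong_etag(sample)
    print(serve_get({"If-None-Match": tag}, sample, tag).status)          # 304
    print(serve_get({"Range": "bytes=2-4", "If-Range": tag}, sample, tag).body)
```

The handler deliberately falls back to a full 200 when If-Range carries a weak tag; a client that re-sends a weak validator is telling you it cannot vouch for the byte layout, and resuming a download against it would silently corrupt the file.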
Rate limiting strategies
- Token bucket for smooth bursts; sliding window for fairness; per-tenant quotas for SaaS.
- Return structured quota headers (X-RateLimit-*, or the IETF RateLimit-* draft names) and Retry-After on 429 when helpful.
- Coordinate limits with API gateway and service-level budgets; key them on the authenticated principal or tenant rather than only the client IP, and give login and token endpoints tighter limits of their own, since they are the brute-force targets.
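The two limiter families named above, as an added example (not code from the original page): a token bucket, an exact sliding-window log, and the header builder that turns their verdict into a 200 or 429. Both limiters are keyed on a caller identity string, assumed here to be the authenticated principal or tenant id rather than a bare IP, and take the current time as an argument so behaviour is deterministic and testable. The header names follow the X-RateLimit-* convention and the field names used by earlier revisions of the IETF RateLimit header draft; later draft revisions fold them into a single RateLimit field.

```python
"""Token bucket and sliding-window limiters with RateLimit-* response headers.

Added example, not code from the page. Assumption: `key` is the authenticated
principal or tenant id; `now` is a monotonic timestamp in seconds.
"""
import math
from collections import deque


class TokenBucket:
    """Allows bursts up to `capacity`, refilled continuously at `rate` tokens/s."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self._state = {}  # key -> (tokens, last_refill_ts)

    def allow(self, key: str, now: float, cost: int = 1):
        tokens, last = self._state.get(key, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        ok = tokens >= cost
        if ok:
            tokens -= cost
        self._state[key] = (tokens, now)
        # Seconds until `cost` tokens are available again (0 when allowed).
        wait = 0.0 if ok else (cost - tokens) / self.rate
        return ok, int(tokens), math.ceil(wait)


class SlidingWindowLog:
    """Exact sliding window: at most `limit` requests in any `window` seconds.

    Fairer than a fixed window (no double burst at the boundary) at the cost of
    one timestamp per admitted request per key.
    """

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._log = {}  # key -> deque of admitted timestamps

    def allow(self, key: str, now: float):
        q = self._log.setdefault(key, deque())
        while q and q[0] <= now - self.window:
            q.popleft()                       # drop entries that left the window
        if len(q) < self.limit:
            q.append(now)
            return True, self.limit - len(q), 0
        # The oldest entry decides when the next slot frees up.
        return False, 0, math.ceil(q[0] + self.window - now)


def ratelimit_headers(limit: int, remaining: int, reset: int, allowed: bool) -> dict:
    """Quota headers in the X-RateLimit-* shape plus the draft RateLimit-* names."""
    h = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(remaining, 0)),
        "RateLimit-Reset": str(reset),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
    }
    if not allowed:
        h["Retry-After"] = str(max(reset, 1))
    return h


def handle(bucket: TokenBucket, key: str, now: float) -> tuple:
    """Return (status, headers) for one request; 429 when the bucket is empty."""
    ok, remaining, reset = bucket.allow(key, now)
    return (200 if ok else 429), ratelimit_headers(bucket.capacity, remaining, reset, ok)


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, rate=1.0)
    for _ in range(3):
        print(handle(bucket, "tenant-42", 0.0))   # 200, 200, then 429
```

`Retry-After` and `RateLimit-Reset` are derived from the same number so a well-behaved client and an intermediary see the same back-off; returning 429 without either just turns a throttled client into a tight retry loop.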
Bulk operations vs single requests
- Bulk reduces round trips but increases payload size, timeouts, and partial failure complexity.
- Prefer bounded batch sizes and per-item error reporting in the response.
- For huge jobs, use 202 + async processing instead of multi-minute HTTP requests.
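The bulk pattern described in the three bullets above, as an added example (not code from the original page): a handler that rejects oversized batches up front, processes each item independently and reports its status in place, and switches to 202 plus a job id once a batch is big enough to warrant asynchronous processing. `process_item`, `MAX_BATCH`, `ASYNC_THRESHOLD`, the `/jobs/{id}` status URL and the sample items are all hypothetical, chosen for the example.

```python
"""Bulk endpoint: bounded batch size, per-item results, 202 for big jobs.

Added example, not code from the page. `process_item` is a hypothetical
single-item handler; MAX_BATCH and ASYNC_THRESHOLD are assumed limits.
"""
import uuid

MAX_BATCH = 100          # reject anything larger outright (hypothetical limit)
ASYNC_THRESHOLD = 25     # above this, enqueue and answer 202 (hypothetical)


class ItemError(Exception):
    """Raised by the per-item handler for a client-caused failure."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def process_item(item: dict) -> dict:
    """Hypothetical per-item operation: update the quantity on a record."""
    if "id" not in item:
        raise ItemError("missing_id", "item has no id")
    qty = item.get("qty")
    if not isinstance(qty, int) or isinstance(qty, bool) or qty < 0:
        raise ItemError("bad_qty", "qty must be a non-negative integer")
    return {"id": item["id"], "qty": qty}


def bulk_update(items, enqueue=None) -> tuple:
    """Return (status, body). Each item succeeds or fails independently.

    `enqueue(job_id, items)` is an optional hook for a background queue; when
    present, batches above ASYNC_THRESHOLD are accepted with 202 instead of
    being processed inside the request.
    """
    if not isinstance(items, list) or not items:
        return 400, {"error": "body must be a non-empty array"}
    if len(items) > MAX_BATCH:
        # Bound the work a single request can demand before touching any item.
        return 413, {"error": "batch too large", "max": MAX_BATCH}
    if len(items) > ASYNC_THRESHOLD and enqueue is not None:
        job_id = str(uuid.uuid4())
        enqueue(job_id, items)
        return 202, {"job_id": job_id, "status_url": "/jobs/%s" % job_id}

    results = []
    for index, item in enumerate(items):
        try:
            results.append({"index": index, "status": 200, "data": process_item(item)})
        except ItemError as e:
            # One bad item must not fail the whole batch: report it in place,
            # keyed by position so items without an id can still be addressed.
            results.append({"index": index, "status": 400,
                            "error": {"code": e.code, "message": str(e)}})
    succeeded = sum(1 for r in results if r["status"] == 200)
    # Top-level 200 with per-item status; 207 Multi-Status is the other common choice.
    return 200, {"results": results, "succeeded": succeeded,
                 "failed": len(results) - succeeded}


if __name__ == "__main__":
    # Constructed sample batch: one good item, one missing id, one bad quantity.
    print(bulk_update([{"id": "a", "qty": 1}, {"qty": 2}, {"id": "c", "qty": -1}]))
```

Only `ItemError` is caught per item; an unexpected exception still fails the request, which is the behaviour you want for a bug in the handler rather than a bad input, and the 413 check runs before any item is touched so a caller cannot make the server do most of the work and then be refused.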
Payload optimization
- Enable gzip/Brotli at the edge, but not on responses that mix a secret (CSRF token, session id) with attacker-controlled input, since the compressed length then leaks the secret (BREACH); trim unused fields with sparse fieldsets or GraphQL-like projections if supported, validating requested field names against an allowlist so a client cannot name its way to a field you never meant to expose.
- Avoid N+1 chatty patterns; offer includes or dedicated aggregate reads when needed.
- Watch serialization cost on hot paths (large JSON maps, deeply nested objects).
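Sparse fieldsets as mentioned above, as an added example (not code from the original page): a `fields` query parameter of comma-separated dotted paths is validated against an explicit allowlist, then used to project a resource before serialization, so the caller pays only for the fields it asked for and can never reach a field outside the public representation. The parameter syntax, the allowlist, the `select` wrapper and the sample order record are assumptions made for the example.

```python
"""Sparse fieldsets: project a response onto requested, allowlisted fields.

Added example, not code from the page. Assumptions: `fields` is a comma-
separated list of dotted paths; the allowlist is the public representation.
"""


class FieldError(ValueError):
    pass


def parse_fields(param: str, allowed: set, max_fields: int = 50) -> list:
    """Parse and validate `fields`; unknown names are rejected, not dropped."""
    paths = [p.strip() for p in param.split(",") if p.strip()]
    if len(paths) > max_fields:
        raise FieldError("too many fields requested")
    unknown = [p for p in paths if p not in allowed]
    if unknown:
        raise FieldError("unknown field(s): " + ", ".join(sorted(unknown)))
    return paths


def project(obj: dict, paths: list) -> dict:
    """Copy only the requested dotted paths out of `obj`; lists are mapped."""
    groups = {}  # top-level key -> None (whole value) or list of sub-paths
    for path in paths:
        head, _, rest = path.partition(".")
        if head in groups and groups[head] is None:
            continue                      # whole value already requested
        if not rest:
            groups[head] = None
        else:
            groups.setdefault(head, []).append(rest)
    out = {}
    for head, subs in groups.items():
        if head not in obj:
            continue                      # optional field absent on this object
        value = obj[head]
        if subs is None:
            out[head] = value
        elif isinstance(value, dict):
            out[head] = project(value, subs)
        elif isinstance(value, list):
            out[head] = [project(v, subs) if isinstance(v, dict) else v for v in value]
        # a scalar with a sub-path has nothing to project: omit it
    return out


def select(obj: dict, fields_param: str, allowed: set) -> tuple:
    """Return (status, body) for a GET with ?fields=...; bad names are a 400."""
    try:
        paths = parse_fields(fields_param, allowed)
    except FieldError as e:
        return 400, {"error": str(e)}
    return 200, project(obj, paths)


# Constructed sample: the public representation of an order, and a record that
# also carries internal fields which must never be selectable by a client.
ALLOWED = {"id", "name", "owner.id", "owner.email", "items.sku", "items.qty"}
ORDER = {
    "id": 7, "name": "o", "internal_note": "x",
    "owner": {"id": 1, "email": "a@example", "password_hash": "h"},
    "items": [{"sku": "s1", "qty": 2, "cost": 5}, {"sku": "s2", "qty": 1, "cost": 9}],
}

if __name__ == "__main__":
    print(select(ORDER, "id,owner.email,items.sku", ALLOWED))
    print(select(ORDER, "id,internal_note", ALLOWED))   # 400, not a silent drop
```

Rejecting unknown names with a 400 rather than ignoring them is deliberate: silent dropping hides client bugs and, worse, hides probing for internal field names, which you want visible in the access log as errors.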