Observability for APIs
Logging requests/responses
- Log method, path, status, latency, tenant/user id (hashed), request id; redact bodies by default.
- Use structured JSON logs for easy queries in ELK/Datadog/CloudWatch.
- Sample success logs on high-volume paths if volume explodes, but keep errors at 100%.
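
The logging bullets above can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: the function names (hash_id, should_log, build_log_line), the salt value, the 16-character digest truncation, and the choice to treat every status >= 400 as an error are all assumptions made for the example.

```python
import hashlib
import json
import random
import time


def hash_id(raw_id: str, salt: str = "example-salt") -> str:
    """Return a truncated SHA-256 digest so raw tenant/user ids never reach the logs.

    The salt and the 16-character truncation are illustrative assumptions.
    """
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()[:16]


def should_log(status: int, sample_rate: float, rng=random.random) -> bool:
    """Always keep errors (assumed here: status >= 400); sample everything else."""
    return status >= 400 or rng() < sample_rate


def build_log_line(method: str, path: str, status: int, latency_ms: float,
                   user_id: str, request_id: str) -> str:
    """Build one structured JSON log line. Bodies are deliberately left out."""
    record = {
        "ts": time.time(),
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round(latency_ms, 2),
        "user": hash_id(user_id),
        "request_id": request_id,
    }
    return json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    if should_log(200, sample_rate=0.1):
        print(build_log_line("GET", "/v1/orders", 200, 12.3, "user-42", "req-1"))
    # Errors bypass sampling entirely.
    if should_log(503, sample_rate=0.1):
        print(build_log_line("GET", "/v1/orders", 503, 950.0, "user-42", "req-2"))
```

Passing the random source as a parameter keeps the sampling decision testable; in a real service the sample rate would likely come from configuration per route.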
Metrics (latency, error rate)
- Track RED (rate, errors, duration) per route and dependency; alert on SLO burn.
- Break out 4xx vs 5xx so you do not mask client bugs as server incidents.
- Export latency histograms so you can derive percentiles (p50/p95/p99), rather than averages only.
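
A rough sketch of the metrics bullets above, using only the standard library. The RouteMetrics class, the bucket boundaries, and the route template key are hypothetical; a production service would normally use a metrics client library rather than hand-rolled counters.

```python
from bisect import bisect_left
from collections import defaultdict

# Bucket upper bounds in seconds (an assumed layout; tune to your latency range).
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0)


class RouteMetrics:
    """RED counters for one route: rate, errors (4xx and 5xx split), duration."""

    def __init__(self) -> None:
        self.requests = 0
        self.client_errors = 0  # 4xx: usually caller bugs, not incidents
        self.server_errors = 0  # 5xx: counts against the SLO
        # One slot per bucket plus a final overflow slot for slower requests.
        self.bucket_counts = [0] * (len(BUCKETS) + 1)

    def observe(self, status: int, duration_s: float) -> None:
        self.requests += 1
        if 400 <= status < 500:
            self.client_errors += 1
        elif status >= 500:
            self.server_errors += 1
        # bisect_left finds the first bucket whose upper bound >= duration.
        self.bucket_counts[bisect_left(BUCKETS, duration_s)] += 1

    def quantile_upper_bound(self, q: float):
        """Return the upper bound of the bucket that contains quantile q."""
        if self.requests == 0:
            return None
        target = q * self.requests
        cumulative = 0
        for i, count in enumerate(self.bucket_counts):
            cumulative += count
            if cumulative >= target:
                return BUCKETS[i] if i < len(BUCKETS) else float("inf")
        return float("inf")


# Key by route template, not raw path, to keep label cardinality bounded.
metrics = defaultdict(RouteMetrics)

if __name__ == "__main__":
    metrics["GET /v1/orders/{id}"].observe(200, 0.02)
    metrics["GET /v1/orders/{id}"].observe(503, 3.0)
    m = metrics["GET /v1/orders/{id}"]
    print(m.requests, m.client_errors, m.server_errors, m.quantile_upper_bound(0.5))
```

Bucketed histograms only give an upper bound for each percentile, so the bucket layout determines how precise p95 and p99 can be.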
Tracing distributed requests
- Propagate W3C Trace Context (traceparent) or vendor headers across services.
- Spans should cover external HTTP calls, database queries, and queue operations so you can find the real bottlenecks.
- Link traces to logs via shared trace/span ids.
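
To make the propagation step concrete, here is a minimal sketch of parsing and forwarding a traceparent header. It handles only version 00 of the W3C format (version, 32-hex trace id, 16-hex parent id, 2-hex flags); the function names are hypothetical, and starting new traces with the sampled flag "01" is an assumption. Real services would normally rely on a tracing SDK for this.

```python
import re
import secrets

# Version 00 layout: version-traceid-parentid-flags, lowercase hex only.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are explicitly invalid in the W3C spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def outgoing_traceparent(incoming):
    """Continue the caller's trace if valid, otherwise start a new one.

    Returns the header for downstream calls plus the trace and span ids,
    which can be added to every log record to link logs and traces.
    """
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is not None:
        trace_id, _parent_id, flags = parsed
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # assumed: sample new traces
    span_id = secrets.token_hex(8)  # this service's own span becomes the parent
    return f"00-{trace_id}-{span_id}-{flags}", trace_id, span_id


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    header, trace_id, span_id = outgoing_traceparent(incoming)
    print(header)
    print({"trace_id": trace_id, "span_id": span_id})  # fields to attach to logs
```

The trace id stays constant across services while each hop generates its own span id, which is what lets a tracing backend rebuild the call tree.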
Correlation IDs
- Accept X-Request-Id from clients or generate one; echo it in responses and error bodies.
- Pass the same id through async jobs and webhooks for end-to-end stories.
- Guard against header injection by validating format and length.
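
The accept-or-generate flow with validation might look like the sketch below. The allowed character set, the 8 to 64 length limits, and the helper names are assumptions chosen for illustration; pick limits that match the ids your clients actually send.

```python
import re
import uuid

# Assumed policy: 8-64 characters from a conservative set. No whitespace or
# control characters, so CR/LF header injection cannot get through.
_REQUEST_ID = re.compile(r"[A-Za-z0-9._-]{8,64}")


def resolve_request_id(headers: dict) -> str:
    """Reuse the client's X-Request-Id if it passes validation, else generate one."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    candidate = lowered.get("x-request-id", "")
    if _REQUEST_ID.fullmatch(candidate):
        return candidate
    return uuid.uuid4().hex


def response_headers(request_id: str) -> dict:
    """Echo the id so clients can quote it in support requests."""
    return {"X-Request-Id": request_id}


def error_body(request_id: str, message: str) -> dict:
    """Include the id in error payloads as well as in the header."""
    return {"error": message, "request_id": request_id}


if __name__ == "__main__":
    rid = resolve_request_id({"X-Request-Id": "client-abc-123"})
    print(response_headers(rid))
    print(error_body(rid, "order not found"))
    # A malformed id is replaced rather than echoed back.
    print(resolve_request_id({"X-Request-Id": "bad\r\nSet-Cookie: x=1"}))
```

Using fullmatch rather than match matters here: it rejects a value with a trailing newline that an end-of-string anchor alone could let through. The same resolved id can then be attached to async job payloads and outgoing webhook headers.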