Health Check Rate Limiting
| Feature | Status | Priority | Decision Date |
|---|---|---|---|
| Health Check Rate Limiting | ❌ NOT PLANNED | Low | 2026-01-07 |
Overview
Rate limiting was intentionally NOT implemented for health check endpoints (/health and /health/ready) due to operational requirements that prioritize high availability for monitoring infrastructure over the minimal security benefits.
Decision Rationale
Operational Requirements
High Availability Required
- Kubernetes liveness and readiness probes make frequent health check requests
- Monitoring systems (Prometheus, Datadog, etc.) poll health endpoints every few seconds
- Rate limiting could cause false positives (healthy instances reported as failing), triggering alerts and auto-scaling events
- These false failures could lead to unnecessary pod restarts or service degradation
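To make the request volume concrete, the following is a rough back-of-the-envelope sketch rather than a measurement. The `HealthPoller` type, the polling intervals, and the limit of 10 requests per window are all illustrative assumptions; real values depend on the deployment.

```typescript
// Hypothetical model of a client that polls a health endpoint.
interface HealthPoller {
  name: string;
  intervalSeconds: number; // time between consecutive requests
}

// Total requests all pollers send within one rate-limit window.
function requestsPerWindow(pollers: HealthPoller[], windowMs: number): number {
  return pollers.reduce(
    (total, poller) =>
      total + Math.floor(windowMs / (poller.intervalSeconds * 1000)),
    0,
  );
}

// Illustrative intervals only (assumptions, not taken from the project).
const pollers: HealthPoller[] = [
  { name: "liveness probe", intervalSeconds: 10 },
  { name: "readiness probe", intervalSeconds: 10 },
  { name: "metrics scraper", intervalSeconds: 15 },
];

const legitimateRequests = requestsPerWindow(pollers, 60000); // 6 + 6 + 4 = 16
const assumedLimit: number = 10; // hypothetical per-window limit
const wouldThrottle = legitimateRequests > assumedLimit; // true
```

Under these assumed numbers, ordinary monitoring traffic alone exceeds the limit, so a limit tight enough to deter abuse would also throttle legitimate probes.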
No Sensitive Data Exposure
- Health responses contain minimal information (status, version, dependency states)
- No user data, tokens, or sensitive configuration exposed
- Information disclosed is already available through other means
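As an illustration of how little such a response carries, here is a minimal sketch of a readiness payload. The field names, the `DependencyState` values, and the `buildReadinessResponse` helper are assumptions made for this example and are not taken from src/routes/health.ts.

```typescript
// Hypothetical dependency states; the real implementation may use others.
type DependencyState = "up" | "down";

interface ReadinessBody {
  status: "ready" | "not_ready";
  version: string;
  dependencies: Record<string, DependencyState>;
}

// Builds the HTTP status and body from the individual dependency states.
function buildReadinessResponse(
  version: string,
  dependencies: Record<string, DependencyState>,
): { httpStatus: number; body: ReadinessBody } {
  const allUp = Object.keys(dependencies).every(
    (name) => dependencies[name] === "up",
  );
  return {
    httpStatus: allUp ? 200 : 503,
    body: { status: allUp ? "ready" : "not_ready", version, dependencies },
  };
}

// Example with one failing dependency (names are illustrative).
const degraded = buildReadinessResponse("1.0.0", {
  redis: "up",
  jwksCache: "up",
  sessionCapacity: "down",
});
```

The body contains only a status string, a version, and per-dependency flags, which matches the claim that no user data or configuration is exposed.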
Low Security Risk
- Timing attacks on health endpoints have negligible impact
- Response times don't reveal sensitive operational details
- Endpoint cannot be abused for data exfiltration or privilege escalation
Alternative Mitigations
More appropriate protections against health endpoint abuse:
Network-Level Rate Limiting
- Firewall rules limiting requests per IP
- DDoS protection at CDN/load balancer layer
- More effective for preventing abuse without operational impact
Monitoring and Alerting
- Track health endpoint request patterns via existing metrics
- Alert on anomalous traffic patterns
- Respond to actual attacks rather than preventing legitimate monitoring
Access Control
- Restrict health endpoint access to monitoring systems via network policies
- Use separate internal health endpoints for Kubernetes vs. external monitoring
- Implement IP allowlists at infrastructure layer if needed
Technical Context
Current Implementation
Health check endpoints in src/routes/health.ts:
```typescript
// Liveness probe - returns 503 during shutdown
healthRouter.get("/", (_req, res) => {
  if (isShuttingDown) {
    res.status(503).json({
      status: "shutting_down",
      version: config.server.version,
    });
    return;
  }
  res.json({
    status: "ok",
    version: config.server.version,
  });
});

// Readiness probe - validates dependencies
healthRouter.get("/ready", async (_req, res) => {
  // Checks Redis connectivity, JWKS cache, session capacity
});
```
Why Implementation Was Attempted
The gap was identified in Gap Analysis section 8.2 as a potential timing attack vector. The initial implementation added distributed rate limiting via Redis-backed middleware:
```typescript
// ❌ This approach was rejected
if (config.rateLimit.enabled) {
  healthRouter.use(
    createDistributedRateLimiter({
      windowMs: config.rateLimit.health.windowMs,
      maxRequests: config.rateLimit.health.maxRequests,
      globalMax: config.rateLimit.health.globalMax,
      keyPrefix: "ratelimit:health:",
      endpointType: "health",
    }),
  );
}
```
Problems Discovered During Implementation
Test Infrastructure Breakage
- Existing tests in src/app.test.ts and src/config/helmet.test.ts broke when rate limiting was added
- Required extensive mocking of Redis operations
- Integration tests expect health endpoint to always be available
Circular Dependency
- Rate limiting requires Redis
- Health check monitors Redis availability
- Rate limiting the health check creates a circular dependency: a Redis failure would prevent the health status from being reported at all
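The circular dependency can be sketched in a few lines. This is a simplified model, not the project's middleware: `CounterStore`, `failingStore`, and both functions are hypothetical, the store is synchronous for brevity (a real Redis client is asynchronous), and the limiter is assumed to fail closed when the store throws.

```typescript
// Hypothetical stand-in for a Redis-backed counter store.
// Synchronous for brevity; a real Redis client is asynchronous.
interface CounterStore {
  increment(key: string): number;
}

type HealthResult = { httpStatus: number; status: string };

// Simulates Redis being unreachable.
const failingStore: CounterStore = {
  increment: () => {
    throw new Error("Redis unavailable");
  },
};

// What the readiness probe would report if it were allowed to run.
function reportReadiness(redisUp: boolean): HealthResult {
  return redisUp
    ? { httpStatus: 200, status: "ok" }
    : { httpStatus: 503, status: "redis_down" };
}

// Limiter in front of the handler; assumed to fail closed on store errors.
function rateLimitedReadiness(
  store: CounterStore,
  maxRequests: number,
  redisUp: boolean,
): HealthResult {
  try {
    const count = store.increment("ratelimit:health:10.0.0.1");
    if (count > maxRequests) {
      return { httpStatus: 429, status: "rate_limited" };
    }
  } catch {
    // The limiter fails before the handler runs, so the caller gets a
    // generic error instead of the dependency status it asked for.
    return { httpStatus: 500, status: "rate_limiter_error" };
  }
  return reportReadiness(redisUp);
}

const direct = reportReadiness(false); // reports "redis_down"
const limited = rateLimitedReadiness(failingStore, 100, false); // reports "rate_limiter_error"
```

Without the limiter, the probe names the failing dependency. With it, the same outage surfaces as an opaque limiter error, which is the situation the health check exists to diagnose.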
Operational Impact
- Kubernetes makes health check requests every few seconds
- Rate limits could cause legitimate health checks to fail
- False positives trigger unnecessary pod restarts
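The restart risk follows from how Kubernetes evaluates HTTP probes: any status from 200 to 399 counts as success, and a container is restarted after `failureThreshold` consecutive liveness failures (3 by default). The sketch below simulates that rule; the helper names and the use of 429 as the rate limiter's response code are assumptions for illustration.

```typescript
// Kubernetes treats an HTTP probe as successful for status codes 200-399.
function isProbeSuccess(httpStatus: number): boolean {
  return httpStatus >= 200 && httpStatus < 400;
}

// Returns the index of the probe that would trigger a restart,
// or -1 if the failure threshold is never reached.
function firstRestartIndex(
  statuses: number[],
  failureThreshold: number,
): number {
  let consecutiveFailures = 0;
  return statuses.findIndex((status) => {
    consecutiveFailures = isProbeSuccess(status) ? 0 : consecutiveFailures + 1;
    return consecutiveFailures >= failureThreshold;
  });
}

// A healthy pod whose probes are throttled three times in a row.
const throttledRun = firstRestartIndex([200, 429, 429, 429, 200], 3); // 3

// Isolated throttling never reaches the threshold.
const intermittentRun = firstRestartIndex([200, 429, 200, 429, 429], 3); // -1
```

In the first sequence the pod is healthy throughout, yet the third consecutive throttled probe is enough to restart it.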
Monitoring Recommendations
If health endpoint abuse becomes a concern:
Monitor Health Endpoint Metrics
```typescript
// Track requests per IP via existing logging
logger.info("Health check request", {
  ip: req.ip,
  userAgent: req.get("user-agent"),
  endpoint: req.path,
});
```
Alert on Anomalous Patterns
- Sudden spike in health check requests from single IP
- Requests from unexpected user agents
- Geographic anomalies (requests from unexpected regions)
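The first two patterns can be checked offline against logged requests. The following is a minimal sketch, assuming log entries have already been parsed into the hypothetical `HealthLogEntry` shape; the thresholds and the expected user-agent prefixes are placeholders to be tuned per deployment.

```typescript
// Hypothetical shape of a parsed health check log entry.
interface HealthLogEntry {
  ip: string;
  userAgent: string;
  timestampMs: number;
}

// IPs whose request count within the most recent window exceeds the threshold.
function findNoisyIps(
  entries: HealthLogEntry[],
  nowMs: number,
  windowMs: number,
  threshold: number,
): string[] {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    const inWindow =
      entry.timestampMs > nowMs - windowMs && entry.timestampMs <= nowMs;
    if (inWindow) {
      counts.set(entry.ip, (counts.get(entry.ip) ?? 0) + 1);
    }
  }
  const noisy: string[] = [];
  counts.forEach((count, ip) => {
    if (count > threshold) noisy.push(ip);
  });
  return noisy;
}

// True when the user agent matches none of the expected prefixes.
function hasUnexpectedAgent(
  entry: HealthLogEntry,
  expectedPrefixes: string[],
): boolean {
  return !expectedPrefixes.some((prefix) => entry.userAgent.startsWith(prefix));
}

// Illustrative data: one IP sends three requests within a 60 s window.
const sampleEntries: HealthLogEntry[] = [
  { ip: "10.0.0.5", userAgent: "curl/8.0", timestampMs: 1000 },
  { ip: "10.0.0.5", userAgent: "curl/8.0", timestampMs: 2000 },
  { ip: "10.0.0.5", userAgent: "curl/8.0", timestampMs: 3000 },
  { ip: "10.0.0.9", userAgent: "kube-probe/1.29", timestampMs: 2500 },
];

const noisyIps = findNoisyIps(sampleEntries, 60000, 60000, 2);
```

Because this runs against logs rather than in the request path, it can flag abuse without ever rejecting a legitimate probe.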
Network-Level Protection
- Implement rate limiting at load balancer/CDN layer
- Use DDoS protection services (Cloudflare, AWS Shield)
- Restrict health endpoint access via network policies
Related Features
- Health Check Improvements - Enhanced health endpoints with dependency validation
- Graceful Shutdown - Health endpoint returns 503 during shutdown
- Redis Connection Failure - Circuit breaker pattern that health checks monitor
- OAuth Authorize Rate Limiting - Example of endpoint that DOES need rate limiting
References
- Health Router: src/routes/health.ts
- Rate Limiting Config: src/config/rate-limit.ts
- Production Deployment: Production Deployment Guide