Health Check Rate Limiting

Feature	Status	Priority	Decision Date
Health Check Rate Limiting	❌ NOT PLANNED	Low	2026-01-07

Overview

Rate limiting is intentionally NOT implemented for the health check endpoints (/health and /health/ready). Monitoring infrastructure depends on these endpoints being highly available, and that operational requirement outweighs the minimal security benefit of limiting them.

Decision Rationale

Operational Requirements

High Availability Required
- Kubernetes liveness and readiness probes make frequent health check requests
- Monitoring systems (Prometheus, Datadog, etc.) poll health endpoints every few seconds
- A rate-limited probe is indistinguishable from a failed health check, so rate limiting could trigger spurious alerts and auto-scaling events
- Spurious probe failures could lead to unnecessary pod restarts or service degradation
No Sensitive Data Exposure
- Health responses contain minimal information (status, version, dependency states)
- No user data, tokens, or sensitive configuration exposed
- Information disclosed is already available through other means
Low Security Risk
- Timing attacks on health endpoints have negligible impact
- Response times don't reveal sensitive operational details
- Endpoint cannot be abused for data exfiltration or privilege escalation
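
To make the high-availability concern concrete, the sketch below estimates how many legitimate health requests arrive within one rate-limit window. Every name and number in it (`ProbeLoad`, the pod count, the probe periods, the limit of 200) is a hypothetical illustration, not a value from this project's configuration.

```typescript
// Hypothetical inputs: none of these values come from the project's config.
interface ProbeLoad {
  pods: number;                // replicas receiving probes
  probesPerPod: number;        // e.g. liveness + readiness = 2
  probePeriodSeconds: number;  // interval between Kubernetes probes
  scrapersPerPod: number;      // monitoring systems polling each pod
  scrapePeriodSeconds: number; // interval between monitoring polls
}

// Expected number of legitimate health requests inside one rate-limit window.
function expectedRequestsPerWindow(load: ProbeLoad, windowMs: number): number {
  const windowSeconds = windowMs / 1000;
  const probeRequests =
    load.pods * load.probesPerPod * (windowSeconds / load.probePeriodSeconds);
  const scrapeRequests =
    load.pods * load.scrapersPerPod * (windowSeconds / load.scrapePeriodSeconds);
  return Math.ceil(probeRequests + scrapeRequests);
}

// True when legitimate traffic alone would exceed a limit shared by all pods.
function wouldThrottleLegitimateTraffic(
  load: ProbeLoad,
  windowMs: number,
  sharedLimit: number,
): boolean {
  return expectedRequestsPerWindow(load, windowMs) > sharedLimit;
}

// Assumed deployment: 10 pods, 2 probes every 5 s, 1 scraper every 15 s.
const assumedLoad: ProbeLoad = {
  pods: 10,
  probesPerPod: 2,
  probePeriodSeconds: 5,
  scrapersPerPod: 1,
  scrapePeriodSeconds: 15,
};
const requestsPerMinute = expectedRequestsPerWindow(assumedLoad, 60_000);
const throttled = wouldThrottleLegitimateTraffic(assumedLoad, 60_000, 200);
```

With these assumed figures, probes and scrapers alone produce 280 requests per minute, so a shared limit of 200 per minute would reject legitimate checks.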

Alternative Mitigations

More appropriate protections against health endpoint abuse:

Network-Level Rate Limiting
- Firewall rules limiting requests per IP
- DDoS protection at CDN/load balancer layer
- More effective for preventing abuse without operational impact
Monitoring and Alerting
- Track health endpoint request patterns via existing metrics
- Alert on anomalous traffic patterns
- Respond to actual attacks rather than preventing legitimate monitoring
Access Control
- Restrict health endpoint access to monitoring systems via network policies
- Use separate internal health endpoints for Kubernetes vs. external monitoring
- Implement IP allowlists at infrastructure layer if needed

Technical Context

Current Implementation

Health check endpoints in src/routes/health.ts:

typescript

// Liveness probe - returns 503 during shutdown
healthRouter.get("/", (_req, res) => {
  if (isShuttingDown) {
    res.status(503).json({
      status: "shutting_down",
      version: config.server.version,
    });
    return;
  }

  res.json({
    status: "ok",
    version: config.server.version,
  });
});

// Readiness probe - validates dependencies
healthRouter.get("/ready", async (_req, res) => {
  // Checks Redis connectivity, JWKS cache, session capacity
});
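
The readiness handler body is not shown above. As a rough sketch of how such a probe could aggregate its dependency checks, the following assumes each check is an async function returning a boolean and that any failure or timeout maps to a 503 response. The names `DependencyCheck`, `runWithTimeout`, and `evaluateReadiness`, and the one-second default timeout, are hypothetical and are not taken from src/routes/health.ts.

```typescript
// Hypothetical types; the real checks live in src/routes/health.ts.
type DependencyState = "ok" | "failed" | "timeout";

interface DependencyCheck {
  name: string;
  run: () => Promise<boolean>; // resolves true when the dependency is healthy
}

interface ReadinessReport {
  httpStatus: 200 | 503;
  status: "ready" | "not_ready";
  dependencies: Record<string, DependencyState>;
}

// Runs one check and reports "timeout" if it has not settled after timeoutMs.
function runWithTimeout(
  check: DependencyCheck,
  timeoutMs: number,
): Promise<DependencyState> {
  return new Promise<DependencyState>((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    const settle = (state: DependencyState): void => {
      clearTimeout(timer);
      resolve(state);
    };
    // Promise.resolve().then(...) also turns a synchronous throw into "failed".
    Promise.resolve()
      .then(() => check.run())
      .then(
        (healthy) => settle(healthy ? "ok" : "failed"),
        () => settle("failed"),
      );
  });
}

// Runs all checks in parallel; the service is ready only if every check is "ok".
async function evaluateReadiness(
  checks: DependencyCheck[],
  timeoutMs = 1000,
): Promise<ReadinessReport> {
  const results = await Promise.all(
    checks.map(async (c) => ({
      name: c.name,
      state: await runWithTimeout(c, timeoutMs),
    })),
  );
  const dependencies: Record<string, DependencyState> = {};
  results.forEach((r) => {
    dependencies[r.name] = r.state;
  });
  const ready = results.every((r) => r.state === "ok");
  return {
    httpStatus: ready ? 200 : 503,
    status: ready ? "ready" : "not_ready",
    dependencies,
  };
}
```

Bounding each check with a timeout keeps a hung dependency from stalling the probe itself, which matters because Kubernetes treats a slow probe response as a failure.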

Why Implementation Was Attempted

The gap was identified in Gap Analysis section 8.2 as a potential timing attack vector. An initial implementation added distributed rate limiting via Redis-backed middleware:

typescript

// ❌ This approach was rejected
if (config.rateLimit.enabled) {
  healthRouter.use(
    createDistributedRateLimiter({
      windowMs: config.rateLimit.health.windowMs,
      maxRequests: config.rateLimit.health.maxRequests,
      globalMax: config.rateLimit.health.globalMax,
      keyPrefix: "ratelimit:health:",
      endpointType: "health",
    }),
  );
}

Problems Discovered During Implementation

Test Infrastructure Breakage
- Existing tests in src/app.test.ts and src/config/helmet.test.ts broke when rate limiting was added
- Fixing them required extensive mocking of Redis operations
- Integration tests expect the health endpoint to always be available
Circular Dependency
- Rate limiting requires Redis
- Health check monitors Redis availability
- Rate limiting the health check creates a circular dependency: a Redis failure can prevent the endpoint from reporting that Redis is down
Operational Impact
- Kubernetes makes health check requests every few seconds
- Rate limits could cause legitimate health checks to fail
- Spuriously failed liveness probes trigger unnecessary pod restarts
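
The circular dependency can be illustrated with a minimal sketch. It assumes a limiter that fails closed when its counter store is unreachable; `CounterStore`, `rateLimitedHealth`, and the status strings are hypothetical stand-ins, not the project's `createDistributedRateLimiter` or its actual failure behavior.

```typescript
// Hypothetical stand-in for a Redis-backed counter.
interface CounterStore {
  // Rejects when the backing store (e.g. Redis) is unreachable.
  increment(key: string): Promise<number>;
}

interface HealthResponse {
  httpStatus: number;
  body: { status: string };
}

// Applies a rate limit in front of a health handler.
async function rateLimitedHealth(
  store: CounterStore,
  maxRequests: number,
  healthHandler: () => Promise<HealthResponse>,
): Promise<HealthResponse> {
  let count: number;
  try {
    count = await store.increment("ratelimit:health:probe");
  } catch {
    // Fail closed: the limiter cannot count, so the request is rejected and
    // the handler that would have reported the outage never runs.
    return { httpStatus: 500, body: { status: "rate_limiter_unavailable" } };
  }
  if (count > maxRequests) {
    return { httpStatus: 429, body: { status: "rate_limited" } };
  }
  return healthHandler();
}

// A store that simulates a Redis outage.
const unreachableStore: CounterStore = {
  increment: () => Promise.reject(new Error("connection refused")),
};
```

Calling `rateLimitedHealth(unreachableStore, 100, handler)` returns the limiter's error without ever invoking `handler`, so the dependency report is lost exactly when it is needed. Failing open instead would avoid this, but then the limiter provides no protection whenever Redis is down.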

Monitoring Recommendations

If health endpoint abuse becomes a concern:

Monitor Health Endpoint Metrics

typescript

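// Track requests per IP via existing logging.
// Sketch: assumes the log call runs as middleware registered on healthRouter.
healthRouter.use((req, _res, next) => {
  logger.info("Health check request", {
    ip: req.ip,
    userAgent: req.get("user-agent"),
    endpoint: req.path,
  });
  next();
});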

Alert on Anomalous Patterns
- Sudden spike in health check requests from single IP
- Requests from unexpected user agents
- Geographic anomalies (requests from unexpected regions)
Network-Level Protection
- Implement rate limiting at load balancer/CDN layer
- Use DDoS protection services (Cloudflare, AWS Shield)
- Restrict health endpoint access via network policies
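
The "sudden spike from a single IP" alert above could be detected with a small sliding-window counter that only observes traffic and never blocks it. This is a sketch under assumptions: `HealthRequestTracker`, its window, and its threshold are hypothetical, and a real deployment would more likely derive the same signal from existing metrics or logs.

```typescript
// Hypothetical helper: counts health requests per IP within a sliding window
// and flags IPs that exceed a threshold. It observes only; it never rejects.
class HealthRequestTracker {
  private readonly windowMs: number;
  private readonly alertThreshold: number;
  private readonly timestampsByIp = new Map<string, number[]>();

  constructor(windowMs: number, alertThreshold: number) {
    this.windowMs = windowMs;
    this.alertThreshold = alertThreshold;
  }

  // Records one request at nowMs. Returns true when this IP has made more
  // than alertThreshold requests within the last windowMs milliseconds.
  record(ip: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.timestampsByIp.get(ip) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.timestampsByIp.set(ip, recent);
    return recent.length > this.alertThreshold;
  }
}

// Example: alert when one IP makes more than 3 requests within 10 seconds.
const tracker = new HealthRequestTracker(10_000, 3);
```

Because the tracker returns a flag instead of rejecting the request, legitimate probes always get an answer, and the flag can feed an alert for a human or an upstream network control to act on.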

Related Features

Health Check Improvements - Enhanced health endpoints with dependency validation
Graceful Shutdown - Health endpoint returns 503 during shutdown
Redis Connection Failure - Circuit breaker pattern that health checks monitor
OAuth Authorize Rate Limiting - Example of endpoint that DOES need rate limiting

References

Health Router: src/routes/health.ts
Rate Limiting Config: src/config/rate-limit.ts
Production Deployment: Production Deployment Guide
