
Graceful Shutdown

Status: ✅ IMPLEMENTED Priority: 🔴 HIGH (Completed) Actual Time: 6-8 hours Implementation Date: 2026-01-06 Risk Level: MEDIUM Impact: Clean resource cleanup and proper container orchestration support



Implementation Summary

The graceful shutdown feature has been successfully implemented with the following components:

What Was Implemented

  1. Signal Handling - Both SIGTERM (Docker/Kubernetes) and SIGINT (Ctrl+C) are properly handled in src/index.ts
  2. HTTP Server Closure - Server stops accepting new connections via server.close() with 5-second grace period
  3. MCP Session Cleanup - All active MCP sessions are closed in parallel with proper error handling
  4. Redis Connection Management - Redis connections are properly closed via closeRedisConnection()
  5. JWKS Service Cleanup - Background refresh timer is stopped via jwksService.stop()
  6. Health Check Integration - Health endpoint returns 503 during shutdown via setShuttingDown() in src/routes/health.ts (a sketch follows this list)
  7. Re-entrance Prevention - Shutdown flag prevents multiple shutdown triggers
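
The shipped variant wires the health endpoint through a module-level flag rather than the GracefulShutdown instance proposed later in this document. A minimal sketch of that flag-based approach - only the setShuttingDown() name comes from the summary above; the bodies are illustrative assumptions:

typescript
// src/routes/health.ts (sketch of the flag-based variant)
let shuttingDown = false;

/** Called by the shutdown handler in src/index.ts before cleanup begins. */
export function setShuttingDown(): void {
  shuttingDown = true;
}

/** Read by the health handler to decide between 200 and 503. */
export function isShuttingDown(): boolean {
  return shuttingDown;
}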

Actual Implementation

The implementation in src/index.ts includes:

  • Graceful shutdown handler for SIGTERM and SIGINT signals
  • Sequential cleanup phases with timeouts
  • Proper error handling and logging
  • Health check coordination during shutdown

Key Files Modified

  • src/index.ts - Signal handling and the shutdown sequence
  • src/routes/health.ts - Shutdown-aware health endpoint (setShuttingDown())

Testing

  • Manual testing with SIGTERM and SIGINT signals
  • Verified shutdown behavior under Docker and Kubernetes
  • Proper cleanup verified via logs

Original Problem Statement

Before this change, the server handled SIGINT (Ctrl+C) but performed an immediate exit without cleanup. This caused issues in production environments:

Previous Implementation:

typescript
process.on("SIGINT", () => {
  logger.info("Shutting down server...");
  process.exit(0);  // Immediate exit!
});

Problems:

  1. No SIGTERM handling - Docker and Kubernetes send SIGTERM
  2. Active requests aborted - In-flight HTTP requests terminated mid-processing
  3. MCP sessions dropped - Clients receive sudden disconnection
  4. Redis connections dangling - Connections not closed gracefully
  5. Logs lost - Pending log entries not flushed
  6. Metrics incomplete - Final metrics not published
  7. JWKS timer orphaned - Background refresh timer keeps running

Production Impact:

  • Load balancers may keep routing traffic to the pod during shutdown
  • Kubernetes sees the pod die abruptly instead of draining gracefully
  • Client reconnection storms
  • Resource leaks in container platforms

Current Behavior

On SIGINT the process logs a message and exits immediately; SIGTERM is not handled at all, so the runtime default kills the process. Either way, no cleanup runs.

Proposed Solution

Implement comprehensive graceful shutdown with configurable timeout and proper resource cleanup.


Implementation

1. Graceful Shutdown Service

Create src/services/graceful-shutdown.ts:

typescript
import { Server } from "http";
import { logger } from "./logger.js";
import { redisClient } from "./redis.js";
import { jwksService } from "./jwks.js";
import { removeTransport } from "../mcp/mcp.js";

interface ShutdownOptions {
  timeout: number; // Maximum time to wait (ms)
  signals: NodeJS.Signals[]; // Signals to handle (typed so process.on() accepts them)
}

export class GracefulShutdown {
  private isShuttingDown = false;
  private shutdownTimeout?: NodeJS.Timeout;

  constructor(
    private server: Server,
    private options: ShutdownOptions = {
      timeout: 25000, // 25 seconds (K8s terminationGracePeriodSeconds - 5s buffer)
      signals: ['SIGTERM', 'SIGINT'],
    }
  ) {}

  /**
   * Initialize shutdown handlers
   */
  initialize(): void {
    for (const signal of this.options.signals) {
      process.on(signal, () => void this.shutdown(signal));
    }

    // Handle uncaught errors during shutdown
    process.on('uncaughtException', (error) => {
      logger.error('Uncaught exception during shutdown', {
        error: error.message,
        stack: error.stack,
        category: 'shutdown',
      });

      if (this.isShuttingDown) {
        process.exit(1);
      }
    });
  }

  /**
   * Perform graceful shutdown
   */
  private async shutdown(signal: string): Promise<void> {
    if (this.isShuttingDown) {
      logger.warn('Shutdown already in progress, ignoring signal', { signal });
      return;
    }

    this.isShuttingDown = true;
    const startTime = Date.now();

    logger.info('Graceful shutdown initiated', {
      signal,
      timeout: this.options.timeout,
      category: 'shutdown',
    });

    // Set hard timeout - force exit if cleanup takes too long
    this.shutdownTimeout = setTimeout(() => {
      logger.error('Shutdown timeout exceeded, forcing exit', {
        timeoutMs: this.options.timeout,
        category: 'shutdown',
      });
      process.exit(1);
    }, this.options.timeout);

    try {
      // Phase 1: Stop accepting new connections (2s max)
      await this.stopAcceptingConnections();

      // Phase 2: Wait for active requests to complete (10s max)
      await this.waitForActiveRequests(10000);

      // Phase 3: Close MCP sessions (5s max)
      await this.closeMcpSessions(5000);

      // Phase 4: Close external connections (3s max)
      await this.closeExternalConnections(3000);

      // Phase 5: Log completion, then flush logs (2s max).
      // The completion log must precede finalize(), which ends the logger.
      const duration = Date.now() - startTime;
      logger.info('Graceful shutdown complete', {
        durationMs: duration,
        category: 'shutdown',
      });

      await this.finalize();

      clearTimeout(this.shutdownTimeout);
      process.exit(0);

    } catch (error) {
      logger.error('Error during graceful shutdown', {
        error: error instanceof Error ? error.message : String(error),
        category: 'shutdown',
      });

      clearTimeout(this.shutdownTimeout);
      process.exit(1);
    }
  }

  /**
   * Phase 1: Stop accepting new connections
   */
  private async stopAcceptingConnections(): Promise<void> {
    return new Promise((resolve) => {
      this.server.close(() => {
        logger.info('HTTP server closed, no longer accepting connections', {
          category: 'shutdown',
        });
        resolve();
      });

      // Don't wait forever
      setTimeout(resolve, 2000);
    });
  }

  /**
   * Phase 2: Wait for active HTTP requests to complete
   */
  private async waitForActiveRequests(timeout: number): Promise<void> {
    const start = Date.now();
    const checkInterval = 100; // Check every 100ms

    while (Date.now() - start < timeout) {
      const connections = await this.getActiveConnections();

      if (connections === 0) {
        logger.info('All active requests completed', {
          category: 'shutdown',
        });
        return;
      }

      logger.debug('Waiting for active requests', {
        activeConnections: connections,
        category: 'shutdown',
      });

      await new Promise(resolve => setTimeout(resolve, checkInterval));
    }

    const remainingConnections = await this.getActiveConnections();
    if (remainingConnections > 0) {
      logger.warn('Timeout waiting for requests, proceeding with shutdown', {
        activeConnections: remainingConnections,
        category: 'shutdown',
      });
    }
  }

  /**
   * Phase 3: Close all MCP sessions
   */
  private async closeMcpSessions(timeout: number): Promise<void> {
    // Session registry maintained on global by the MCP module
    const sessionIds = Object.keys((global as any).transports ?? {});

    if (sessionIds.length === 0) {
      logger.info('No active MCP sessions to close', {
        category: 'shutdown',
      });
      return;
    }

    logger.info('Closing MCP sessions', {
      sessionCount: sessionIds.length,
      category: 'shutdown',
    });

    // Close sessions in parallel with timeout
    const closePromises = sessionIds.map(async (sessionId) => {
      try {
        await Promise.race([
          removeTransport(sessionId),
          new Promise((_, reject) =>
            setTimeout(() => reject(new Error('Session close timeout')), 1000)
          ),
        ]);
      } catch (error) {
        logger.error('Failed to close MCP session', {
          sessionId,
          error: error instanceof Error ? error.message : String(error),
          category: 'shutdown',
        });
      }
    });

    try {
      await Promise.race([
        Promise.all(closePromises),
        new Promise(resolve => setTimeout(resolve, timeout)),
      ]);

      const remaining = Object.keys((global as any).transports ?? {}).length;
      if (remaining > 0) {
        logger.warn('Some MCP sessions could not be closed gracefully', {
          remainingSessions: remaining,
          category: 'shutdown',
        });
      } else {
        logger.info('All MCP sessions closed', {
          category: 'shutdown',
        });
      }
    } catch (error) {
      logger.error('Error closing MCP sessions', {
        error: error instanceof Error ? error.message : String(error),
        category: 'shutdown',
      });
    }
  }

  /**
   * Phase 4: Close external connections
   */
  private async closeExternalConnections(timeout: number): Promise<void> {
    const closeRedis = async () => {
      try {
        await redisClient.quit();
        logger.info('Redis connection closed', {
          category: 'shutdown',
        });
      } catch (error) {
        logger.error('Failed to close Redis connection', {
          error: error instanceof Error ? error.message : String(error),
          category: 'shutdown',
        });
      }
    };

    const stopJwksRefresh = () => {
      try {
        jwksService.stop();
        logger.info('JWKS refresh timer stopped', {
          category: 'shutdown',
        });
      } catch (error) {
        logger.error('Failed to stop JWKS refresh', {
          error: error instanceof Error ? error.message : String(error),
          category: 'shutdown',
        });
      }
    };

    await Promise.race([
      Promise.all([
        closeRedis(),
        Promise.resolve(stopJwksRefresh()),
      ]),
      new Promise(resolve => setTimeout(resolve, timeout)),
    ]);
  }

  /**
   * Phase 5: Flush pending logs.
   * Nothing can be logged after logger.end(), so this must run last.
   */
  private async finalize(): Promise<void> {
    await new Promise<void>((resolve) => {
      logger.on('finish', resolve);
      setTimeout(resolve, 1000); // Don't wait forever
      logger.end();
    });
  }

  /**
   * Get count of active HTTP connections
   */
  private getActiveConnections(): Promise<number> {
    // net.Server exposes this asynchronously via getConnections()
    return new Promise((resolve) => {
      this.server.getConnections((err, count) => resolve(err ? 0 : count));
    });
  }

  /**
   * Check if currently shutting down
   */
  isShutdownInProgress(): boolean {
    return this.isShuttingDown;
  }
}

2. Update JWKS Service

Add stop() method to src/services/jwks.ts:

typescript
export class JWKSService {
  private refreshTimer?: NodeJS.Timeout;

  // ... existing methods

  /**
   * Stop background refresh timer
   */
  stop(): void {
    if (this.refreshTimer) {
      clearTimeout(this.refreshTimer);
      this.refreshTimer = undefined;
      logger.info('JWKS refresh timer stopped', {
        category: 'jwks',
      });
    }
  }
}
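
For context, a sketch of the scheduling side that stop() cancels. scheduleRefresh, refreshKeys, and refreshIntervalMs are illustrative names, not confirmed parts of the existing service:

typescript
export class JWKSService {
  private refreshTimer?: NodeJS.Timeout;
  private refreshIntervalMs = 60 * 60 * 1000; // hypothetical: refresh hourly

  /** Hypothetical fetch-and-cache of the JWKS document. */
  private async refreshKeys(): Promise<void> {
    // ... fetch keys from the OIDC issuer and cache them
  }

  /**
   * Re-arm a one-shot timer after each refresh. Because the chain uses
   * setTimeout (not setInterval), the clearTimeout() in stop() breaks it.
   */
  private scheduleRefresh(): void {
    this.refreshTimer = setTimeout(() => {
      void this.refreshKeys().finally(() => this.scheduleRefresh());
    }, this.refreshIntervalMs);
  }
}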

3. Update Health Check

Fail health check during shutdown in src/routes/health.ts:

typescript
import { Router } from "express";
import { config } from "../config/index.js";
// The shared instance is created and exported in src/index.ts (section 4).
// It is only dereferenced inside the handler, so the circular import is safe.
import { gracefulShutdown } from "../index.js";

export const healthRouter = Router();

healthRouter.get("/", (_req, res) => {
  if (gracefulShutdown.isShutdownInProgress()) {
    return res.status(503).json({
      status: "shutting_down",
      version: config.server.version,
    });
  }

  res.json({
    status: "ok",
    version: config.server.version,
  });
});

4. Initialize in Main Application

Update src/index.ts:

typescript
import { app } from "./app.js";
import { config } from "./config/index.js";
import { logger } from "./services/logger.js";
import { GracefulShutdown } from "./services/graceful-shutdown.js";

const server = app.listen(config.port, () => {
  logger.info(`Seed MCP server running on http://localhost:${String(config.port)}/mcp`);

  if (!config.authRequired) {
    logger.warn(
      "⚠️  SECURITY WARNING: Authentication is DISABLED (AUTH_REQUIRED=false). " +
        "This should only be used in development/testing environments.",
    );
  } else {
    logger.info(`Authentication enabled, OIDC issuer: ${config.oidc.issuer || "(not configured)"}`);
  }
});

// Initialize graceful shutdown
const gracefulShutdown = new GracefulShutdown(server, {
  timeout: parseInt(process.env.SHUTDOWN_TIMEOUT || "25000", 10),
  signals: ['SIGTERM', 'SIGINT'],
});

gracefulShutdown.initialize();

// Export for health check
export { gracefulShutdown };

Kubernetes Integration

Pod Spec with Termination Grace Period

yaml
apiVersion: v1
kind: Pod
metadata:
  name: seed-mcp-server
spec:
  # Kubernetes default is 30 seconds
  # Keep at 30s to give the app 25s + a 5s buffer
  terminationGracePeriodSeconds: 30

  containers:
  - name: seed
    image: seed-mcp-server:latest
    ports:
    - containerPort: 3000
    env:
    - name: SHUTDOWN_TIMEOUT
      value: "25000"  # 25 seconds

    # Liveness probe - is the app alive? (probes belong on the container, not the pod spec)
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3

    # Readiness probe - can the app serve traffic?
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
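
Optionally, a preStop hook can hold the pod in Terminating for a few seconds so endpoint deregistration propagates before SIGTERM arrives. A sketch - the 5s sleep is a tuning assumption, and the image must include a shell:

yaml
    # Goes inside the container spec, alongside the probes
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]

If this hook is used, terminationGracePeriodSeconds must cover the sleep plus SHUTDOWN_TIMEOUT.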

Shutdown Timeline

With the phase budgets above, the worst case looks like:

  1. Stop accepting new connections - up to 2s
  2. Wait for active requests - up to 10s
  3. Close MCP sessions - up to 5s
  4. Close external connections (Redis, JWKS) - up to 3s
  5. Finalize and flush logs - up to 2s

Total worst case is roughly 22s, inside the 25s SHUTDOWN_TIMEOUT and the 30s Kubernetes grace period.

Configuration

Environment Variables

bash
# Graceful shutdown timeout (default: 25000ms / 25s)
SHUTDOWN_TIMEOUT=25000

# Kubernetes termination grace period should be higher
# Set in pod spec: terminationGracePeriodSeconds: 30

Best Practice: SHUTDOWN_TIMEOUT should be 5 seconds less than terminationGracePeriodSeconds to allow buffer for cleanup.
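
To keep the two values in lockstep, the app timeout could be derived from a single setting rather than configured twice. A sketch, assuming the grace period is exposed to the container as TERMINATION_GRACE_PERIOD_SECONDS (a hypothetical variable - Kubernetes does not inject it automatically):

typescript
// Hypothetical: derive SHUTDOWN_TIMEOUT from the pod's grace period
const gracePeriodSeconds = parseInt(
  process.env.TERMINATION_GRACE_PERIOD_SECONDS ?? "30",
  10,
);
// Leave the 5-second buffer recommended above, never dropping below 1s
const shutdownTimeoutMs = Math.max(1000, (gracePeriodSeconds - 5) * 1000);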


Testing

Manual Testing

bash
# Test SIGTERM handling
kill -TERM <pid>

# Test SIGINT handling (Ctrl+C)
npm start
# Press Ctrl+C

# Check logs for shutdown sequence
tail -f logs/app.log | grep shutdown
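
The same sequence can be exercised end to end with Docker, since docker stop sends SIGTERM and escalates to SIGKILL after the -t grace period (the container name is illustrative):

bash
# Run with an init process so signals reach node
docker run -d --init --name seed-test seed-mcp-server:latest

# Sends SIGTERM, then SIGKILL after 30 seconds
docker stop -t 30 seed-test

# Inspect the shutdown sequence
docker logs seed-test | grep shutdown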

Unit Tests

typescript
describe('Graceful Shutdown', () => {
  beforeEach(() => {
    // Stub process.exit so the shutdown path cannot kill the test runner
    jest.spyOn(process, 'exit').mockImplementation(() => undefined as never);
  });

  it('should close all MCP sessions on shutdown', async () => {
    // Create test sessions
    const session1 = await createTestSession();
    const session2 = await createTestSession();

    // Trigger shutdown
    process.emit('SIGTERM');

    // Wait for shutdown
    await new Promise(resolve => setTimeout(resolve, 1000));

    // Assert sessions closed
    expect(await getTransport(session1)).toBeUndefined();
    expect(await getTransport(session2)).toBeUndefined();
  });

  it('should close Redis connection on shutdown', async () => {
    // Trigger shutdown
    process.emit('SIGTERM');

    // Wait for shutdown
    await new Promise(resolve => setTimeout(resolve, 3000));

    // Assert Redis closed
    expect(redisClient.status).toBe('end');
  });
});

Metrics

typescript
// Add to src/services/metrics.ts (register is the existing Registry there)
import { Histogram, Counter } from 'prom-client';
export const shutdownDuration = new Histogram({
  name: 'shutdown_duration_seconds',
  help: 'Time taken for graceful shutdown',
  labelNames: ['phase'], // stop_connections, close_sessions, etc.
  buckets: [0.1, 0.5, 1, 2, 5, 10, 20],
  registers: [register],
});

export const shutdownSessionsClosed = new Counter({
  name: 'shutdown_sessions_closed_total',
  help: 'Number of sessions closed during shutdown',
  registers: [register],
});
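
These metrics would be recorded from the shutdown handler itself. A sketch of a small helper for timing phases - timedPhase is hypothetical, not existing code, and the import path assumes the helper sits in src/services/ next to metrics.ts:

typescript
import { shutdownDuration } from './metrics.js';

/** Run one shutdown phase and record its duration under the given label. */
async function timedPhase(phase: string, fn: () => Promise<void>): Promise<void> {
  const start = Date.now();
  try {
    await fn();
  } finally {
    // prom-client histograms take (labels, value); the buckets above are in seconds
    shutdownDuration.observe({ phase }, (Date.now() - start) / 1000);
  }
}

// Usage inside GracefulShutdown.shutdown():
//   await timedPhase('close_sessions', () => this.closeMcpSessions(5000));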

Acceptance Criteria

  • [x] Handles SIGTERM signal (Docker/Kubernetes)
  • [x] Handles SIGINT signal (Ctrl+C)
  • [x] Stops accepting new HTTP connections
  • [x] Waits for active requests to complete (with timeout)
  • [x] Closes all MCP sessions gracefully
  • [x] Closes Redis connections properly
  • [x] Stops JWKS background refresh timer
  • [ ] Flushes pending logs
  • [ ] Respects configurable shutdown timeout
  • [x] Health check returns 503 during shutdown
  • [x] Comprehensive logging of shutdown phases
  • [ ] Metrics for shutdown duration
  • [ ] Unit tests with >90% coverage
  • [ ] Integration tests with real dependencies
  • [x] Kubernetes pod spec example in docs

Docker Considerations

Dockerfile Best Practices

dockerfile
# Use proper init system to handle signals
# Option 1: Use tini
FROM node:20-alpine
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]

# Option 2: Use dumb-init
FROM node:20-alpine
RUN apk add --no-cache dumb-init
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["node", "dist/index.js"]

# Option 3: Use --init flag
# docker run --init seed-mcp-server
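
Whichever init is chosen, keep CMD in exec (JSON array) form: the shell form wraps the process in /bin/sh -c, and the shell does not forward SIGTERM to node.

dockerfile
# Shell form - /bin/sh is PID 1's child and swallows SIGTERM (avoid)
CMD node dist/index.js

# Exec form - node receives SIGTERM directly (use this)
CMD ["node", "dist/index.js"]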

Docker Compose

yaml
version: '3.8'
services:
  seed:
    image: seed-mcp-server:latest
    init: true  # Use Docker's built-in init
    stop_grace_period: 30s  # Wait 30s before SIGKILL
    environment:
      - SHUTDOWN_TIMEOUT=25000
    depends_on:
      - redis

Edge Cases

1. Shutdown During Token Refresh

Scenario: Token refresh in progress when shutdown triggered

Behavior:

  • Allow refresh to complete (up to 2s)
  • If timeout exceeded, abort refresh
  • Session cleanup removes partially refreshed tokens

2. Long-Running MCP Tool

Scenario: MCP tool taking 30 seconds to complete

Behavior:

  • Wait up to 5 seconds for tool completion
  • Force-close session after timeout
  • Tool execution aborted mid-stream
  • Client receives disconnection

Recommendation: Set tool timeout < shutdown timeout
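
Both this case and the token-refresh case above reduce to the same Promise.race pattern the shutdown service already uses. A reusable sketch - withTimeout is a hypothetical helper, not existing code:

typescript
/** Resolve with the promise's result, or reject once timeoutMs elapses. */
async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number,
  label: string,
): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // Always clear so the timer cannot keep the process alive
  }
}

// e.g. await withTimeout(runTool(args), 5000, 'tool execution');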

3. Redis Connection Already Closed

Scenario: Redis connection lost before shutdown

Behavior:

  • Detect connection status
  • Skip the close operation if already disconnected (see the sketch below)
  • Log as info, not error
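
With ioredis (implied by the redisClient.status assertion in the unit tests above), the guard could look like this sketch:

typescript
const closeRedis = async (): Promise<void> => {
  // ioredis exposes the connection state as a string status
  if (redisClient.status === 'end' || redisClient.status === 'close') {
    logger.info('Redis already disconnected, skipping close', {
      category: 'shutdown',
    });
    return;
  }
  await redisClient.quit();
};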

4. Multiple Shutdown Signals

Scenario: SIGTERM followed by SIGINT

Behavior:

  • Ignore subsequent signals
  • Log "shutdown already in progress"
  • Continue with original shutdown sequence

