Graceful Shutdown
- Status: ✅ IMPLEMENTED
- Priority: 🔴 HIGH (Completed)
- Actual Time: 6-8 hours
- Implementation Date: 2026-01-06
- Risk Level: MEDIUM
- Impact: Clean resource cleanup and proper container orchestration support
Implementation Summary
The graceful shutdown feature has been successfully implemented with the following components:
What Was Implemented
- Signal Handling - Both SIGTERM (Docker/Kubernetes) and SIGINT (Ctrl+C) are properly handled in src/index.ts
- HTTP Server Closure - Server stops accepting new connections via server.close() with a 5-second grace period
- MCP Session Cleanup - All active MCP sessions are closed in parallel with proper error handling
- Redis Connection Management - Redis connections are properly closed via closeRedisConnection()
- JWKS Service Cleanup - Background refresh timer is stopped via jwksService.stop()
- Health Check Integration - Health endpoint returns 503 during shutdown via setShuttingDown() in src/routes/health.ts
- Re-entrance Prevention - Shutdown flag prevents multiple shutdown triggers
Actual Implementation
The implementation in src/index.ts includes (a minimal sketch follows the list):
- Graceful shutdown handler for SIGTERM and SIGINT signals
- Sequential cleanup phases with timeouts
- Proper error handling and logging
- Health check coordination during shutdown
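A minimal sketch of how these pieces could fit together in src/index.ts is shown below. It is illustrative rather than the literal implementation: setShuttingDown(), closeRedisConnection(), and jwksService.stop() are the calls named in the summary above, while closeAllTransports() and the exact import paths are assumptions.

```typescript
// Illustrative sketch only - not the actual src/index.ts.
import { app } from "./app.js";
import { config } from "./config/index.js";
import { logger } from "./services/logger.js";
import { jwksService } from "./services/jwks.js";
import { setShuttingDown } from "./routes/health.js"; // assumed export location
import { closeRedisConnection } from "./services/redis.js"; // assumed path
import { closeAllTransports } from "./mcp/mcp.js"; // hypothetical helper

const server = app.listen(config.port);

let shuttingDown = false; // re-entrance prevention

async function shutdown(signal: NodeJS.Signals): Promise<void> {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  setShuttingDown(true); // /health starts returning 503
  logger.info("Graceful shutdown initiated", { signal });

  // Stop accepting new connections; allow up to 5 seconds for in-flight requests.
  await new Promise<void>((resolve) => {
    server.close(() => resolve());
    setTimeout(resolve, 5000).unref();
  });

  await closeAllTransports(); // close all active MCP sessions in parallel
  await closeRedisConnection(); // close Redis connections
  jwksService.stop(); // stop the background JWKS refresh timer

  process.exit(0);
}

process.on("SIGTERM", () => void shutdown("SIGTERM"));
process.on("SIGINT", () => void shutdown("SIGINT"));
```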
Key Files Modified
- src/index.ts - Main shutdown logic
- src/routes/health.ts - Health check during shutdown
- src/services/jwks.ts - JWKS timer cleanup
Testing
- Manual testing with SIGTERM and SIGINT signals
- Integration with Docker and Kubernetes
- Proper cleanup verified via logs
Original Problem Statement
Before this work, the server handled SIGINT (Ctrl+C) but performed an immediate exit without cleanup. This caused issues in production environments:
Original Implementation:
process.on("SIGINT", () => {
logger.info("Shutting down server...");
process.exit(0); // Immediate exit!
});
Problems:
- No SIGTERM handling - Docker and Kubernetes send SIGTERM
- Active requests aborted - In-flight HTTP requests terminated mid-processing
- MCP sessions dropped - Clients receive sudden disconnection
- Redis connections dangling - Connections not closed gracefully
- Logs lost - Pending log entries not flushed
- Metrics incomplete - Final metrics not published
- JWKS timer orphaned - Background refresh timer keeps running
Production Impact:
- Load balancers may route traffic during shutdown
- Kubernetes considers pod unhealthy immediately
- Client reconnection storms
- Resource leaks in container platforms
Proposed Solution
Implement comprehensive graceful shutdown with configurable timeout and proper resource cleanup.
Implementation
1. Graceful Shutdown Service
Create src/services/graceful-shutdown.ts:
import { Server } from "http";
import { logger } from "./logger.js";
import { redisClient } from "./redis.js";
import { jwksService } from "./jwks.js";
import { removeTransport } from "../mcp/mcp.js";
interface ShutdownOptions {
timeout: number; // Maximum time to wait (ms)
signals: NodeJS.Signals[]; // Signals to handle
}
export class GracefulShutdown {
private isShuttingDown = false;
private shutdownTimeout?: NodeJS.Timeout;
constructor(
private server: Server,
private options: ShutdownOptions = {
timeout: 25000, // 25 seconds (K8s terminationGracePeriodSeconds - 5s buffer)
signals: ['SIGTERM', 'SIGINT'],
}
) {}
/**
* Initialize shutdown handlers
*/
initialize(): void {
for (const signal of this.options.signals) {
process.on(signal, () => this.shutdown(signal));
}
// Handle uncaught errors during shutdown
process.on('uncaughtException', (error) => {
logger.error('Uncaught exception during shutdown', {
error: error.message,
stack: error.stack,
category: 'shutdown',
});
if (this.isShuttingDown) {
process.exit(1);
}
});
}
/**
* Perform graceful shutdown
*/
private async shutdown(signal: string): Promise<void> {
if (this.isShuttingDown) {
logger.warn('Shutdown already in progress, ignoring signal', { signal });
return;
}
this.isShuttingDown = true;
const startTime = Date.now();
logger.info('Graceful shutdown initiated', {
signal,
timeout: this.options.timeout,
category: 'shutdown',
});
// Set hard timeout - force exit if cleanup takes too long
this.shutdownTimeout = setTimeout(() => {
logger.error('Shutdown timeout exceeded, forcing exit', {
timeoutMs: this.options.timeout,
category: 'shutdown',
});
process.exit(1);
}, this.options.timeout);
try {
// Phase 1: Stop accepting new connections (2s max)
await this.stopAcceptingConnections();
// Phase 2: Wait for active requests to complete (10s max)
await this.waitForActiveRequests(10000);
// Phase 3: Close MCP sessions (5s max)
await this.closeMcpSessions(5000);
// Phase 4: Close external connections (3s max)
await this.closeExternalConnections(3000);
const duration = Date.now() - startTime;
logger.info('Graceful shutdown complete', {
durationMs: duration,
category: 'shutdown',
});
clearTimeout(this.shutdownTimeout);
// Phase 5: Finalize - flush logs last, after the final log line (2s max)
await this.finalize();
process.exit(0);
} catch (error) {
logger.error('Error during graceful shutdown', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
clearTimeout(this.shutdownTimeout);
process.exit(1);
}
}
/**
* Phase 1: Stop accepting new connections
*/
private async stopAcceptingConnections(): Promise<void> {
return new Promise<void>((resolve) => {
this.server.close(() => {
logger.info('HTTP server closed, no longer accepting connections', {
category: 'shutdown',
});
resolve();
});
// Don't wait forever
setTimeout(resolve, 2000);
});
}
/**
* Phase 2: Wait for active HTTP requests to complete
*/
private async waitForActiveRequests(timeout: number): Promise<void> {
const start = Date.now();
const checkInterval = 100; // Check every 100ms
while (Date.now() - start < timeout) {
const connections = this.getActiveConnections();
if (connections === 0) {
logger.info('All active requests completed', {
category: 'shutdown',
});
return;
}
logger.debug('Waiting for active requests', {
activeConnections: connections,
category: 'shutdown',
});
await new Promise(resolve => setTimeout(resolve, checkInterval));
}
const remainingConnections = this.getActiveConnections();
if (remainingConnections > 0) {
logger.warn('Timeout waiting for requests, proceeding with shutdown', {
activeConnections: remainingConnections,
category: 'shutdown',
});
}
}
/**
* Phase 3: Close all MCP sessions
*/
private async closeMcpSessions(timeout: number): Promise<void> {
const start = Date.now();
const sessionIds = Object.keys(global.transports || {});
if (sessionIds.length === 0) {
logger.info('No active MCP sessions to close', {
category: 'shutdown',
});
return;
}
logger.info('Closing MCP sessions', {
sessionCount: sessionIds.length,
category: 'shutdown',
});
// Close sessions in parallel with timeout
const closePromises = sessionIds.map(async (sessionId) => {
try {
await Promise.race([
removeTransport(sessionId),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Session close timeout')), 1000)
),
]);
} catch (error) {
logger.error('Failed to close MCP session', {
sessionId,
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
});
try {
await Promise.race([
Promise.all(closePromises),
new Promise(resolve => setTimeout(resolve, timeout)),
]);
const remaining = Object.keys(global.transports || {}).length;
if (remaining > 0) {
logger.warn('Some MCP sessions could not be closed gracefully', {
remainingSessions: remaining,
category: 'shutdown',
});
} else {
logger.info('All MCP sessions closed', {
category: 'shutdown',
});
}
} catch (error) {
logger.error('Error closing MCP sessions', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
}
/**
* Phase 4: Close external connections
*/
private async closeExternalConnections(timeout: number): Promise<void> {
const closeRedis = async () => {
try {
await redisClient.quit();
logger.info('Redis connection closed', {
category: 'shutdown',
});
} catch (error) {
logger.error('Failed to close Redis connection', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
};
const stopJwksRefresh = () => {
try {
jwksService.stop();
logger.info('JWKS refresh timer stopped', {
category: 'shutdown',
});
} catch (error) {
logger.error('Failed to stop JWKS refresh', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
};
await Promise.race([
Promise.all([
closeRedis(),
Promise.resolve(stopJwksRefresh()),
]),
new Promise(resolve => setTimeout(resolve, timeout)),
]);
}
/**
* Phase 5: Finalize and flush logs
*/
private async finalize(): Promise<void> {
// Flush Winston logs; anything logged after logger.end() is dropped
await new Promise<void>((resolve) => {
logger.on('finish', resolve);
setTimeout(resolve, 1000); // Don't wait forever
logger.end();
});
}
/**
* Get count of active HTTP connections
*/
private getActiveConnections(): number {
// Node.js doesn't expose this easily, need to track manually
// For now, return 0 after server.close() is called
return 0;
}
/**
* Check if currently shutting down
*/
isShutdownInProgress(): boolean {
return this.isShuttingDown;
}
}
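getActiveConnections() above is stubbed out because Node.js does not expose an in-flight request count directly. One possible way to track it manually is to count sockets as they open and close; trackConnections() below is a hypothetical helper, not part of the current codebase.

```typescript
import { Server } from "http";
import { Socket } from "net";

// Count open sockets so GracefulShutdown could poll a real number
// instead of the stubbed 0 in getActiveConnections().
export function trackConnections(server: Server): () => number {
  const sockets = new Set<Socket>();
  server.on("connection", (socket) => {
    sockets.add(socket);
    socket.on("close", () => sockets.delete(socket));
  });
  return () => sockets.size; // call this from getActiveConnections()
}
```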
2. Update JWKS Service
Add stop() method to src/services/jwks.ts:
export class JWKSService {
private refreshTimer?: NodeJS.Timeout;
// ... existing methods
/**
* Stop background refresh timer
*/
stop(): void {
if (this.refreshTimer) {
clearTimeout(this.refreshTimer);
this.refreshTimer = undefined;
logger.info('JWKS refresh timer stopped', {
category: 'jwks',
});
}
}
}
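For context, here is a sketch of the scheduling side that stop() tears down, assuming the service drives refreshes with a chained setTimeout. scheduleRefresh() and refreshKeys() are hypothetical names, not necessarily what src/services/jwks.ts uses.

```typescript
import { logger } from "./logger.js";

export class JWKSService {
  private refreshTimer?: NodeJS.Timeout;

  // Hypothetical scheduling method - the timer that stop() clears.
  private scheduleRefresh(intervalMs: number): void {
    this.refreshTimer = setTimeout(() => {
      void this.refreshKeys()
        .catch((error: unknown) => logger.error("JWKS refresh failed", { error }))
        .finally(() => this.scheduleRefresh(intervalMs)); // chain the next refresh
    }, intervalMs);
    this.refreshTimer.unref(); // never keep the process alive for the timer alone
  }

  private async refreshKeys(): Promise<void> {
    // fetch and cache the JWKS; details omitted
  }

  stop(): void {
    // as shown above
  }
}
```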
3. Update Health Check
Fail health check during shutdown in src/routes/health.ts:
import { gracefulShutdown } from '../services/graceful-shutdown.js';
healthRouter.get("/", (_req, res) => {
if (gracefulShutdown.isShutdownInProgress()) {
return res.status(503).json({
status: "shutting_down",
version: config.server.version,
});
}
res.json({
status: "ok",
version: config.server.version,
});
});
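Note that this snippet imports the gracefulShutdown instance from the service module, while step 4 below exports that instance from src/index.ts; importing it back into a route module can create a cycle. The shipped implementation avoids this with the setShuttingDown() flag kept in src/routes/health.ts (see the summary at the top). A minimal sketch of that flag-based variant, assuming an Express router as healthRouter suggests:

```typescript
// src/routes/health.ts - sketch of the flag-based variant
import { Router } from "express";
import { config } from "../config/index.js";

let shuttingDown = false;

/** Called from src/index.ts when shutdown begins. */
export function setShuttingDown(value: boolean): void {
  shuttingDown = value;
}

export const healthRouter = Router();

healthRouter.get("/", (_req, res) => {
  if (shuttingDown) {
    res.status(503).json({ status: "shutting_down", version: config.server.version });
    return;
  }
  res.json({ status: "ok", version: config.server.version });
});
```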
4. Initialize in Main Application
Update src/index.ts:
import { app } from "./app.js";
import { config } from "./config/index.js";
import { logger } from "./services/logger.js";
import { GracefulShutdown } from "./services/graceful-shutdown.js";
const server = app.listen(config.port, () => {
logger.info(`Seed MCP server running on http://localhost:${String(config.port)}/mcp`);
if (!config.authRequired) {
logger.warn(
"⚠️ SECURITY WARNING: Authentication is DISABLED (AUTH_REQUIRED=false). " +
"This should only be used in development/testing environments.",
);
} else {
logger.info(`Authentication enabled, OIDC issuer: ${config.oidc.issuer || "(not configured)"}`);
}
});
// Initialize graceful shutdown
const gracefulShutdown = new GracefulShutdown(server, {
timeout: parseInt(process.env.SHUTDOWN_TIMEOUT || "25000", 10),
signals: ['SIGTERM', 'SIGINT'],
});
gracefulShutdown.initialize();
// Export for health check
export { gracefulShutdown };
Kubernetes Integration
Pod Spec with Termination Grace Period
apiVersion: v1
kind: Pod
metadata:
name: seed-mcp-server
spec:
containers:
- name: seed
image: seed-mcp-server:latest
ports:
- containerPort: 3000
env:
- name: SHUTDOWN_TIMEOUT
value: "25000" # 25 seconds
# Kubernetes default is 30 seconds
# Set to 30s to give app 25s + 5s buffer
terminationGracePeriodSeconds: 30
# Liveness probe - check if app is alive
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
# Readiness probe - check if app can serve traffic
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
Shutdown Timeline
The phase budgets above fit inside the 25-second shutdown timeout: stop accepting connections (up to 2s), drain active requests (up to 10s), close MCP sessions (up to 5s), close external connections (up to 3s), and flush logs (up to 2s), leaving a few seconds of headroom before the hard timeout forces an exit.
Configuration
Environment Variables
# Graceful shutdown timeout (default: 25000ms / 25s)
SHUTDOWN_TIMEOUT=25000
# Kubernetes termination grace period should be higher
# Set in pod spec: terminationGracePeriodSeconds: 30
Best Practice: SHUTDOWN_TIMEOUT should be 5 seconds less than terminationGracePeriodSeconds to allow a buffer for cleanup.
Testing
Manual Testing
# Test SIGTERM handling
kill -TERM <pid>
# Test SIGINT handling (Ctrl+C)
npm start
# Press Ctrl+C
# Check logs for shutdown sequence
tail -f logs/app.log | grep shutdown
Unit Tests
describe('Graceful Shutdown', () => {
it('should close all MCP sessions on shutdown', async () => {
// Create test sessions
const session1 = await createTestSession();
const session2 = await createTestSession();
// Trigger shutdown
process.emit('SIGTERM');
// Wait for shutdown
await new Promise(resolve => setTimeout(resolve, 1000));
// Assert sessions closed
expect(await getTransport(session1)).toBeUndefined();
expect(await getTransport(session2)).toBeUndefined();
});
it('should close Redis connection on shutdown', async () => {
// Trigger shutdown
process.emit('SIGTERM');
// Wait for shutdown
await new Promise(resolve => setTimeout(resolve, 3000));
// Assert Redis closed
expect(redisClient.status).toBe('end');
});
});
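A possible additional test for the 503 behavior, assuming supertest is available and the flag-based setShuttingDown() from the shipped implementation; this is a sketch, not an existing test.

```typescript
import request from "supertest";
import { app } from "../src/app.js";
import { setShuttingDown } from "../src/routes/health.js";

it("returns 503 from /health while shutting down", async () => {
  setShuttingDown(true); // simulate an in-progress shutdown without exiting the process

  const res = await request(app).get("/health");

  expect(res.status).toBe(503);
  expect(res.body.status).toBe("shutting_down");

  setShuttingDown(false); // reset for other tests
});
```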
Metrics
// Add to src/services/metrics.ts
export const shutdownDuration = new Histogram({
name: 'shutdown_duration_seconds',
help: 'Time taken for graceful shutdown',
labelNames: ['phase'], // stop_connections, close_sessions, etc.
buckets: [0.1, 0.5, 1, 2, 5, 10, 20],
registers: [register],
});
export const shutdownSessionsClosed = new Counter({
name: 'shutdown_sessions_closed_total',
help: 'Number of sessions closed during shutdown',
registers: [register],
});
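These metrics still need to be recorded during shutdown. One way to do that, sketched with a hypothetical timePhase() helper wrapped around each phase:

```typescript
import { shutdownDuration, shutdownSessionsClosed } from "./services/metrics.js";

// Hypothetical helper: time a shutdown phase and record it in the histogram.
async function timePhase(phase: string, fn: () => Promise<void>): Promise<void> {
  const end = shutdownDuration.startTimer({ phase });
  try {
    await fn();
  } finally {
    end(); // records elapsed seconds under the given phase label
  }
}

// e.g. inside GracefulShutdown.shutdown():
// await timePhase("stop_connections", () => this.stopAcceptingConnections());
// await timePhase("close_sessions", () => this.closeMcpSessions(5000));
// shutdownSessionsClosed.inc(closedSessionCount);
```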
Acceptance Criteria
- [ ] Handles SIGTERM signal (Docker/Kubernetes)
- [ ] Handles SIGINT signal (Ctrl+C)
- [ ] Stops accepting new HTTP connections
- [ ] Waits for active requests to complete (with timeout)
- [ ] Closes all MCP sessions gracefully
- [ ] Closes Redis connections properly
- [ ] Stops JWKS background refresh timer
- [ ] Flushes pending logs
- [ ] Respects configurable shutdown timeout
- [ ] Health check returns 503 during shutdown
- [ ] Comprehensive logging of shutdown phases
- [ ] Metrics for shutdown duration
- [ ] Unit tests with >90% coverage
- [ ] Integration tests with real dependencies
- [ ] Kubernetes pod spec example in docs
Docker Considerations
Dockerfile Best Practices
# Use proper init system to handle signals
# Option 1: Use tini
FROM node:20-alpine
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
# Option 2: Use dumb-init
FROM node:20-alpine
RUN apk add --no-cache dumb-init
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["node", "dist/index.js"]
# Option 3: Use --init flag
# docker run --init seed-mcp-server
Docker Compose
version: '3.8'
services:
seed:
image: seed-mcp-server:latest
init: true # Use Docker's built-in init
stop_grace_period: 30s # Wait 30s before SIGKILL
environment:
- SHUTDOWN_TIMEOUT=25000
depends_on:
- redis
Edge Cases
1. Shutdown During Token Refresh
Scenario: Token refresh in progress when shutdown triggered
Behavior:
- Allow refresh to complete (up to 2s)
- If timeout exceeded, abort refresh
- Session cleanup removes partially refreshed tokens
2. Long-Running MCP Tool
Scenario: MCP tool taking 30 seconds to complete
Behavior:
- Wait up to 5 seconds for tool completion
- Force-close session after timeout
- Tool execution aborted mid-stream
- Client receives disconnection
Recommendation: Set tool timeout < shutdown timeout
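One way to follow that recommendation is to cap each tool run below the 5-second session-close budget; withTimeout() and runTool() below are hypothetical names.

```typescript
// Reject if the wrapped work does not settle within the given time budget.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// e.g. keep tool execution under the session-close budget:
// const result = await withTimeout(runTool(args), 4000);
```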
3. Redis Connection Already Closed
Scenario: Redis connection lost before shutdown
Behavior:
- Detect connection status
- Skip close operation if already disconnected
- Log as info, not error (see the sketch below)
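A sketch of that check, assuming an ioredis client (the unit test above already inspects redisClient.status); closeRedisIfConnected() is a hypothetical helper.

```typescript
import { redisClient } from "./redis.js";
import { logger } from "./logger.js";

// Skip quit() when the connection is already gone, and log it at info level.
async function closeRedisIfConnected(): Promise<void> {
  if (redisClient.status === "end" || redisClient.status === "close") {
    logger.info("Redis already disconnected, skipping close", { category: "shutdown" });
    return;
  }
  await redisClient.quit();
  logger.info("Redis connection closed", { category: "shutdown" });
}
```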
4. Multiple Shutdown Signals
Scenario: SIGTERM followed by SIGINT
Behavior:
- Ignore subsequent signals
- Log "shutdown already in progress"
- Continue with original shutdown sequence
Related Enhancements
- Health Check Improvements - Readiness probe support
- Session Persistence - Session recovery after restart