Graceful Shutdown
- Status: ✅ IMPLEMENTED
- Priority: 🔴 HIGH (Completed)
- Actual Time: 6-8 hours
- Implementation Date: 2026-01-06
- Risk Level: MEDIUM
- Impact: Clean resource cleanup and proper container orchestration support
Implementation Summary
The graceful shutdown feature has been successfully implemented with the following components:
What Was Implemented
- Signal Handling - Both SIGTERM (Docker/Kubernetes) and SIGINT (Ctrl+C) are properly handled in src/index.ts
- HTTP Server Closure - Server stops accepting new connections via server.close() with a 5-second grace period
- MCP Session Cleanup - All active MCP sessions are closed in parallel with proper error handling
- Redis Connection Management - Redis connections are properly closed via closeRedisConnection()
- JWKS Service Cleanup - Background refresh timer is stopped via jwksService.stop()
- Health Check Integration - Health endpoint returns 503 during shutdown via setShuttingDown() in src/routes/health.ts
- Re-entrance Prevention - Shutdown flag prevents multiple shutdown triggers
Actual Implementation
The implementation in src/index.ts includes (a minimal sketch follows the list):
- Graceful shutdown handler for SIGTERM and SIGINT signals
- Sequential cleanup phases with timeouts
- Proper error handling and logging
- Health check coordination during shutdown
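A minimal sketch of how these pieces could fit together in src/index.ts is shown below. It is illustrative rather than the literal implementation: setShuttingDown(), closeRedisConnection(), and jwksService.stop() are the calls named in the summary above, while closeAllTransports() and the exact import paths are assumptions.

```typescript
// Illustrative sketch only - not the actual src/index.ts.
import { app } from "./app.js";
import { config } from "./config/index.js";
import { logger } from "./services/logger.js";
import { jwksService } from "./services/jwks.js";
import { setShuttingDown } from "./routes/health.js"; // assumed export location
import { closeRedisConnection } from "./services/redis.js"; // assumed path
import { closeAllTransports } from "./mcp/mcp.js"; // hypothetical helper

const server = app.listen(config.port);

let shuttingDown = false; // re-entrance prevention

async function shutdown(signal: NodeJS.Signals): Promise<void> {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  setShuttingDown(true); // /health starts returning 503
  logger.info("Graceful shutdown initiated", { signal });

  // Stop accepting new connections; allow up to 5 seconds for in-flight requests.
  await new Promise<void>((resolve) => {
    server.close(() => resolve());
    setTimeout(resolve, 5000).unref();
  });

  await closeAllTransports(); // close all active MCP sessions in parallel
  await closeRedisConnection(); // close Redis connections
  jwksService.stop(); // stop the background JWKS refresh timer

  process.exit(0);
}

process.on("SIGTERM", () => void shutdown("SIGTERM"));
process.on("SIGINT", () => void shutdown("SIGINT"));
```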
Key Files Modified
- src/index.ts - Main shutdown logic
- src/routes/health.ts - Health check during shutdown
- src/services/jwks.ts - JWKS timer cleanup
Testing
- Manual testing with SIGTERM and SIGINT signals
- Integration with Docker and Kubernetes
- Proper cleanup verified via logs
Original Problem Statement
Before this work, the server handled SIGINT (Ctrl+C) but performed an immediate exit without cleanup. This caused issues in production environments:
Original Implementation:
process.on("SIGINT", () => {
logger.info("Shutting down server...");
process.exit(0); // Immediate exit!
});
Problems:
- No SIGTERM handling - Docker and Kubernetes send SIGTERM
- Active requests aborted - In-flight HTTP requests terminated mid-processing
- MCP sessions dropped - Clients receive sudden disconnection
- Redis connections dangling - Connections not closed gracefully
- Logs lost - Pending log entries not flushed
- Metrics incomplete - Final metrics not published
- JWKS timer orphaned - Background refresh timer keeps running
Production Impact:
- Load balancers may route traffic during shutdown
- Kubernetes considers pod unhealthy immediately
- Client reconnection storms
- Resource leaks in container platforms
Proposed Solution
Implement comprehensive graceful shutdown with configurable timeout and proper resource cleanup.
Implementation
1. Graceful Shutdown Service
Create src/services/graceful-shutdown.ts:
import { Server } from "http";
import { logger } from "./logger.js";
import { redisClient } from "./redis.js";
import { jwksService } from "./jwks.js";
import { removeTransport } from "../mcp/mcp.js";
interface ShutdownOptions {
timeout: number; // Maximum time to wait (ms)
signals: NodeJS.Signals[]; // Signals to handle
}
export class GracefulShutdown {
private isShuttingDown = false;
private shutdownTimeout?: NodeJS.Timeout;
constructor(
private server: Server,
private options: ShutdownOptions = {
timeout: 25000, // 25 seconds (K8s terminationGracePeriodSeconds - 5s buffer)
signals: ['SIGTERM', 'SIGINT'],
}
) {}
/**
* Initialize shutdown handlers
*/
initialize(): void {
for (const signal of this.options.signals) {
process.on(signal, () => this.shutdown(signal));
}
// Handle uncaught errors during shutdown
process.on('uncaughtException', (error) => {
logger.error('Uncaught exception during shutdown', {
error: error.message,
stack: error.stack,
category: 'shutdown',
});
if (this.isShuttingDown) {
process.exit(1);
}
});
}
/**
* Perform graceful shutdown
*/
private async shutdown(signal: string): Promise<void> {
if (this.isShuttingDown) {
logger.warn('Shutdown already in progress, ignoring signal', { signal });
return;
}
this.isShuttingDown = true;
const startTime = Date.now();
logger.info('Graceful shutdown initiated', {
signal,
timeout: this.options.timeout,
category: 'shutdown',
});
// Set hard timeout - force exit if cleanup takes too long
this.shutdownTimeout = setTimeout(() => {
logger.error('Shutdown timeout exceeded, forcing exit', {
timeoutMs: this.options.timeout,
category: 'shutdown',
});
process.exit(1);
}, this.options.timeout);
try {
// Phase 1: Stop accepting new connections (2s max)
await this.stopAcceptingConnections();
// Phase 2: Wait for active requests to complete (10s max)
await this.waitForActiveRequests(10000);
// Phase 3: Close MCP sessions (5s max)
await this.closeMcpSessions(5000);
// Phase 4: Close external connections (3s max)
await this.closeExternalConnections(3000);
const duration = Date.now() - startTime;
logger.info('Graceful shutdown complete', {
durationMs: duration,
category: 'shutdown',
});
clearTimeout(this.shutdownTimeout);
// Phase 5: Finalize - flush logs last, after the final log line (2s max)
await this.finalize();
process.exit(0);
} catch (error) {
logger.error('Error during graceful shutdown', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
clearTimeout(this.shutdownTimeout);
process.exit(1);
}
}
/**
* Phase 1: Stop accepting new connections
*/
private async stopAcceptingConnections(): Promise<void> {
return new Promise<void>((resolve) => {
this.server.close(() => {
logger.info('HTTP server closed, no longer accepting connections', {
category: 'shutdown',
});
resolve();
});
// Don't wait forever
setTimeout(resolve, 2000);
});
}
/**
* Phase 2: Wait for active HTTP requests to complete
*/
private async waitForActiveRequests(timeout: number): Promise<void> {
const start = Date.now();
const checkInterval = 100; // Check every 100ms
while (Date.now() - start < timeout) {
const connections = this.getActiveConnections();
if (connections === 0) {
logger.info('All active requests completed', {
category: 'shutdown',
});
return;
}
logger.debug('Waiting for active requests', {
activeConnections: connections,
category: 'shutdown',
});
await new Promise(resolve => setTimeout(resolve, checkInterval));
}
const remainingConnections = this.getActiveConnections();
if (remainingConnections > 0) {
logger.warn('Timeout waiting for requests, proceeding with shutdown', {
activeConnections: remainingConnections,
category: 'shutdown',
});
}
}
/**
* Phase 3: Close all MCP sessions
*/
private async closeMcpSessions(timeout: number): Promise<void> {
const start = Date.now();
const sessionIds = Object.keys(global.transports || {});
if (sessionIds.length === 0) {
logger.info('No active MCP sessions to close', {
category: 'shutdown',
});
return;
}
logger.info('Closing MCP sessions', {
sessionCount: sessionIds.length,
category: 'shutdown',
});
// Close sessions in parallel with timeout
const closePromises = sessionIds.map(async (sessionId) => {
try {
await Promise.race([
removeTransport(sessionId),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Session close timeout')), 1000)
),
]);
} catch (error) {
logger.error('Failed to close MCP session', {
sessionId,
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
});
try {
await Promise.race([
Promise.all(closePromises),
new Promise(resolve => setTimeout(resolve, timeout)),
]);
const remaining = Object.keys(global.transports || {}).length;
if (remaining > 0) {
logger.warn('Some MCP sessions could not be closed gracefully', {
remainingSessions: remaining,
category: 'shutdown',
});
} else {
logger.info('All MCP sessions closed', {
category: 'shutdown',
});
}
} catch (error) {
logger.error('Error closing MCP sessions', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
}
/**
* Phase 4: Close external connections
*/
private async closeExternalConnections(timeout: number): Promise<void> {
const closeRedis = async () => {
try {
await redisClient.quit();
logger.info('Redis connection closed', {
category: 'shutdown',
});
} catch (error) {
logger.error('Failed to close Redis connection', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
};
const stopJwksRefresh = () => {
try {
jwksService.stop();
logger.info('JWKS refresh timer stopped', {
category: 'shutdown',
});
} catch (error) {
logger.error('Failed to stop JWKS refresh', {
error: error instanceof Error ? error.message : String(error),
category: 'shutdown',
});
}
};
await Promise.race([
Promise.all([
closeRedis(),
Promise.resolve(stopJwksRefresh()),
]),
new Promise(resolve => setTimeout(resolve, timeout)),
]);
}
/**
* Phase 5: Finalize and flush logs
*/
private async finalize(): Promise<void> {
// Flush Winston logs; anything logged after logger.end() is dropped
await new Promise<void>((resolve) => {
logger.on('finish', resolve);
setTimeout(resolve, 1000); // Don't wait forever
logger.end();
});
}
/**
* Get count of active HTTP connections
*/
private getActiveConnections(): number {
// Node.js doesn't expose this easily, need to track manually
// For now, return 0 after server.close() is called
return 0;
}
/**
* Check if currently shutting down
*/
isShutdownInProgress(): boolean {
return this.isShuttingDown;
}
}
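getActiveConnections() above is stubbed out because Node.js does not expose an in-flight request count directly. One possible way to track it manually is to count sockets as they open and close; trackConnections() below is a hypothetical helper, not part of the current codebase.

```typescript
import { Server } from "http";
import { Socket } from "net";

// Count open sockets so GracefulShutdown could poll a real number
// instead of the stubbed 0 in getActiveConnections().
export function trackConnections(server: Server): () => number {
  const sockets = new Set<Socket>();
  server.on("connection", (socket) => {
    sockets.add(socket);
    socket.on("close", () => sockets.delete(socket));
  });
  return () => sockets.size; // call this from getActiveConnections()
}
```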
2. Update JWKS Service
Add stop() method to src/services/jwks.ts:
export class JWKSService {
private refreshTimer?: NodeJS.Timeout;
// ... existing methods
/**
* Stop background refresh timer
*/
stop(): void {
if (this.refreshTimer) {
clearTimeout(this.refreshTimer);
this.refreshTimer = undefined;
logger.info('JWKS refresh timer stopped', {
category: 'jwks',
});
}
}
}
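For context, here is a sketch of the scheduling side that stop() tears down, assuming the service drives refreshes with a chained setTimeout. scheduleRefresh() and refreshKeys() are hypothetical names, not necessarily what src/services/jwks.ts uses.

```typescript
import { logger } from "./logger.js";

export class JWKSService {
  private refreshTimer?: NodeJS.Timeout;

  // Hypothetical scheduling method - the timer that stop() clears.
  private scheduleRefresh(intervalMs: number): void {
    this.refreshTimer = setTimeout(() => {
      void this.refreshKeys()
        .catch((error: unknown) => logger.error("JWKS refresh failed", { error }))
        .finally(() => this.scheduleRefresh(intervalMs)); // chain the next refresh
    }, intervalMs);
    this.refreshTimer.unref(); // never keep the process alive for the timer alone
  }

  private async refreshKeys(): Promise<void> {
    // fetch and cache the JWKS; details omitted
  }

  stop(): void {
    // as shown above
  }
}
```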
3. Update Health Check
Fail health check during shutdown in src/routes/health.ts:
import { gracefulShutdown } from '../services/graceful-shutdown.js';
healthRouter.get("/", (_req, res) => {
if (gracefulShutdown.isShutdownInProgress()) {
return res.status(503).json({
status: "shutting_down",
version: config.server.version,
});
}
res.json({
status: "ok",
version: config.server.version,
});
});
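Note that this snippet imports the gracefulShutdown instance from the service module, while step 4 below exports that instance from src/index.ts; importing it back into a route module can create a cycle. The shipped implementation avoids this with the setShuttingDown() flag kept in src/routes/health.ts (see the summary at the top). A minimal sketch of that flag-based variant, assuming an Express router as healthRouter suggests:

```typescript
// src/routes/health.ts - sketch of the flag-based variant
import { Router } from "express";
import { config } from "../config/index.js";

let shuttingDown = false;

/** Called from src/index.ts when shutdown begins. */
export function setShuttingDown(value: boolean): void {
  shuttingDown = value;
}

export const healthRouter = Router();

healthRouter.get("/", (_req, res) => {
  if (shuttingDown) {
    res.status(503).json({ status: "shutting_down", version: config.server.version });
    return;
  }
  res.json({ status: "ok", version: config.server.version });
});
```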
4. Initialize in Main Application
Update src/index.ts:
import { app } from "./app.js";
import { config } from "./config/index.js";
import { logger } from "./services/logger.js";
import { GracefulShutdown } from "./services/graceful-shutdown.js";
const server = app.listen(config.port, () => {
logger.info(`Seed MCP server running on http://localhost:${String(config.port)}/mcp`);
if (!config.authRequired) {
logger.warn(
"⚠️ SECURITY WARNING: Authentication is DISABLED (AUTH_REQUIRED=false). " +
"This should only be used in development/testing environments.",
);
} else {
logger.info(`Authentication enabled, OIDC issuer: ${config.oidc.issuer || "(not configured)"}`);
}
});
// Initialize graceful shutdown
const gracefulShutdown = new GracefulShutdown(server, {
timeout: parseInt(process.env.SHUTDOWN_TIMEOUT || "25000", 10),
signals: ['SIGTERM', 'SIGINT'],
});
gracefulShutdown.initialize();
// Export for health check
export { gracefulShutdown };
Kubernetes Integration
Pod Spec with Termination Grace Period
apiVersion: v1
kind: Pod
metadata:
name: seed-mcp-server
spec:
containers:
- name: seed
image: seed-mcp-server:latest
ports:
- containerPort: 3000
env:
- name: SHUTDOWN_TIMEOUT
value: "25000" # 25 seconds
# Kubernetes default is 30 seconds
# Set to 30s to give app 25s + 5s buffer
terminationGracePeriodSeconds: 30
# Liveness probe - check if app is alive
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
# Readiness probe - check if app can serve traffic
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
Shutdown Timeline
The phase budgets above fit inside the 25-second shutdown timeout: stop accepting connections (up to 2s), drain active requests (up to 10s), close MCP sessions (up to 5s), close external connections (up to 3s), and flush logs (up to 2s), leaving a few seconds of headroom before the hard timeout forces an exit.
Configuration
Environment Variables
# Graceful shutdown timeout (default: 25000ms / 25s)
SHUTDOWN_TIMEOUT=25000
# Kubernetes termination grace period should be higher
# Set in pod spec: terminationGracePeriodSeconds: 30
Best Practice: SHUTDOWN_TIMEOUT should be 5 seconds less than terminationGracePeriodSeconds to allow a buffer for cleanup.
Testing
Manual Testing
# Test SIGTERM handling
kill -TERM <pid>
# Test SIGINT handling (Ctrl+C)
npm start
# Press Ctrl+C
# Check logs for shutdown sequence
tail -f logs/app.log | grep shutdown
Unit Tests
describe('Graceful Shutdown', () => {
it('should close all MCP sessions on shutdown', async () => {
// Create test sessions
const session1 = await createTestSession();
const session2 = await createTestSession();
// Trigger shutdown
process.emit('SIGTERM');
// Wait for shutdown
await new Promise(resolve => setTimeout(resolve, 1000));
// Assert sessions closed
expect(await getTransport(session1)).toBeUndefined();
expect(await getTransport(session2)).toBeUndefined();
});
it('should close Redis connection on shutdown', async () => {
// Trigger shutdown
process.emit('SIGTERM');
// Wait for shutdown
await new Promise(resolve => setTimeout(resolve, 3000));
// Assert Redis closed
expect(redisClient.status).toBe('end');
});
});
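A possible additional test for the 503 behavior, assuming supertest is available and the flag-based setShuttingDown() from the shipped implementation; this is a sketch, not an existing test.

```typescript
import request from "supertest";
import { app } from "../src/app.js";
import { setShuttingDown } from "../src/routes/health.js";

it("returns 503 from /health while shutting down", async () => {
  setShuttingDown(true); // simulate an in-progress shutdown without exiting the process

  const res = await request(app).get("/health");

  expect(res.status).toBe(503);
  expect(res.body.status).toBe("shutting_down");

  setShuttingDown(false); // reset for other tests
});
```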
Metrics
// Add to src/services/metrics.ts
export const shutdownDuration = new Histogram({
name: 'shutdown_duration_seconds',
help: 'Time taken for graceful shutdown',
labelNames: ['phase'], // stop_connections, close_sessions, etc.
buckets: [0.1, 0.5, 1, 2, 5, 10, 20],
registers: [register],
});
export const shutdownSessionsClosed = new Counter({
name: 'shutdown_sessions_closed_total',
help: 'Number of sessions closed during shutdown',
registers: [register],
});
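These metrics still need to be recorded during shutdown. One way to do that, sketched with a hypothetical timePhase() helper wrapped around each phase:

```typescript
import { shutdownDuration, shutdownSessionsClosed } from "./services/metrics.js";

// Hypothetical helper: time a shutdown phase and record it in the histogram.
async function timePhase(phase: string, fn: () => Promise<void>): Promise<void> {
  const end = shutdownDuration.startTimer({ phase });
  try {
    await fn();
  } finally {
    end(); // records elapsed seconds under the given phase label
  }
}

// e.g. inside GracefulShutdown.shutdown():
// await timePhase("stop_connections", () => this.stopAcceptingConnections());
// await timePhase("close_sessions", () => this.closeMcpSessions(5000));
// shutdownSessionsClosed.inc(closedSessionCount);
```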
Acceptance Criteria
- [ ] Handles SIGTERM signal (Docker/Kubernetes)
- [ ] Handles SIGINT signal (Ctrl+C)
- [ ] Stops accepting new HTTP connections
- [ ] Waits for active requests to complete (with timeout)
- [ ] Closes all MCP sessions gracefully
- [ ] Closes Redis connections properly
- [ ] Stops JWKS background refresh timer
- [ ] Flushes pending logs
- [ ] Respects configurable shutdown timeout
- [ ] Health check returns 503 during shutdown
- [ ] Comprehensive logging of shutdown phases
- [ ] Metrics for shutdown duration
- [ ] Unit tests with >90% coverage
- [ ] Integration tests with real dependencies
- [ ] Kubernetes pod spec example in docs
Docker Considerations
Dockerfile Best Practices
# Use proper init system to handle signals
# Option 1: Use tini
FROM node:20-alpine
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
# Option 2: Use dumb-init
FROM node:20-alpine
RUN apk add --no-cache dumb-init
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["node", "dist/index.js"]
# Option 3: Use --init flag
# docker run --init seed-mcp-server
Docker Compose
version: '3.8'
services:
seed:
image: seed-mcp-server:latest
init: true # Use Docker's built-in init
stop_grace_period: 30s # Wait 30s before SIGKILL
environment:
- SHUTDOWN_TIMEOUT=25000
depends_on:
- redis
Edge Cases
1. Shutdown During Token Refresh
Scenario: Token refresh in progress when shutdown triggered
Behavior:
- Allow refresh to complete (up to 2s)
- If timeout exceeded, abort refresh
- Session cleanup removes partially refreshed tokens
2. Long-Running MCP Tool
Scenario: MCP tool taking 30 seconds to complete
Behavior:
- Wait up to 5 seconds for tool completion
- Force-close session after timeout
- Tool execution aborted mid-stream
- Client receives disconnection
Recommendation: Set tool timeout < shutdown timeout
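One way to follow that recommendation is to cap each tool run below the 5-second session-close budget; withTimeout() and runTool() below are hypothetical names.

```typescript
// Reject if the wrapped work does not settle within the given time budget.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// e.g. keep tool execution under the session-close budget:
// const result = await withTimeout(runTool(args), 4000);
```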
3. Redis Connection Already Closed
Scenario: Redis connection lost before shutdown
Behavior:
- Detect connection status
- Skip close operation if already disconnected
- Log as info, not error (see the sketch below)
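A sketch of that check, assuming an ioredis client (the unit test above already inspects redisClient.status); closeRedisIfConnected() is a hypothetical helper.

```typescript
import { redisClient } from "./redis.js";
import { logger } from "./logger.js";

// Skip quit() when the connection is already gone, and log it at info level.
async function closeRedisIfConnected(): Promise<void> {
  if (redisClient.status === "end" || redisClient.status === "close") {
    logger.info("Redis already disconnected, skipping close", { category: "shutdown" });
    return;
  }
  await redisClient.quit();
  logger.info("Redis connection closed", { category: "shutdown" });
}
```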
4. Multiple Shutdown Signals
Scenario: SIGTERM followed by SIGINT
Behavior:
- Ignore subsequent signals
- Log "shutdown already in progress"
- Continue with original shutdown sequence
Related Enhancements
- Health Check Improvements - Readiness probe support
- Session Persistence - Session recovery after restart