
Production Operations Guide

Comprehensive guide for running Seed MCP Server in production, covering security, Redis, logging, monitoring, and troubleshooting.

Security Hardening

Authentication Security

1. Always Use HTTPS in Production

nginx
# Nginx configuration
server {
    listen 443 ssl http2;
    server_name mcp.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

2. Short Token Lifetimes

Configure your OIDC provider for short-lived tokens:

  • Access tokens: 1 hour or less
  • Refresh tokens: 24 hours with rotation
  • Session TTL: Match token lifetime
bash
# In Seed configuration
REDIS_SESSION_TTL=3600  # 1 hour in seconds

3. Token Validation

Seed automatically validates:

  • JWT signature using JWKS
  • Token expiration (exp claim)
  • Token issuer (iss claim)
  • Token audience (aud claim)
  • Not-before time (nbf claim)

Ensure these environment variables are set:

bash
OIDC_ISSUER=https://your-idp.com/
OIDC_AUDIENCE=your-client-id
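
For reference, the checks above correspond roughly to this sketch using the jose library (illustrative only; Seed's internal implementation may differ, and the JWKS path varies by provider):

typescript
import {createRemoteJWKSet, jwtVerify} from "jose";

// JWKS endpoint; many providers expose it at this well-known path (adjust for yours)
const jwks = createRemoteJWKSet(new URL(".well-known/jwks.json", process.env.OIDC_ISSUER));

export async function verifyAccessToken(token: string) {
  // jwtVerify checks the signature plus the exp, nbf, iss, and aud claims in one call
  const {payload} = await jwtVerify(token, jwks, {
    issuer: process.env.OIDC_ISSUER,
    audience: process.env.OIDC_AUDIENCE,
  });
  return payload;
}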

Network Security

1. Firewall Rules

Allow only necessary traffic:

bash
# Allow loopback traffic and return traffic for established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow HTTPS from anywhere
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Allow Redis only from localhost or internal network
iptables -A INPUT -p tcp --dport 6379 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 6379 -j DROP

# Drop all other traffic by default
iptables -P INPUT DROP

2. IP Allowlisting

Restrict access to known IP ranges:

typescript
// src/middleware/ip-allowlist.ts
import {Request, Response, NextFunction} from "express";

const allowedIPs = process.env.ALLOWED_IPS?.split(",") || [];
const allowedCIDRs = process.env.ALLOWED_CIDRS?.split(",") || [];

export function ipAllowlistMiddleware(req: Request, res: Response, next: NextFunction) {
  const clientIP = req.ip || req.socket.remoteAddress;

  // isAllowed is left to your deployment; see the helper sketch after this block
  if (isAllowed(clientIP, allowedIPs, allowedCIDRs)) {
    return next();
  }

  res.status(403).json({
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Forbidden",
      data: {reason: "ip_not_allowed"},
    },
    id: null,
  });
}
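
The isAllowed helper is left to your deployment. A minimal IPv4-only sketch follows; for production, a library such as ipaddr.js handles IPv6 and edge cases:

typescript
// Minimal IPv4-only helpers for the middleware above (illustrative)
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [range, bits] = cidr.split("/");
  const prefix = parseInt(bits, 10);
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(range) & mask);
}

function isAllowed(ip: string | undefined, ips: string[], cidrs: string[]): boolean {
  if (!ip) return false;
  const normalized = ip.replace(/^::ffff:/, ""); // strip the IPv4-mapped IPv6 prefix
  return ips.includes(normalized) || cidrs.some((cidr) => inCidr(normalized, cidr));
}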

3. Rate Limiting

Protect against abuse with rate limiting:

typescript
// src/middleware/rate-limit.ts
import rateLimit from "express-rate-limit";

export const rateLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // 100 requests per window
  message: {
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Too many requests",
      data: {reason: "rate_limit_exceeded"},
    },
    id: null,
  },
  standardHeaders: true,
  legacyHeaders: false,
});

// Apply globally or per-route
app.use("/mcp", rateLimiter);

Data Security

1. Redis Security

Protect Redis with authentication:

properties
# Redis configuration (redis.conf)
requirepass YOUR_STRONG_PASSWORD
rename-command CONFIG ""
rename-command FLUSHALL ""
rename-command FLUSHDB ""
bash
# Seed configuration
REDIS_URL=redis://:YOUR_STRONG_PASSWORD@localhost:6379

2. Sensitive Data Handling

Never log sensitive data:

typescript
// Bad
console.log(`Token: ${token}`);

// Good
console.log(`Token received: ${token.substring(0, 10)}...`);

Sanitize errors before sending to clients:

typescript
try {
  // Operation
} catch (error) {
  // Log full error server-side
  logger.error("Operation failed", {error: error.stack});

  // Send generic error to client
  res.status(500).json({
    jsonrpc: "2.0",
    error: {
      code: -32603,
      message: "Internal server error",
    },
    id: null,
  });
}

3. Environment Variable Security

Never commit .env files:

bash
# .gitignore
.env
.env.*
!.env.example

Use secrets management in production:

  • AWS Secrets Manager
  • Azure Key Vault
  • HashiCorp Vault
  • Kubernetes Secrets
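
For example, with AWS Secrets Manager the values can be loaded at startup; a sketch (the secret name and JSON layout are hypothetical):

typescript
import {SecretsManagerClient, GetSecretValueCommand} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({region: process.env.AWS_REGION});

export async function loadSecrets(secretId: string): Promise<Record<string, string>> {
  const response = await client.send(new GetSecretValueCommand({SecretId: secretId}));
  // Assumes the secret is stored as a JSON object of key/value pairs
  return JSON.parse(response.SecretString ?? "{}");
}

// Usage (hypothetical secret name)
// const secrets = await loadSecrets("seed-mcp-server/production");
// process.env.REDIS_URL = secrets.REDIS_URL;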

Security Headers

Add security headers to all responses:

typescript
import helmet from "helmet";

app.use(
  helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        scriptSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        imgSrc: ["'self'", "data:", "https:"],
      },
    },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
  })
);

Audit Logging

Log all authentication events:

typescript
// Log successful authentications
logger.info("User authenticated", {
  userId: user.sub,
  email: user.email,
  timestamp: new Date().toISOString(),
  ip: req.ip,
});

// Log failed authentications
logger.warn("Authentication failed", {
  reason: "invalid_token",
  ip: req.ip,
  timestamp: new Date().toISOString(),
});

Redis Configuration

Production Redis Setup

1. Basic Configuration

properties
# redis.conf

# Network
bind 127.0.0.1
port 6379
protected-mode yes

# Security
requirepass YOUR_STRONG_PASSWORD

# Persistence
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis

# Memory
maxmemory 256mb
maxmemory-policy allkeys-lru

# Logging
loglevel notice
logfile /var/log/redis/redis-server.log

2. Connection Pooling

Use connection pooling for better performance:

typescript
// src/services/redis.ts
import {createClient} from "redis";

const client = createClient({
  url: process.env.REDIS_URL,
  socket: {
    reconnectStrategy: (retries) => {
      if (retries > 10) {
        return new Error("Max retries reached");
      }
      return Math.min(retries * 100, 3000);
    },
  },
});

client.on("error", (err) => {
  logger.error("Redis connection error", {error: err});
});

client.on("reconnecting", () => {
  logger.warn("Redis reconnecting");
});

client.on("ready", () => {
  logger.info("Redis connection established");
});

3. Session Management

Configure session storage:

typescript
// Session key pattern
const sessionKey = `seed:session:${sessionId}`;

// Store session with TTL
await redis.set(sessionKey, JSON.stringify(sessionData), {
  EX: 3600, // 1 hour
});

// Get session
const data = await redis.get(sessionKey);
const session = data ? JSON.parse(data) : null;

// Delete session
await redis.del(sessionKey);

// Clean up expired sessions (automatic with TTL)

4. Redis Sentinel for High Availability

typescript
// node-redis's createClient does not accept Sentinel options directly;
// one option (shown here) is ioredis, which supports Sentinel natively
import Redis from "ioredis";

const client = new Redis({
  sentinels: [
    {host: "sentinel1.example.com", port: 26379},
    {host: "sentinel2.example.com", port: 26379},
    {host: "sentinel3.example.com", port: 26379},
  ],
  name: "mymaster",
  password: process.env.REDIS_PASSWORD,
});

5. Redis Cluster for Scale

typescript
import {createCluster} from "redis";

const cluster = createCluster({
  rootNodes: [
    {url: "redis://node1.example.com:6379"},
    {url: "redis://node2.example.com:6379"},
    {url: "redis://node3.example.com:6379"},
  ],
  defaults: {
    password: process.env.REDIS_PASSWORD,
  },
});

Redis Monitoring

Monitor these Redis metrics:

bash
# Memory usage
redis-cli INFO memory

# Connected clients
redis-cli INFO clients

# Commands processed
redis-cli INFO stats

# Keyspace information
redis-cli INFO keyspace

# Real-time monitoring
redis-cli MONITOR

Set up alerts for:

  • Memory usage > 80%
  • Connected clients > 1000
  • Command execution time > 100ms
  • Rejected connections > 0

Logging & Debugging

Structured Logging

Use structured logging for better observability:

typescript
// src/utils/logger.ts
import winston from "winston";

export const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({stack: true}),
    winston.format.json()
  ),
  defaultMeta: {service: "seed-mcp-server"},
  transports: [
    new winston.transports.File({filename: "error.log", level: "error"}),
    new winston.transports.File({filename: "combined.log"}),
  ],
});

// Add console transport in development
if (process.env.NODE_ENV !== "production") {
  logger.add(
    new winston.transports.Console({
      format: winston.format.combine(winston.format.colorize(), winston.format.simple()),
    })
  );
}

Log Levels

Use appropriate log levels:

typescript
// ERROR - System errors requiring immediate attention
logger.error("Redis connection failed", {error: err.message, stack: err.stack});

// WARN - Warning conditions that should be investigated
logger.warn("JWKS fetch took longer than expected", {duration: 5000});

// INFO - Normal operational events
logger.info("MCP session created", {sessionId, userId: user.sub});

// DEBUG - Detailed information for debugging
logger.debug("JWT claims extracted", {claims: payload});

Request Logging

Log all incoming requests:

typescript
import morgan from "morgan";

// Combine Morgan with Winston
const stream = {
  write: (message: string) => logger.http(message.trim()),
};

app.use(
  morgan(":method :url :status :res[content-length] - :response-time ms", {
    stream,
  })
);

Correlation IDs

Track requests across services:

typescript
import {v4 as uuidv4} from "uuid";

app.use((req, res, next) => {
  req.correlationId = req.headers["x-correlation-id"] || uuidv4();
  res.setHeader("X-Correlation-ID", req.correlationId);
  next();
});

// Use in logs
logger.info("Request processed", {
  correlationId: req.correlationId,
  method: req.method,
  path: req.path,
});
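
For req.correlationId to type-check, augment Express's Request type (hypothetical file path; assumes @types/express and that the header arrives as a single string):

typescript
// src/types/express.d.ts
declare global {
  namespace Express {
    interface Request {
      correlationId?: string;
    }
  }
}

export {};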

Debug Mode

Enable detailed debugging:

bash
# Debug all
DEBUG=* npm start

# Debug specific modules
DEBUG=seed:* npm start
DEBUG=seed:auth,seed:mcp npm start
typescript
import debug from "debug";

const log = debug("seed:auth");

export function authMiddleware(req, res, next) {
  log("Processing authentication for %s %s", req.method, req.path);
  // ...
}

Error Tracking

Integrate error tracking service:

typescript
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1,
});

// Capture all errors
app.use(Sentry.Handlers.errorHandler());

// Manual error capture
try {
  // Operation
} catch (error) {
  Sentry.captureException(error);
  throw error;
}

Monitoring

Health Check Endpoint

Monitor server health:

bash
# Basic health check
curl https://mcp.example.com/health

# Expected response
{
  "status": "ok",
  "timestamp": "2025-01-05T12:00:00.000Z",
  "uptime": 3600.5,
  "redis": {
    "connected": true,
    "ping": "PONG"
  }
}
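
A minimal handler producing a response of this shape might look like the following sketch (assumes the app and Redis client from earlier sections; Seed's built-in endpoint may differ):

typescript
app.get("/health", async (_req, res) => {
  let connected = false;
  let ping = "unavailable";
  try {
    ping = await redis.ping(); // "PONG" when the connection is healthy
    connected = true;
  } catch {
    // Report Redis as down without failing the endpoint itself
  }
  res.status(connected ? 200 : 503).json({
    status: connected ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    redis: {connected, ping},
  });
});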

Key Metrics to Track

Application Metrics:

  • Request rate (requests/second)
  • Response time (p50, p95, p99)
  • Error rate (percentage)
  • Active MCP sessions
  • Tool invocation rate

System Metrics:

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network I/O

Redis Metrics:

  • Memory usage
  • Connected clients
  • Commands processed/second
  • Key count
  • Hit rate

Authentication Metrics:

  • OAuth success rate
  • JWT validation time
  • JWKS fetch latency
  • Token refresh rate

Prometheus Integration

Seed includes built-in Prometheus metrics support.

Enabling Metrics

Metrics are enabled by default. To disable:

bash
# Disable metrics collection
METRICS_ENABLED=false

When enabled, metrics are exposed at the /metrics endpoint in Prometheus format.
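
This follows the usual prom-client pattern; a simplified sketch of how such an endpoint can be wired up (illustrative, not Seed's exact implementation):

typescript
import express from "express";
import client from "prom-client";

const app = express();
const register = new client.Registry();

// Node.js process metrics (CPU, memory, event loop lag, heap)
client.collectDefaultMetrics({register});

// Example counter in the style of the metrics listed below
const httpRequestsTotal = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status"],
  registers: [register],
});
// httpRequestsTotal.inc({method: "POST", route: "/mcp", status: "200"});

app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});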

Available Metrics

HTTP Metrics:

  • http_requests_total - Total HTTP requests by method, route, and status
  • http_request_duration_seconds - Request duration histogram

MCP Metrics:

  • mcp_sessions_active - Current active MCP sessions
  • mcp_sessions_total - Total MCP sessions created
  • mcp_tool_invocations_total - Tool invocations by tool name and result
  • mcp_tool_duration_seconds - Tool execution duration

Authentication Metrics:

  • auth_attempts_total - Authentication attempts by result (success/failure)
  • auth_token_validation_duration_seconds - Token validation duration
  • jwks_refresh_total - JWKS refresh operations by result
  • jwks_cache_hits_total - JWKS cache hits
  • jwks_cache_misses_total - JWKS cache misses

Redis Metrics:

  • redis_operations_total - Redis operations by operation type and result
  • redis_operation_duration_seconds - Redis operation duration

Rate Limiting Metrics:

  • rate_limit_hits_total - Rate limit hits by endpoint

System Metrics:

  • process_cpu_seconds_total - CPU usage
  • process_resident_memory_bytes - Memory usage
  • nodejs_eventloop_lag_seconds - Event loop lag
  • nodejs_heap_size_total_bytes - Heap size
  • nodejs_heap_size_used_bytes - Heap used

Securing the Metrics Endpoint

Production Security

The /metrics endpoint is publicly accessible when metrics are enabled. Always restrict access in production.

Option 1: Disable metrics entirely

bash
METRICS_ENABLED=false

Option 2: IP Whitelisting via Traefik

yaml
# docker-stack.yml
labels:
  - "traefik.http.middlewares.metrics-ipwhitelist.ipwhitelist.sourcerange=10.0.0.0/8,172.16.0.0/12"
  - "traefik.http.routers.seed-metrics.middlewares=metrics-ipwhitelist"

Option 3: Internal network only

yaml
# docker-stack.yml
- "traefik.http.services.seed.loadbalancer.server.port=3000"
# Don't expose /metrics route to external Traefik

Option 4: HTTP BasicAuth via reverse proxy

nginx
location /metrics {
    auth_basic "Metrics";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://seed:3000;
}

See Production Deployment Guide for detailed examples.

Grafana Dashboard

A pre-built Grafana dashboard is available at grafana/seed-mcp-server-dashboard.json. Import it to visualize all metrics:

bash
# Import via Grafana UI:
# Dashboard → Import → Upload JSON file

# Or via API
curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d @grafana/seed-mcp-server-dashboard.json \
  http://grafana:3000/api/dashboards/db

Alerting Rules

Set up alerts for critical conditions:

yaml
# Prometheus alert rules
groups:
  - name: seed-mcp-server
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"

      - alert: SlowResponses
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 10m
        annotations:
          summary: "95th percentile response time > 2s"

      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
        for: 5m
        annotations:
          summary: "Redis memory usage > 80%"

Performance Tuning

Node.js Optimization

bash
# Increase memory limit
NODE_OPTIONS="--max-old-space-size=4096" npm start

# Enable cluster mode with PM2 (4 worker instances)
pm2 start dist/index.js -i 4

JWKS Caching

Reduce identity provider calls:

typescript
// Increase cache TTL
jwks: {
  cacheTtlMs: 3600000,  // 1 hour (default)
  refreshBeforeExpiryMs: 300000,  // 5 minutes
}

bash
# Or set via environment
JWKS_CACHE_TTL=7200000  # 2 hours

Connection Pooling

Optimize database connections:

typescript
// Redis connection pool; Pool stands in for a pooling library such as
// generic-pool (see the sketch below). node-redis multiplexes commands over a
// single connection, so an explicit pool is mainly needed for blocking commands.
const pool = new Pool({
  min: 2,
  max: 10,
  idleTimeoutMillis: 30000,
});
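
A concrete version using generic-pool might look like this sketch (most deployments can instead rely on node-redis's single multiplexed connection):

typescript
import {createPool} from "generic-pool";
import {createClient} from "redis";

const pool = createPool(
  {
    // Each pooled resource is a connected node-redis client
    create: async () => {
      const client = createClient({url: process.env.REDIS_URL});
      await client.connect();
      return client;
    },
    destroy: async (client) => {
      await client.quit();
    },
  },
  {min: 2, max: 10, idleTimeoutMillis: 30000}
);

// Usage
// const client = await pool.acquire();
// try { await client.ping(); } finally { await pool.release(client); }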

Response Compression

Enable gzip compression:

typescript
import compression from "compression";

app.use(compression());

Load Balancing

Deploy multiple instances:

nginx
upstream seed_backend {
    least_conn;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    location / {
        proxy_pass http://seed_backend;
    }
}

Backup & Recovery

Redis Backup

bash
# Manual backup
redis-cli BGSAVE

# Automated backup script
#!/bin/bash
BACKUP_DIR=/backups/redis
DATE=$(date +%Y%m%d_%H%M%S)
LAST_SAVE=$(redis-cli LASTSAVE)
redis-cli BGSAVE
# BGSAVE runs in the background; wait for it to finish before copying
while [ "$(redis-cli LASTSAVE)" = "$LAST_SAVE" ]; do
  sleep 1
done
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump_$DATE.rdb"

# Keep only the last 7 days
find "$BACKUP_DIR" -name "dump_*.rdb" -mtime +7 -delete

Configuration Backup

bash
# Backup environment configuration
cp .env .env.backup.$(date +%Y%m%d)

# Backup Redis configuration
cp /etc/redis/redis.conf /backups/redis.conf.$(date +%Y%m%d)

Disaster Recovery

Recovery Steps:

  1. Stop Seed server
  2. Restore Redis from backup:
    bash
    redis-cli SHUTDOWN
    cp /backups/redis/dump_20250105.rdb /var/lib/redis/dump.rdb
    redis-server /etc/redis/redis.conf
  3. Verify Redis data
  4. Restart Seed server
  5. Test connections

Troubleshooting

See Troubleshooting Guide for common issues.

Production-Specific Issues

High Memory Usage

bash
# Check memory usage
ps aux | grep node
redis-cli INFO memory

# Enable heap snapshots
node --heapsnapshot-signal=SIGUSR2 dist/index.js

# Trigger snapshot
kill -USR2 <PID>

# Analyze with Chrome DevTools

Memory Leaks

typescript
// Enable heap profiling
import v8 from "v8";

function takeHeapSnapshot() {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  // writeHeapSnapshot blocks while writing the file and returns the filename
  v8.writeHeapSnapshot(filename);
  console.log(`Heap snapshot written to ${filename}`);
}

// Trigger on SIGUSR2
process.on("SIGUSR2", takeHeapSnapshot);

Connection Pool Exhaustion

bash
# Check Redis connections
redis-cli CLIENT LIST

# Monitor connection count
watch -n 1 'redis-cli CLIENT LIST | wc -l'

Solutions:

  • Increase pool size
  • Reduce connection timeout
  • Find and fix connection leaks

Slow Queries

bash
# Enable slow log in Redis
redis-cli CONFIG SET slowlog-log-slower-than 10000  # 10ms
redis-cli CONFIG SET slowlog-max-len 128

# View slow queries
redis-cli SLOWLOG GET 10

Security Checklist

  • [ ] HTTPS enabled with valid certificates
  • [ ] OAuth endpoints configured correctly
  • [ ] Token lifetimes set appropriately (≤1 hour)
  • [ ] Redis authentication enabled
  • [ ] Firewall rules configured
  • [ ] Rate limiting enabled
  • [ ] IP allowlisting configured (if needed)
  • [ ] Security headers enabled
  • [ ] Audit logging enabled
  • [ ] Sensitive data sanitized from logs
  • [ ] Environment variables secured
  • [ ] Regular security updates applied

Performance Checklist

  • [ ] JWKS caching optimized
  • [ ] Redis connection pooling configured
  • [ ] Response compression enabled
  • [ ] Load balancing implemented
  • [ ] Monitoring and alerting set up
  • [ ] Log retention configured
  • [ ] Backup automation configured
  • [ ] Resource limits set appropriately

Monitoring Checklist

  • [ ] Health check endpoint monitored
  • [ ] Application metrics collected
  • [ ] System metrics tracked
  • [ ] Redis metrics monitored
  • [ ] Authentication metrics tracked
  • [ ] Alerting rules configured
  • [ ] Dashboards created
  • [ ] On-call rotation established
