
Production Operations Guide

Comprehensive guide for running Seed MCP Server in production, covering security, Redis, logging, monitoring, and troubleshooting.

Security Hardening

Authentication Security

1. Always Use HTTPS in Production

nginx
# Nginx configuration
server {
    listen 443 ssl http2;
    server_name mcp.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

2. Short Token Lifetimes

Configure your OIDC provider for short-lived tokens:

  • Access tokens: 1 hour or less
  • Refresh tokens: 24 hours with rotation
  • Session TTL: Match token lifetime
bash
# In Seed configuration
REDIS_SESSION_TTL=3600  # 1 hour in seconds

3. Token Validation

Seed automatically validates:

  • JWT signature using JWKS
  • Token expiration (exp claim)
  • Token issuer (iss claim)
  • Token audience (aud claim)
  • Not-before time (nbf claim)

Ensure these environment variables are set:

bash
OIDC_ISSUER=https://your-idp.com/
OIDC_AUDIENCE=your-client-id
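
For reference, the checks above correspond roughly to this sketch using the jose library (illustrative only; Seed's internal implementation may differ, and the JWKS path varies by provider):

typescript
import {createRemoteJWKSet, jwtVerify} from "jose";

// JWKS endpoint; many providers expose it at this well-known path (adjust for yours)
const jwks = createRemoteJWKSet(new URL(".well-known/jwks.json", process.env.OIDC_ISSUER));

export async function verifyAccessToken(token: string) {
  // jwtVerify checks the signature plus the exp, nbf, iss, and aud claims in one call
  const {payload} = await jwtVerify(token, jwks, {
    issuer: process.env.OIDC_ISSUER,
    audience: process.env.OIDC_AUDIENCE,
  });
  return payload;
}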

Network Security

1. Firewall Rules

Allow only necessary traffic:

bash
# Allow loopback traffic and return traffic for established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow HTTPS from anywhere
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Allow Redis only from localhost or internal network
iptables -A INPUT -p tcp --dport 6379 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 6379 -j DROP

# Drop all other traffic by default
iptables -P INPUT DROP

2. IP Allowlisting

Restrict access to known IP ranges:

typescript
// src/middleware/ip-allowlist.ts
import {Request, Response, NextFunction} from "express";

const allowedIPs = process.env.ALLOWED_IPS?.split(",") || [];
const allowedCIDRs = process.env.ALLOWED_CIDRS?.split(",") || [];

export function ipAllowlistMiddleware(req: Request, res: Response, next: NextFunction) {
  const clientIP = req.ip || req.socket.remoteAddress;

  // isAllowed is left to your deployment; see the helper sketch after this block
  if (isAllowed(clientIP, allowedIPs, allowedCIDRs)) {
    return next();
  }

  res.status(403).json({
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Forbidden",
      data: {reason: "ip_not_allowed"},
    },
    id: null,
  });
}
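
The isAllowed helper is left to your deployment. A minimal IPv4-only sketch follows; for production, a library such as ipaddr.js handles IPv6 and edge cases:

typescript
// Minimal IPv4-only helpers for the middleware above (illustrative)
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [range, bits] = cidr.split("/");
  const prefix = parseInt(bits, 10);
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(range) & mask);
}

function isAllowed(ip: string | undefined, ips: string[], cidrs: string[]): boolean {
  if (!ip) return false;
  const normalized = ip.replace(/^::ffff:/, ""); // strip the IPv4-mapped IPv6 prefix
  return ips.includes(normalized) || cidrs.some((cidr) => inCidr(normalized, cidr));
}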

3. Rate Limiting

Protect against abuse with rate limiting:

typescript
// src/middleware/rate-limit.ts
import rateLimit from "express-rate-limit";

export const rateLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // 100 requests per window
  message: {
    jsonrpc: "2.0",
    error: {
      code: -32000,
      message: "Too many requests",
      data: {reason: "rate_limit_exceeded"},
    },
    id: null,
  },
  standardHeaders: true,
  legacyHeaders: false,
});

// Apply globally or per-route
app.use("/mcp", rateLimiter);

Data Security

1. Redis Security

Protect Redis with authentication:

properties
# Redis configuration (redis.conf)
requirepass YOUR_STRONG_PASSWORD
rename-command CONFIG ""
rename-command FLUSHALL ""
rename-command FLUSHDB ""
bash
# Seed configuration
REDIS_URL=redis://:YOUR_STRONG_PASSWORD@localhost:6379

2. Sensitive Data Handling

Never log sensitive data:

typescript
// Bad
console.log(`Token: ${token}`);

// Good
console.log(`Token received: ${token.substring(0, 10)}...`);

Sanitize errors before sending to clients:

typescript
try {
  // Operation
} catch (error) {
  // Log full error server-side
  logger.error("Operation failed", {error: error.stack});

  // Send generic error to client
  res.status(500).json({
    jsonrpc: "2.0",
    error: {
      code: -32603,
      message: "Internal server error",
    },
    id: null,
  });
}

3. Environment Variable Security

Never commit .env files:

bash
# .gitignore
.env
.env.*
!.env.example

Use secrets management in production:

  • AWS Secrets Manager
  • Azure Key Vault
  • HashiCorp Vault
  • Kubernetes Secrets
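
For example, with AWS Secrets Manager the values can be loaded at startup; a sketch (the secret name and JSON layout are hypothetical):

typescript
import {SecretsManagerClient, GetSecretValueCommand} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({region: process.env.AWS_REGION});

export async function loadSecrets(secretId: string): Promise<Record<string, string>> {
  const response = await client.send(new GetSecretValueCommand({SecretId: secretId}));
  // Assumes the secret is stored as a JSON object of key/value pairs
  return JSON.parse(response.SecretString ?? "{}");
}

// Usage (hypothetical secret name)
// const secrets = await loadSecrets("seed-mcp-server/production");
// process.env.REDIS_URL = secrets.REDIS_URL;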

Security Headers

Add security headers to all responses:

typescript
import helmet from "helmet";

app.use(
  helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        scriptSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        imgSrc: ["'self'", "data:", "https:"],
      },
    },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
  })
);

Audit Logging

Log all authentication events:

typescript
// Log successful authentications
logger.info("User authenticated", {
  userId: user.sub,
  email: user.email,
  timestamp: new Date().toISOString(),
  ip: req.ip,
});

// Log failed authentications
logger.warn("Authentication failed", {
  reason: "invalid_token",
  ip: req.ip,
  timestamp: new Date().toISOString(),
});

Redis Configuration

Production Redis Setup

1. Basic Configuration

properties
# redis.conf

# Network
bind 127.0.0.1
port 6379
protected-mode yes

# Security
requirepass YOUR_STRONG_PASSWORD

# Persistence
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis

# Memory
maxmemory 256mb
maxmemory-policy allkeys-lru

# Logging
loglevel notice
logfile /var/log/redis/redis-server.log

2. Connection Pooling

Use connection pooling for better performance:

typescript
// src/services/redis.ts
import {createClient} from "redis";

const client = createClient({
  url: process.env.REDIS_URL,
  socket: {
    reconnectStrategy: (retries) => {
      if (retries > 10) {
        return new Error("Max retries reached");
      }
      return Math.min(retries * 100, 3000);
    },
  },
});

client.on("error", (err) => {
  logger.error("Redis connection error", {error: err});
});

client.on("reconnecting", () => {
  logger.warn("Redis reconnecting");
});

client.on("ready", () => {
  logger.info("Redis connection established");
});

3. Session Management

Configure session storage:

typescript
// Session key pattern
const sessionKey = `seed:session:${sessionId}`;

// Store session with TTL
await redis.set(sessionKey, JSON.stringify(sessionData), {
  EX: 3600, // 1 hour
});

// Get session
const data = await redis.get(sessionKey);
const session = data ? JSON.parse(data) : null;

// Delete session
await redis.del(sessionKey);

// Clean up expired sessions (automatic with TTL)

4. Redis Sentinel for High Availability

typescript
// node-redis's createClient does not accept Sentinel options directly;
// one option (shown here) is ioredis, which supports Sentinel natively
import Redis from "ioredis";

const client = new Redis({
  sentinels: [
    {host: "sentinel1.example.com", port: 26379},
    {host: "sentinel2.example.com", port: 26379},
    {host: "sentinel3.example.com", port: 26379},
  ],
  name: "mymaster",
  password: process.env.REDIS_PASSWORD,
});

5. Redis Cluster for Scale

typescript
import {createCluster} from "redis";

const cluster = createCluster({
  rootNodes: [
    {url: "redis://node1.example.com:6379"},
    {url: "redis://node2.example.com:6379"},
    {url: "redis://node3.example.com:6379"},
  ],
  defaults: {
    password: process.env.REDIS_PASSWORD,
  },
});

Redis Monitoring

Monitor these Redis metrics:

bash
# Memory usage
redis-cli INFO memory

# Connected clients
redis-cli INFO clients

# Commands processed
redis-cli INFO stats

# Keyspace information
redis-cli INFO keyspace

# Real-time monitoring
redis-cli MONITOR

Set up alerts for:

  • Memory usage > 80%
  • Connected clients > 1000
  • Command execution time > 100ms
  • Rejected connections > 0

Logging & Debugging

Structured Logging

Use structured logging for better observability:

typescript
// src/utils/logger.ts
import winston from "winston";

export const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({stack: true}),
    winston.format.json()
  ),
  defaultMeta: {service: "seed-mcp-server"},
  transports: [
    new winston.transports.File({filename: "error.log", level: "error"}),
    new winston.transports.File({filename: "combined.log"}),
  ],
});

// Add console transport in development
if (process.env.NODE_ENV !== "production") {
  logger.add(
    new winston.transports.Console({
      format: winston.format.combine(winston.format.colorize(), winston.format.simple()),
    })
  );
}

Log Levels

Use appropriate log levels:

typescript
// ERROR - System errors requiring immediate attention
logger.error("Redis connection failed", {error: err.message, stack: err.stack});

// WARN - Warning conditions that should be investigated
logger.warn("JWKS fetch took longer than expected", {duration: 5000});

// INFO - Normal operational events
logger.info("MCP session created", {sessionId, userId: user.sub});

// DEBUG - Detailed information for debugging
logger.debug("JWT claims extracted", {claims: payload});

Request Logging

Log all incoming requests:

typescript
import morgan from "morgan";

// Combine Morgan with Winston
const stream = {
  write: (message: string) => logger.http(message.trim()),
};

app.use(
  morgan(":method :url :status :res[content-length] - :response-time ms", {
    stream,
  })
);

Correlation IDs

Track requests across services:

typescript
import {v4 as uuidv4} from "uuid";

app.use((req, res, next) => {
  req.correlationId = req.headers["x-correlation-id"] || uuidv4();
  res.setHeader("X-Correlation-ID", req.correlationId);
  next();
});

// Use in logs
logger.info("Request processed", {
  correlationId: req.correlationId,
  method: req.method,
  path: req.path,
});
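
For req.correlationId to type-check, augment Express's Request type (hypothetical file path; assumes @types/express and that the header arrives as a single string):

typescript
// src/types/express.d.ts
declare global {
  namespace Express {
    interface Request {
      correlationId?: string;
    }
  }
}

export {};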

Debug Mode

Enable detailed debugging:

bash
# Debug all
DEBUG=* npm start

# Debug specific modules
DEBUG=seed:* npm start
DEBUG=seed:auth,seed:mcp npm start
typescript
import debug from "debug";

const log = debug("seed:auth");

export function authMiddleware(req, res, next) {
  log("Processing authentication for %s %s", req.method, req.path);
  // ...
}

Error Tracking

Integrate error tracking service:

typescript
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1,
});

// Capture all errors
app.use(Sentry.Handlers.errorHandler());

// Manual error capture
try {
  // Operation
} catch (error) {
  Sentry.captureException(error);
  throw error;
}

Monitoring

Health Check Endpoint

Monitor server health:

bash
# Basic health check
curl https://mcp.example.com/health

# Expected response
{
  "status": "ok",
  "timestamp": "2025-01-05T12:00:00.000Z",
  "uptime": 3600.5,
  "redis": {
    "connected": true,
    "ping": "PONG"
  }
}
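
A minimal handler producing a response of this shape might look like the following sketch (assumes the app and Redis client from earlier sections; Seed's built-in endpoint may differ):

typescript
app.get("/health", async (_req, res) => {
  let connected = false;
  let ping = "unavailable";
  try {
    ping = await redis.ping(); // "PONG" when the connection is healthy
    connected = true;
  } catch {
    // Report Redis as down without failing the endpoint itself
  }
  res.status(connected ? 200 : 503).json({
    status: connected ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    redis: {connected, ping},
  });
});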

Key Metrics to Track

Application Metrics:

  • Request rate (requests/second)
  • Response time (p50, p95, p99)
  • Error rate (percentage)
  • Active MCP sessions
  • Tool invocation rate

System Metrics:

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network I/O

Redis Metrics:

  • Memory usage
  • Connected clients
  • Commands processed/second
  • Key count
  • Hit rate

Authentication Metrics:

  • OAuth success rate
  • JWT validation time
  • JWKS fetch latency
  • Token refresh rate

Prometheus Integration

Seed includes built-in Prometheus metrics support.

Enabling Metrics

Metrics are enabled by default. To disable:

bash
# Disable metrics collection
METRICS_ENABLED=false

When enabled, metrics are exposed at the /metrics endpoint in Prometheus format.
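
This follows the usual prom-client pattern; a simplified sketch of how such an endpoint can be wired up (illustrative, not Seed's exact implementation):

typescript
import express from "express";
import client from "prom-client";

const app = express();
const register = new client.Registry();

// Node.js process metrics (CPU, memory, event loop lag, heap)
client.collectDefaultMetrics({register});

// Example counter in the style of the metrics listed below
const httpRequestsTotal = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status"],
  registers: [register],
});
// httpRequestsTotal.inc({method: "POST", route: "/mcp", status: "200"});

app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});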

Available Metrics

HTTP Metrics:

  • http_requests_total - Total HTTP requests by method, route, and status
  • http_request_duration_seconds - Request duration histogram

MCP Metrics:

  • mcp_sessions_active - Current active MCP sessions
  • mcp_sessions_total - Total MCP sessions created
  • mcp_tool_invocations_total - Tool invocations by tool name and result
  • mcp_tool_duration_seconds - Tool execution duration

Authentication Metrics:

  • auth_attempts_total - Authentication attempts by result (success/failure)
  • auth_token_validation_duration_seconds - Token validation duration
  • jwks_refresh_total - JWKS refresh operations by result
  • jwks_cache_hits_total - JWKS cache hits
  • jwks_cache_misses_total - JWKS cache misses

Redis Metrics:

  • redis_operations_total - Redis operations by operation type and result
  • redis_operation_duration_seconds - Redis operation duration

Rate Limiting Metrics:

  • rate_limit_hits_total - Rate limit hits by endpoint

System Metrics:

  • process_cpu_seconds_total - CPU usage
  • process_resident_memory_bytes - Memory usage
  • nodejs_eventloop_lag_seconds - Event loop lag
  • nodejs_heap_size_total_bytes - Heap size
  • nodejs_heap_size_used_bytes - Heap used

Securing the Metrics Endpoint

Production Security

The /metrics endpoint is publicly accessible when metrics are enabled. Always restrict access in production.

Option 1: Disable metrics entirely

bash
METRICS_ENABLED=false

Option 2: IP Whitelisting via Traefik

yaml
# docker-stack.yml
labels:
  - "traefik.http.middlewares.metrics-ipwhitelist.ipwhitelist.sourcerange=10.0.0.0/8,172.16.0.0/12"
  - "traefik.http.routers.seed-metrics.middlewares=metrics-ipwhitelist"

Option 3: Internal network only

yaml
# docker-stack.yml
- "traefik.http.services.seed.loadbalancer.server.port=3000"
# Don't expose /metrics route to external Traefik

Option 4: HTTP BasicAuth via reverse proxy

nginx
location /metrics {
    auth_basic "Metrics";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://seed:3000;
}

See Production Deployment Guide for detailed examples.

Grafana Dashboard

A pre-built Grafana dashboard is available at grafana/seed-mcp-server-dashboard.json. Import it to visualize all metrics:

bash
# Import via Grafana UI:
# Dashboard → Import → Upload JSON file

# Or via API
curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d @grafana/seed-mcp-server-dashboard.json \
  http://grafana:3000/api/dashboards/db

Alerting Rules

Set up alerts for critical conditions:

yaml
# Prometheus alert rules
groups:
  - name: seed-mcp-server
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"

      - alert: SlowResponses
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 10m
        annotations:
          summary: "95th percentile response time > 2s"

      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
        for: 5m
        annotations:
          summary: "Redis memory usage > 80%"

Performance Tuning

Node.js Optimization

bash
# Increase memory limit
NODE_OPTIONS="--max-old-space-size=4096" npm start

# Enable cluster mode with PM2 (4 worker instances)
pm2 start dist/index.js -i 4

JWKS Caching

Reduce identity provider calls:

typescript
// Increase cache TTL
jwks: {
  cacheTtlMs: 3600000,  // 1 hour (default)
  refreshBeforeExpiryMs: 300000,  // 5 minutes
}

bash
# Or set via environment
JWKS_CACHE_TTL=7200000  # 2 hours

Connection Pooling

Optimize database connections:

typescript
// Redis connection pool; Pool stands in for a pooling library such as
// generic-pool (see the sketch below). node-redis multiplexes commands over a
// single connection, so an explicit pool is mainly needed for blocking commands.
const pool = new Pool({
  min: 2,
  max: 10,
  idleTimeoutMillis: 30000,
});
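
A concrete version using generic-pool might look like this sketch (most deployments can instead rely on node-redis's single multiplexed connection):

typescript
import {createPool} from "generic-pool";
import {createClient} from "redis";

const pool = createPool(
  {
    // Each pooled resource is a connected node-redis client
    create: async () => {
      const client = createClient({url: process.env.REDIS_URL});
      await client.connect();
      return client;
    },
    destroy: async (client) => {
      await client.quit();
    },
  },
  {min: 2, max: 10, idleTimeoutMillis: 30000}
);

// Usage
// const client = await pool.acquire();
// try { await client.ping(); } finally { await pool.release(client); }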

Response Compression

Enable gzip compression:

typescript
import compression from "compression";

app.use(compression());

Load Balancing

Deploy multiple instances:

nginx
upstream seed_backend {
    least_conn;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    location / {
        proxy_pass http://seed_backend;
    }
}

Backup & Recovery

Redis Backup

bash
# Manual backup
redis-cli BGSAVE

# Automated backup script
#!/bin/bash
BACKUP_DIR=/backups/redis
DATE=$(date +%Y%m%d_%H%M%S)
LAST_SAVE=$(redis-cli LASTSAVE)
redis-cli BGSAVE
# BGSAVE runs in the background; wait for it to finish before copying
while [ "$(redis-cli LASTSAVE)" = "$LAST_SAVE" ]; do
  sleep 1
done
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump_$DATE.rdb"

# Keep only the last 7 days
find "$BACKUP_DIR" -name "dump_*.rdb" -mtime +7 -delete

Configuration Backup

bash
# Backup environment configuration
cp .env .env.backup.$(date +%Y%m%d)

# Backup Redis configuration
cp /etc/redis/redis.conf /backups/redis.conf.$(date +%Y%m%d)

Disaster Recovery

Recovery Steps:

  1. Stop Seed server
  2. Restore Redis from backup:
    bash
    redis-cli SHUTDOWN
    cp /backups/redis/dump_20250105.rdb /var/lib/redis/dump.rdb
    redis-server /etc/redis/redis.conf
  3. Verify Redis data
  4. Restart Seed server
  5. Test connections

Troubleshooting

See Troubleshooting Guide for common issues.

Production-Specific Issues

High Memory Usage

bash
# Check memory usage
ps aux | grep node
redis-cli INFO memory

# Enable heap snapshots
node --heapsnapshot-signal=SIGUSR2 dist/index.js

# Trigger snapshot
kill -USR2 <PID>

# Analyze with Chrome DevTools

Memory Leaks

typescript
// Enable heap profiling
import v8 from "v8";

function takeHeapSnapshot() {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  // writeHeapSnapshot blocks while writing the file and returns the filename
  v8.writeHeapSnapshot(filename);
  console.log(`Heap snapshot written to ${filename}`);
}

// Trigger on SIGUSR2
process.on("SIGUSR2", takeHeapSnapshot);

Connection Pool Exhaustion

bash
# Check Redis connections
redis-cli CLIENT LIST

# Monitor connection count
watch -n 1 'redis-cli CLIENT LIST | wc -l'

Solutions:

  • Increase pool size
  • Reduce connection timeout
  • Find and fix connection leaks

Slow Queries

bash
# Enable slow log in Redis
redis-cli CONFIG SET slowlog-log-slower-than 10000  # 10ms
redis-cli CONFIG SET slowlog-max-len 128

# View slow queries
redis-cli SLOWLOG GET 10

Security Checklist

  • [ ] HTTPS enabled with valid certificates
  • [ ] OAuth endpoints configured correctly
  • [ ] Token lifetimes set appropriately (≤1 hour)
  • [ ] Redis authentication enabled
  • [ ] Firewall rules configured
  • [ ] Rate limiting enabled
  • [ ] IP allowlisting configured (if needed)
  • [ ] Security headers enabled
  • [ ] Audit logging enabled
  • [ ] Sensitive data sanitized from logs
  • [ ] Environment variables secured
  • [ ] Regular security updates applied

Performance Checklist

  • [ ] JWKS caching optimized
  • [ ] Redis connection pooling configured
  • [ ] Response compression enabled
  • [ ] Load balancing implemented
  • [ ] Monitoring and alerting set up
  • [ ] Log retention configured
  • [ ] Backup automation configured
  • [ ] Resource limits set appropriately

Monitoring Checklist

  • [ ] Health check endpoint monitored
  • [ ] Application metrics collected
  • [ ] System metrics tracked
  • [ ] Redis metrics monitored
  • [ ] Authentication metrics tracked
  • [ ] Alerting rules configured
  • [ ] Dashboards created
  • [ ] On-call rotation established
