
Production Deployment Guide

Complete guide to deploying Seed MCP Server to production with Okta OIDC authentication, Docker Swarm, and secure metrics endpoint.

Overview

This guide covers:

  1. Okta OIDC Setup
  2. Docker Swarm Deployment
  3. Securing the Metrics Endpoint
  4. Monitoring and Observability
  5. Troubleshooting

Production Architecture

Production-Ready Features (as of 2026-01-06):

  • Graceful shutdown with SIGTERM/SIGINT handling
  • Liveness and readiness probes for Kubernetes
  • Configuration validation at startup
  • Redis connection resilience with circuit breaker
  • Token revocation support (RFC 7009)

Prerequisites

  • Docker Swarm cluster initialized
  • Traefik reverse proxy deployed
  • Domain with DNS configured
  • Okta account with admin access
  • Redis instance (included in stack)

Okta OIDC Setup

Step 1: Create Application in Okta

  1. Log into Okta Admin Console

    • Navigate to Applications → Applications
    • Click Create App Integration
  2. Select Integration Type

    • Sign-in method: OIDC - OpenID Connect
    • Application type: Web Application
    • Click Next
  3. Configure Application Settings

    General Settings:

    • App integration name: Seed MCP Server
    • Logo: (optional)

    Grant types:

    • ✅ Authorization Code
    • ✅ Refresh Token
    • ✅ Implicit (hybrid) - if needed

    Sign-in redirect URIs:

    http://localhost:*/callback
    https://seed.yourdomain.com/oauth/authorize/callback

    Sign-out redirect URIs: (optional)

    https://seed.yourdomain.com

    Controlled access:

    • Select who can access this application
    • Recommended: Limit access to selected groups
  4. Save Application

    • Click Save
    • You'll be redirected to the application page

Step 2: Note Application Credentials

On the application page, note these values:

bash
# Client ID (under "Client Credentials")
OIDC_AUDIENCE=0oa1abc2def3ghi4jkl5

# Okta Domain (from URL or Dashboard)
OKTA_DOMAIN=dev-12345678.okta.com
# or custom domain: auth.yourdomain.com

# Construct these URLs:
OIDC_ISSUER=https://${OKTA_DOMAIN}/oauth2/default
OAUTH_TOKEN_URL=https://${OKTA_DOMAIN}/oauth2/default/v1/token
OAUTH_AUTHORIZATION_URL=https://${OKTA_DOMAIN}/oauth2/default/v1/authorize

Using Default Authorization Server

The /oauth2/default path refers to Okta's default authorization server. For custom authorization servers, replace default with your server ID.

Step 3: Configure Token Settings

  1. Navigate to Security → API → Authorization Servers

  2. Click on default (or your custom server)

  3. Go to Settings tab

    Configure:

    • Issuer: Should be https://${OKTA_DOMAIN}/oauth2/default
    • Audience: api://default (or custom)
  4. Go to Access Policies tab

    • Ensure there's a policy allowing your application
    • Out of the box, the Default Policy Rule allows all clients
  5. Go to Scopes tab

    • Verify these scopes exist:
      • openid (required)
      • profile (recommended)
      • email (recommended)
      • offline_access (required for refresh tokens)

Step 4: Configure PKCE Settings

  1. In your application settings, go to General tab
  2. Scroll to General Settings → Edit
  3. Proof Key for Code Exchange (PKCE):
    • Select: Require PKCE as additional verification
    • This is required for public clients (Claude Desktop/Code)
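
Client libraries normally generate the PKCE values for you, but when debugging an OAuth flow it helps to see how they are derived. A sketch using openssl (illustrative only; not part of the server):

```shell
# Generate a random code_verifier (43-128 chars, base64url alphabet)
code_verifier=$(openssl rand -base64 48 | tr '+/' '-_' | tr -d '=\n')

# Derive the S256 code_challenge: BASE64URL(SHA256(code_verifier))
code_challenge=$(printf '%s' "$code_verifier" \
  | openssl dgst -sha256 -binary \
  | openssl base64 | tr '+/' '-_' | tr -d '=\n')

echo "code_verifier:  $code_verifier"
echo "code_challenge: $code_challenge"
```

The client sends `code_challenge` (with `code_challenge_method=S256`) on the authorization request and `code_verifier` on the token exchange; Okta rejects the exchange if they don't correspond.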

Step 5: Assign Users/Groups

  1. Go to Assignments tab in your application
  2. Click Assign → Assign to People or Assign to Groups
  3. Select users/groups who should have access
  4. Click Assign → Done

Step 6: Test OIDC Configuration

You can test your Okta configuration using the .well-known endpoint:

bash
curl https://${OKTA_DOMAIN}/oauth2/default/.well-known/openid-configuration

Expected response includes:

json
{
  "issuer": "https://dev-12345678.okta.com/oauth2/default",
  "authorization_endpoint": "https://dev-12345678.okta.com/oauth2/default/v1/authorize",
  "token_endpoint": "https://dev-12345678.okta.com/oauth2/default/v1/token",
  "jwks_uri": "https://dev-12345678.okta.com/oauth2/default/v1/keys",
  ...
}
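
A classic failure mode is an issuer mismatch: the issuer value in the discovery document must match your configured OIDC_ISSUER exactly (scheme, host, path, no trailing slash). A quick offline comparison sketch, using a pasted discovery response:

```shell
OIDC_ISSUER="https://dev-12345678.okta.com/oauth2/default"
# Discovery JSON as returned by the curl above (truncated to the issuer field)
discovered='{"issuer":"https://dev-12345678.okta.com/oauth2/default"}'

# Extract the issuer field and compare it byte-for-byte
issuer=$(printf '%s' "$discovered" | sed -n 's/.*"issuer":"\([^"]*\)".*/\1/p')
if [ "$issuer" = "$OIDC_ISSUER" ]; then
  echo "issuer matches"
else
  echo "issuer MISMATCH: $issuer"
fi
```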

Docker Swarm Deployment

Step 1: Prepare Docker Stack File

The repository includes docker-stack.production.yml. Review and customize:

yaml
services:
  seed:
    image: containers.home/mcp-servers/seed:latest
    networks:
      - traefik-public
      - seed-internal
    deploy:
      labels:
        # Update with your domain
        - traefik.http.routers.seed.rule=Host(`seed.yourdomain.com`)
        - traefik.http.routers.seed.entrypoints=websecure
        - traefik.http.routers.seed.tls.certresolver=letsencrypt
    environment:
      - NODE_ENV=production
      - AUTH_REQUIRED=true
      # Update with your Okta values
      - OIDC_ISSUER=https://dev-12345678.okta.com/oauth2/default
      - OIDC_AUDIENCE=0oa1abc2def3ghi4jkl5
      - OAUTH_TOKEN_URL=https://dev-12345678.okta.com/oauth2/default/v1/token
      - OAUTH_AUTHORIZATION_URL=https://dev-12345678.okta.com/oauth2/default/v1/authorize
      - BASE_URL=https://seed.yourdomain.com

Step 2: Use Docker Secrets (Optional)

For sensitive values, use Docker secrets instead of environment variables:

bash
# Create secrets
echo "0oa1abc2def3ghi4jkl5" | docker secret create seed_oidc_audience -
echo "https://dev-12345678.okta.com/oauth2/default" | docker secret create seed_oidc_issuer -

Then reference them in docker-stack.yml:

yaml
services:
  seed:
    secrets:
      - seed_oidc_audience
      - seed_oidc_issuer
    environment:
      - OIDC_AUDIENCE_FILE=/run/secrets/seed_oidc_audience
      - OIDC_ISSUER_FILE=/run/secrets/seed_oidc_issuer

secrets:
  seed_oidc_audience:
    external: true
  seed_oidc_issuer:
    external: true

WARNING

The current implementation doesn't support _FILE suffix for secrets. Use environment variables for now or extend the config loader.
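
If you want the _FILE convention without modifying the application, a common workaround is a thin entrypoint wrapper that resolves *_FILE variables before starting the server. A minimal sketch (the variable names come from this stack; the demo setup and start command are illustrative):

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: for each configured variable, if VAR_FILE
# points at a readable file, export VAR with the file's contents.
set -eu

# Demo setup -- in production Docker mounts the secret under /run/secrets/
echo "0oa1abc2def3ghi4jkl5" > /tmp/seed_oidc_audience
export OIDC_AUDIENCE_FILE=/tmp/seed_oidc_audience

for var in OIDC_AUDIENCE OIDC_ISSUER; do
  file="$(printenv "${var}_FILE" || true)"
  if [ -n "$file" ] && [ -f "$file" ]; then
    # command substitution strips the trailing newline from the file
    export "$var=$(cat "$file")"
  fi
done

echo "OIDC_AUDIENCE=$OIDC_AUDIENCE"
# A real entrypoint would now hand off to the server, e.g.: exec node dist/index.js
```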

Step 3: Build and Push Docker Image

bash
# Build image
docker build -t seed-mcp-server:latest .

# Tag for your registry
docker tag seed-mcp-server:latest containers.home/mcp-servers/seed:latest

# Push to registry
docker push containers.home/mcp-servers/seed:latest

Step 4: Deploy to Docker Swarm

bash
# Deploy the stack
docker stack deploy -c docker-stack.production.yml seed

# Verify deployment
docker stack ps seed

# Check logs
docker service logs seed_seed -f

Step 5: Verify Deployment

bash
# Check service status
docker service ps seed_seed

# Test health endpoint (no auth required)
curl https://seed.yourdomain.com/health

# Test MCP endpoint (requires auth)
curl https://seed.yourdomain.com/mcp \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Securing the Metrics Endpoint

The /metrics endpoint exposes Prometheus metrics and should be secured in production.

Option 1: Traefik IP Whitelist (Recommended)

Restrict access to /metrics at the Traefik level using IP whitelisting:

yaml
services:
  seed:
    deploy:
      labels:
        # Main application router
        - traefik.http.routers.seed.rule=Host(`seed.yourdomain.com`)
        - traefik.http.routers.seed.entrypoints=websecure
        - traefik.http.routers.seed.tls.certresolver=letsencrypt
        - traefik.http.services.seed.loadbalancer.server.port=3000

        # Separate router for metrics with IP whitelist
        - traefik.http.routers.seed-metrics.rule=Host(`seed.yourdomain.com`) && PathPrefix(`/metrics`)
        - traefik.http.routers.seed-metrics.entrypoints=websecure
        - traefik.http.routers.seed-metrics.tls.certresolver=letsencrypt
        - traefik.http.routers.seed-metrics.priority=100
        # Only allow Prometheus server and admin IPs
        - traefik.http.routers.seed-metrics.middlewares=metrics-ipwhitelist
        - traefik.http.middlewares.metrics-ipwhitelist.ipwhitelist.sourcerange=10.0.0.0/8,192.168.1.100/32

Option 2: Internal Network Only

Deploy metrics endpoint on a separate internal network:

yaml
services:
  seed:
    networks:
      - traefik-public  # Public traffic
      - seed-internal   # Internal traffic (Redis, metrics)
    deploy:
      labels:
        # Only expose main app publicly
        - traefik.http.routers.seed.rule=Host(`seed.yourdomain.com`) && !PathPrefix(`/metrics`)

  prometheus:
    image: prom/prometheus:latest
    networks:
      - seed-internal
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    configs:
      - source: prometheus-config
        target: /etc/prometheus/prometheus.yml

configs:
  prometheus-config:
    file: ./prometheus.yml

networks:
  seed-internal:
    driver: overlay
    internal: true  # No external access

Create prometheus.yml:

yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'seed'
    static_configs:
      - targets: ['seed:3000']
    metrics_path: '/metrics'

Option 3: Disable Metrics in Production

If you don't need metrics, disable them entirely:

yaml
services:
  seed:
    environment:
      - METRICS_ENABLED=false

Option 4: Authentication via Traefik BasicAuth

Add HTTP Basic Auth to the metrics endpoint:

bash
# Generate password hash
htpasswd -nb admin your-password
# Output: admin:$apr1$ruca84Hq$mbjdMZBAG.KWn7vfN/SNK/

# Create Traefik middleware
docker config create metrics-auth-users -
# Paste the htpasswd output, then Ctrl+D
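
If htpasswd (from apache2-utils) isn't installed, openssl can produce an equivalent apr1 hash:

```shell
# Same apr1 (Apache MD5) scheme htpasswd uses; the salt value is arbitrary
hash=$(openssl passwd -apr1 -salt ruca84Hq your-password)
echo "admin:$hash"
```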

Update stack file:

yaml
services:
  seed:
    deploy:
      labels:
        # Metrics router with BasicAuth
        - traefik.http.routers.seed-metrics.rule=Host(`seed.yourdomain.com`) && PathPrefix(`/metrics`)
        - traefik.http.routers.seed-metrics.middlewares=metrics-auth
        - traefik.http.middlewares.metrics-auth.basicauth.usersfile=/run/secrets/metrics-auth-users
    configs:
      - source: metrics-auth-users
        target: /run/secrets/metrics-auth-users

configs:
  metrics-auth-users:
    external: true

Monitoring and Observability

Prometheus Setup

  1. Add Seed as Prometheus Target

    Edit your Prometheus configuration:

    yaml
    scrape_configs:
      - job_name: 'seed-mcp-server'
        static_configs:
          - targets: ['seed:3000']  # Internal service name
        metrics_path: '/metrics'
        scrape_interval: 30s
  2. Verify Scraping

    Check Prometheus UI → Targets to ensure Seed is being scraped successfully.

Grafana Dashboard

Create a Grafana dashboard to visualize Seed metrics:

Key Metrics to Monitor:

  1. HTTP Metrics

    • http_request_duration_seconds - Request latency
    • http_request_total - Request count by method/route/status
  2. MCP Session Metrics

    • mcp_sessions_active - Current active sessions
    • mcp_sessions_total - Total sessions created/terminated
    • mcp_tool_invocations_total - Tool usage by tool/status
    • mcp_tool_duration_seconds - Tool execution time
  3. Authentication Metrics

    • auth_attempts_total - Auth success/failure rate
    • auth_token_validation_duration_seconds - Token validation latency
  4. OAuth Flow Metrics (✅ Added 2026-01-07)

    • oauth_authorization_requests_total - OAuth authorization requests by result
    • oauth_token_exchanges_total - Token exchanges by grant type and result
    • oauth_token_exchange_duration_seconds - IdP response time
    • dcr_registrations_total - Dynamic client registrations
  5. Token Refresh Metrics (✅ Added 2026-01-07)

    • token_refresh_attempts_total - Token refresh attempts by type (proactive/reactive) and result
    • token_refresh_duration_seconds - Token refresh operation latency
    • pending_tokens_claimed_total - Pending tokens claimed by sessions
  6. JWKS Metrics

    • jwks_refresh_total - JWKS refresh operations
    • jwks_cache_hits_total / jwks_cache_misses_total - Cache efficiency
  7. Redis Metrics

    • redis_operations_total - Redis operation count
    • redis_operation_duration_seconds - Redis latency
    • circuit_breaker_state - Circuit breaker state (0=closed, 2=open)
    • circuit_breaker_failures_total - Circuit breaker failures
  8. Rate Limiting

    • rate_limit_hits_total - Requests blocked by rate limiting
  9. System Metrics (from prom-client defaults)

    • process_cpu_seconds_total - CPU usage
    • process_resident_memory_bytes - Memory usage
    • nodejs_eventloop_lag_seconds - Event loop lag

Alerting Rules

Example Prometheus alerting rules:

yaml
# prometheus-rules.yml
groups:
  - name: seed_mcp_server
    interval: 30s
    rules:
      # High error rate
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(http_request_total{status_code=~"5.."}[5m]))
            /
            sum(rate(http_request_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate on Seed MCP Server"
          description: "{{ $value | humanizePercentage }} of requests are failing"

      # High auth failure rate
      - alert: HighAuthFailureRate
        expr: |
          (
            sum(rate(auth_attempts_total{result="failure"}[5m]))
            /
            sum(rate(auth_attempts_total[5m]))
          ) > 0.10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High authentication failure rate"
          description: "{{ $value | humanizePercentage }} of auth attempts are failing"

      # Service down
      - alert: SeedMCPServerDown
        expr: up{job="seed-mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Seed MCP Server is down"
          description: "Seed MCP Server has been down for more than 1 minute"

      # High memory usage
      - alert: HighMemoryUsage
        expr: |
          (
            process_resident_memory_bytes{job="seed-mcp-server"}
            /
            256000000  # 256MB limit from docker-stack.yml
          ) > 0.90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Seed MCP Server using too much memory"
          description: "Memory usage is at {{ $value | humanizePercentage }}"

      # JWKS refresh failures
      - alert: JWKSRefreshFailures
        expr: rate(jwks_refresh_total{result="failure"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JWKS refresh failures detected"
          description: "Unable to refresh JWKS keys from OIDC provider"

      # OAuth token exchange failures (Added 2026-01-07)
      - alert: HighOAuthTokenExchangeFailureRate
        expr: |
          (
            sum(rate(oauth_token_exchanges_total{result="failure"}[5m]))
            /
            sum(rate(oauth_token_exchanges_total[5m]))
          ) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High OAuth token exchange failure rate"
          description: "{{ $value | humanizePercentage }} of token exchanges are failing"

      # Token refresh failures (Added 2026-01-07)
      - alert: HighTokenRefreshFailureRate
        expr: |
          (
            sum(rate(token_refresh_attempts_total{result="failure"}[5m]))
            /
            sum(rate(token_refresh_attempts_total{result!="skipped"}[5m]))
          ) > 0.10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High token refresh failure rate"
          description: "{{ $value | humanizePercentage }} of token refreshes are failing"

      # Circuit breaker open (Added 2026-01-06)
      - alert: RedisCircuitBreakerOpen
        expr: circuit_breaker_state{name="redis"} == 2
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis circuit breaker is open"
          description: "Redis connection failures detected, circuit breaker protecting service"

Logging

The production stack uses JSON log format with rotation:

yaml
services:
  seed:
    environment:
      - LOG_FORMAT=json  # Structured logging for log aggregation
      - LOG_LEVEL=info   # Production log level
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Send logs to centralized logging:

yaml
# Example: Using Loki
services:
  seed:
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        labels: "service=seed,environment=production"

Troubleshooting

Issue: Server fails to start

Symptoms:

bash
docker service logs seed_seed
# Configuration validation failed: ...
# Server startup aborted.

Solution: This is the configuration validation feature (implemented 2026-01-06) detecting invalid configuration.

  1. Check the error message - It will specify which configuration value is invalid:

    ✗ Configuration validation failed:
      - OIDC_ISSUER must be configured when AUTH_REQUIRED=true
      - PORT must be between 1 and 65535, got: 99999
  2. Fix the configuration in your stack file or environment variables

  3. Redeploy the service:

    bash
    docker stack deploy -c docker-stack.production.yml seed

Common validation errors:

  • Missing OIDC_ISSUER when AUTH_REQUIRED=true
  • Invalid URL formats (must be valid HTTP/HTTPS URLs)
  • Invalid port range (must be 1-65535)
  • TTL values too short (MCP_SESSION_TTL_SECONDS must be ≥60)
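
The same rules can be sanity-checked locally before deploying. A rough sketch mirroring the checks above (approximate; the server's config loader remains the authoritative validator):

```shell
# Pre-deploy sanity check with example values from this guide
AUTH_REQUIRED=true
OIDC_ISSUER=https://dev-12345678.okta.com/oauth2/default
PORT=3000
MCP_SESSION_TTL_SECONDS=3600

errors=0
if [ "$AUTH_REQUIRED" = "true" ] && [ -z "${OIDC_ISSUER:-}" ]; then
  echo "OIDC_ISSUER must be configured when AUTH_REQUIRED=true"; errors=$((errors+1))
fi
case "$OIDC_ISSUER" in
  http://*|https://*) ;;
  *) echo "OIDC_ISSUER must be a valid HTTP/HTTPS URL"; errors=$((errors+1)) ;;
esac
if [ "$PORT" -lt 1 ] || [ "$PORT" -gt 65535 ]; then
  echo "PORT must be between 1 and 65535, got: $PORT"; errors=$((errors+1))
fi
if [ "$MCP_SESSION_TTL_SECONDS" -lt 60 ]; then
  echo "MCP_SESSION_TTL_SECONDS must be >= 60"; errors=$((errors+1))
fi
echo "validation errors: $errors"
```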

Issue: Cannot connect to /mcp endpoint

Symptoms:

bash
curl https://seed.yourdomain.com/mcp
# Returns 401 Unauthorized

Solutions:

  1. Check health endpoint first:

    bash
    # Verify server is running
    curl https://seed.yourdomain.com/health
    
    # Check dependencies are healthy
    curl https://seed.yourdomain.com/health/ready
  2. Check Okta configuration:

    bash
    # Verify OIDC discovery works
    curl https://dev-12345678.okta.com/oauth2/default/.well-known/openid-configuration
  3. Check environment variables:

    bash
    docker service inspect seed_seed --format='{{json .Spec.TaskTemplate.ContainerSpec.Env}}'
  4. Check service logs:

    bash
    docker service logs seed_seed -f | grep -i error
  5. Verify JWKS is accessible:

    bash
    # Check if Seed can reach Okta
    docker exec $(docker ps -qf label=com.docker.swarm.service.name=seed_seed) \
      wget -O- https://dev-12345678.okta.com/oauth2/default/v1/keys

Issue: /metrics endpoint returning 404

Symptoms:

bash
curl https://seed.yourdomain.com/metrics
# Returns 404 Not Found

Solutions:

  1. Check if metrics are enabled:

    bash
    docker service inspect seed_seed | grep METRICS_ENABLED
    # Should not be set to "false"
  2. Check Traefik routing:

    bash
    # Verify Traefik can see the service
    curl -u admin:password http://traefik:8080/api/http/routers
  3. Test directly from container:

    bash
    docker exec $(docker ps -qf label=com.docker.swarm.service.name=seed_seed) \
      wget -qO- http://localhost:3000/metrics

Issue: High memory usage

Symptoms:

  • Container OOMKilled
  • mcp_sessions_active metric growing unbounded

Solutions:

  1. Check session TTL:

    bash
    # Verify MCP_SESSION_TTL_SECONDS is set
    docker service inspect seed_seed | grep MCP_SESSION_TTL
  2. Check Redis eviction:

    bash
    docker exec $(docker ps -qf label=com.docker.swarm.service.name=seed_redis) \
      redis-cli CONFIG GET maxmemory-policy
    # Should be: allkeys-lru
  3. Monitor session metrics:

    text
    # Check if sessions are expiring
    rate(mcp_sessions_total{status="terminated"}[5m])
  4. Increase memory limit:

    yaml
    services:
      seed:
        deploy:
          resources:
            limits:
              memory: 512M  # Increased from 256M

Issue: Rate limiting false positives

Symptoms:

  • Legitimate requests getting 429 responses
  • rate_limit_hits_total metric increasing

Solutions:

  1. Increase rate limits:

    yaml
    environment:
      - MCP_RATE_LIMIT_MAX=200  # Increased from 100
      - MCP_RATE_LIMIT_WINDOW_MS=60000
  2. Check if rate limiting is per-IP:

    bash
    # All requests from same IP (e.g., Traefik)?
    docker service logs seed_seed | grep "rate limit"
  3. Disable rate limiting temporarily:

    yaml
    environment:
      - RATE_LIMIT_ENABLED=false

Issue: CORS errors in browser

Symptoms:

Access to fetch at 'https://seed.yourdomain.com/mcp' from origin 'https://app.example.com'
has been blocked by CORS policy

Solutions:

  1. Add origin to CORS whitelist:

    yaml
    environment:
      - CORS_EXTRA_ORIGINS=https://app.example.com,https://other.example.com
  2. Check current CORS config:

    bash
    docker service logs seed_seed | grep -i cors

Production Readiness Features

Implemented (2026-01-06):

Graceful Shutdown

  • SIGTERM and SIGINT signal handling
  • Stops accepting new connections
  • Waits for active requests (5-second grace period)
  • Closes all MCP sessions in parallel
  • Closes Redis connections properly
  • Stops JWKS refresh timer
  • Health check returns 503 during shutdown

Kubernetes integration: No special configuration needed - graceful shutdown works automatically with Kubernetes pod termination.

Health Checks

  • Liveness probe (/health) - Process health, returns 503 during shutdown
  • Readiness probe (/health/ready) - Dependency health checks:
    • Redis connectivity with circuit breaker state
    • JWKS cache with expiration tracking
    • Session capacity with utilization metrics

Kubernetes deployment:

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
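
This guide deploys to Docker Swarm rather than Kubernetes; the rough Swarm equivalent is a container healthcheck in the stack file (a sketch, assuming wget is available in the image as used elsewhere in this guide):

```yaml
services:
  seed:
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
```

Swarm restarts the container after the configured number of consecutive failures; there is no separate readiness probe, so /health/ready is mainly useful for external monitoring.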

Configuration Validation

  • Validates all config values at startup
  • URL format validation (HTTP/HTTPS, redis://)
  • Port ranges (1-65535)
  • Numeric limits and TTL values
  • Production-specific security requirements
  • Exits with clear error messages if validation fails

Redis Resilience

  • Circuit breaker pattern for connection failures
  • Graceful degradation when Redis unavailable
  • Automatic reconnection with exponential backoff
  • Health check integration

Token Revocation

  • RFC 7009 compliant /oauth/revoke endpoint
  • Access token revocation cache (5-minute TTL)
  • Refresh token revocation proxied to IdP

Security Checklist

Before deploying to production:

  • [ ] HTTPS enabled with valid TLS certificate (Let's Encrypt)
  • [ ] AUTH_REQUIRED=true set
  • [ ] Okta application configured with PKCE required
  • [ ] Redirect URIs properly configured in Okta
  • [ ] /metrics endpoint secured (IP whitelist or disabled)
  • [ ] Docker secrets used for sensitive values (optional but recommended)
  • [ ] Redis configured with password (if exposed)
  • [ ] Rate limiting enabled with appropriate limits
  • [ ] Logging configured with rotation
  • [ ] Monitoring and alerting set up
  • [ ] Resource limits set on containers
  • [ ] ✅ Health checks configured (liveness and readiness probes)
  • [ ] ✅ Configuration validation enabled (automatic at startup)
  • [ ] Backup strategy for Redis data (if needed)

Released under the MIT License.