Token Refresh Metrics
Status: ✅ IMPLEMENTED
Priority: 🟢 MEDIUM (Completed)
Actual Time: 2-3 hours
Implementation Date: 2026-01-07
Risk Level: LOW
Impact: Enhanced observability for OAuth token refresh operations
Implementation Summary
Token refresh metrics are now implemented. They expose proactive token refresh operations and pending token claims as Prometheus metrics.
What Was Implemented
- Token Refresh Attempts Counter - Tracks all refresh attempts by type (proactive/reactive) and result (success/failure/skipped) in src/services/metrics.ts
- Token Refresh Duration Histogram - Measures refresh operation latency with buckets [0.1, 0.5, 1, 2, 5] seconds
- Pending Tokens Claimed Counter - Tracks when pending tokens are successfully claimed during MCP session creation
- Metrics Integration - Instrumented src/middleware/auth.ts and src/mcp/mcp.ts
- Comprehensive Tests - Added 6 test cases in src/services/metrics.test.ts
- Grafana Dashboard - Added 3 new panels to grafana/seed-mcp-server-dashboard.json
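To make the bucket boundaries listed above concrete, the following dependency-free sketch shows how a Prometheus-style cumulative histogram would count a handful of refresh durations. `cumulativeBucketCounts` is a hypothetical helper written for illustration only; it is not part of the metrics library or of this codebase, and the sample durations are made up.

```typescript
// Upper bounds in seconds, as listed for the refresh duration histogram.
const BUCKETS: number[] = [0.1, 0.5, 1, 2, 5];

// Hypothetical helper: counts observations at or below each upper bound,
// mimicking how cumulative histogram buckets are reported.
function cumulativeBucketCounts(
  observations: number[],
  bounds: number[] = BUCKETS,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const bound of bounds) {
    // Each bucket is cumulative: it includes every observation <= its bound.
    counts[String(bound)] = observations.filter((v) => v <= bound).length;
  }
  // The implicit +Inf bucket always equals the total number of observations.
  counts['+Inf'] = observations.length;
  return counts;
}

// Illustrative input: refreshes taking 80 ms, 300 ms, 1.5 s and 7 s.
const exampleCounts = cumulativeBucketCounts([0.08, 0.3, 1.5, 7]);
console.log(exampleCounts);
```

In this sketch the 7-second refresh is counted only in the `+Inf` bucket. If IdP responses slower than 5 seconds turn out to be common, the top bucket would hide that detail and an extra boundary might be worth considering.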
Key Files Modified
- src/services/metrics.ts - Metric definitions
- src/middleware/auth.ts - Refresh operation instrumentation
- src/mcp/mcp.ts - Pending token claim tracking
- src/services/metrics.test.ts - Test coverage
- src/middleware/auth.test.ts - Updated metrics mock
- grafana/seed-mcp-server-dashboard.json - Dashboard panels
Testing
- 6 comprehensive test cases covering all metrics
- Mock validation for proper metric instrumentation
- Test coverage: >90% for metrics service
- All tests passing via npm run validate
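The test files themselves are not reproduced here. As a rough illustration of what validating instrumentation through a mock could look like, the sketch below uses a hand-written fake counter instead of a real metrics library. `CounterLike`, `FakeCounter`, `refreshWithMetrics`, and `runExample` are all hypothetical names introduced for this example; the project's actual tests may be structured differently.

```typescript
// Minimal stand-in for the counter interface used by the instrumentation.
// Only the `inc(labels)` shape is mirrored here.
interface CounterLike {
  inc(labels: Record<string, string>): void;
}

// Hypothetical fake that records every call so a test can inspect it.
class FakeCounter implements CounterLike {
  calls: Record<string, string>[] = [];

  inc(labels: Record<string, string>): void {
    this.calls.push(labels);
  }
}

// Hypothetical wrapper following a try/catch instrumentation pattern:
// record success or failure, then pass the outcome through unchanged.
async function refreshWithMetrics<T>(
  counter: CounterLike,
  type: 'proactive' | 'reactive',
  refresh: () => Promise<T>,
): Promise<T> {
  try {
    const token = await refresh();
    counter.inc({ type, result: 'success' });
    return token;
  } catch (error) {
    counter.inc({ type, result: 'failure' });
    throw error;
  }
}

// Exercise both paths and return the recorded label sets.
async function runExample(): Promise<Record<string, string>[]> {
  const counter = new FakeCounter();
  await refreshWithMetrics(counter, 'proactive', async () => 'new-token');
  await refreshWithMetrics(counter, 'reactive', async () => {
    throw new Error('IdP unavailable');
  }).catch(() => undefined); // swallow the error: only the metric matters here
  return counter.calls;
}
```

Injecting the counter as a parameter is a design choice made for this sketch: it lets the test observe label values without touching a global registry.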
Original Problem Statement
Token refresh operations are logged but not exposed as Prometheus metrics. This limits operational visibility:
- Cannot track refresh success/failure rates
- No visibility into proactive vs reactive refresh patterns
- Missing duration metrics for IdP response times
- No alerting on refresh failures
Proposed Solution
Add comprehensive Prometheus metrics for token refresh operations.
Metrics to Add
```typescript
// src/services/metrics.ts
// Assumes prom-client; `register` is the Registry already defined in this module.
import { Counter, Histogram } from 'prom-client';

// Counts every refresh attempt, labeled by trigger type and outcome.
export const tokenRefreshAttempts = new Counter({
  name: 'token_refresh_attempts_total',
  help: 'Total token refresh attempts',
  labelNames: ['type', 'result'], // type: proactive/reactive; result: success/failure/skipped
  registers: [register],
});

// Measures how long each refresh operation takes, in seconds.
export const tokenRefreshDuration = new Histogram({
  name: 'token_refresh_duration_seconds',
  help: 'Token refresh operation duration',
  labelNames: ['result'],
  buckets: [0.1, 0.5, 1, 2, 5],
  registers: [register],
});

// Counts pending tokens claimed during session creation.
export const pendingTokensClaimed = new Counter({
  name: 'pending_tokens_claimed_total',
  help: 'Total pending tokens claimed by sessions',
  registers: [register],
});

// Counts token store operations by kind and outcome.
export const tokenStoreOperations = new Counter({
  name: 'token_store_operations_total',
  help: 'Token store operations',
  labelNames: ['operation', 'result'], // operation: get/set/delete; result: success/failure
  registers: [register],
});
```

Usage Example
```typescript
// In attemptTokenRefresh
// Start timing before the IdP call so the histogram covers the full round trip.
const timer = tokenRefreshDuration.startTimer();
try {
  const newToken = await refreshTokenFromIdP();
  tokenRefreshAttempts.inc({ type: 'proactive', result: 'success' });
  timer({ result: 'success' }); // stop the timer and record the duration
  return newToken;
} catch (error) {
  tokenRefreshAttempts.inc({ type: 'proactive', result: 'failure' });
  timer({ result: 'failure' });
  throw error; // rethrow so callers still see the failure
}
```

Sample Queries
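The ratio queries in this section divide one counter rate by another. As a rough sketch of the intended arithmetic, outside Prometheus and with made-up numbers, each ratio sums the per-label-set increases before dividing; dividing series one by one would only compare each series with itself. `Sample` and `ratio` are hypothetical names introduced for this illustration.

```typescript
// One counter series: a label set plus the increase observed over a window.
interface Sample {
  type: 'proactive' | 'reactive';
  result: 'success' | 'failure' | 'skipped';
  increase: number;
}

// Hypothetical helper: fraction of attempts selected by `match`, computed as
// sum(matching increases) / sum(all increases).
function ratio(samples: Sample[], match: (s: Sample) => boolean): number {
  const total = samples.reduce((acc, s) => acc + s.increase, 0);
  if (total === 0) return NaN; // no attempts in the window
  const matched = samples
    .filter(match)
    .reduce((acc, s) => acc + s.increase, 0);
  return matched / total;
}

// Made-up increases for one five-minute window.
const windowSamples: Sample[] = [
  { type: 'proactive', result: 'success', increase: 45 },
  { type: 'proactive', result: 'failure', increase: 5 },
  { type: 'reactive', result: 'success', increase: 30 },
  { type: 'reactive', result: 'failure', increase: 20 },
];

const successRatio = ratio(windowSamples, (s) => s.result === 'success'); // 75 / 100
const proactiveRatio = ratio(windowSamples, (s) => s.type === 'proactive'); // 50 / 100
console.log({ successRatio, proactiveRatio });
```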
```promql
# Refresh success rate (summed across all type/result label combinations)
sum(rate(token_refresh_attempts_total{result="success"}[5m]))
  / sum(rate(token_refresh_attempts_total[5m]))

# P99 refresh latency across all results
histogram_quantile(0.99,
  sum by (le) (rate(token_refresh_duration_seconds_bucket[5m]))
)

# Proactive share of all refresh attempts
sum(rate(token_refresh_attempts_total{type="proactive"}[5m]))
  / sum(rate(token_refresh_attempts_total[5m]))
```

Acceptance Criteria
- [x] Token refresh attempt counter with type and result labels
- [x] Token refresh duration histogram
- [x] Pending token claim counter
- [x] Metrics integrated into existing refresh logic
- [x] Comprehensive test coverage (>90%)
- [x] Documentation with example queries
- [x] Grafana dashboard panels
Estimated Effort
2-3 hours - Add metrics, integrate them into the code, and write documentation
Related Enhancements
- Automatic Token Refresh - Base implementation
- OAuth Flow Metrics - Complete OAuth observability