Monitoring Claude Code with OpenTelemetry and Grafana
You’re using Claude Code daily. Sessions add up. Tokens accumulate. PRs get created. But how much are you actually using it? What’s the ROI? Which models cost the most? How productive is your team with AI assistance?
Without monitoring, these questions stay unanswered. You’re flying blind on usage patterns, costs, and productivity gains.
Claude Code ships with OpenTelemetry support. You can track sessions, token consumption, costs, commits, lines of code changed, and active time—all with standard observability tools.
This tutorial shows you how to set up a complete monitoring stack using Docker: OpenTelemetry Collector, Prometheus, and Grafana. By the end, you’ll have dashboards tracking every aspect of your Claude Code usage.
What You’ll Monitor
Claude Code’s OpenTelemetry implementation provides two types of data:
Metrics
Usage Statistics:
- claude_code.session.count - CLI sessions started
- claude_code.commit.count - Git commits created
- claude_code.pull_request.count - PRs created
- claude_code.lines_of_code.count - Lines modified (added/removed)
Token and Cost Tracking:
- claude_code.token.usage - Tokens consumed (input/output/cache read/cache creation)
- claude_code.cost.usage - Session cost in USD
Productivity Metrics:
- claude_code.active_time.total - Active usage time in seconds
- claude_code.code_edit_tool.decision - Tool permission decisions
Standard Attributes (attached to all metrics):
- session.id - Unique session identifier
- app.version - Claude Code version
- user.account_uuid - Account UUID
- organization.id - Organization UUID (when authenticated)
- terminal.type - Terminal application (iTerm.app, vscode, cursor, tmux, etc.)
Events
User Prompts (claude_code.user_prompt):
- Prompt length and content (content redacted by default)
Tool Results (claude_code.tool_result):
- Tool name, success status, duration, errors
- Decision (accept/reject) and source (config, user_permanent, user_temporary)
- Tool-specific parameters
API Interactions (claude_code.api_request and claude_code.api_error):
- Model used, tokens consumed, cost, duration
- Error messages and status codes for failures
Tool Decisions (claude_code.tool_decision):
- Which tools were accepted or rejected
- Records tool name, decision, and source
Privacy and Security
Claude Code’s telemetry is opt-in and privacy-focused:
- Explicit enablement required - Disabled by default
- No sensitive data - API keys and file contents never included
- Prompt redaction - User prompts redacted by default (enable with OTEL_LOG_USER_PROMPTS=1)
- Cardinality control - Configure which attributes to include
Architecture Overview
The monitoring stack has four components:
Claude Code → OpenTelemetry Collector → Prometheus → Grafana
Claude Code: Exports metrics and events via OTLP (OpenTelemetry Protocol)
OpenTelemetry Collector: Receives OTLP data, processes it, and exposes a Prometheus-compatible endpoint. Supports both gRPC (port 4317) and HTTP (port 4318) protocols.
Prometheus: Time-series database that scrapes metrics from the collector. Stores data for queries and historical analysis. Retention in this stack: 90 days (Prometheus itself defaults to 15 days).
Grafana: Visualization platform for dashboards, charts, and alerts.
Why not connect Claude Code directly to Prometheus? Prometheus uses a pull model (scraping), while Claude Code uses a push model (OTLP). The OpenTelemetry Collector bridges this gap, converting OTLP to Prometheus’s format.
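The format conversion also changes metric names, which matters once you start writing queries. The sketch below approximates the usual result: dots become underscores and monotonic counters gain a _total suffix. The function name is hypothetical and the rule is deliberately simplified; the real exporter may also append a unit suffix, so treat the collector's own /metrics output as the source of truth.

```python
import re


def to_prometheus_name(otel_name: str, is_counter: bool = True) -> str:
    """Approximate the Prometheus-style name for an OpenTelemetry metric.

    Simplifying assumption: unit suffixes, which the real exporter may add,
    are ignored here.
    """
    # Prometheus metric names allow only letters, digits, underscores and colons.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    # Monotonic counters conventionally end in _total.
    if is_counter and not name.endswith("_total"):
        name += "_total"
    return name


if __name__ == "__main__":
    print(to_prometheus_name("claude_code.session.count"))
```

Under these assumptions, claude_code.session.count maps to claude_code_session_count_total, the name used in the Prometheus queries later in this tutorial.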
Prerequisites
Before starting, ensure you have:
| Requirement | Details |
|---|---|
| Docker | Version 20.10+ |
| Docker Compose | Version 2.0+ |
| Available Ports | 3000, 4317, 4318, 8889, 9090, 13133, 55679 |
| Disk Space | ~1GB for 90 days of retention |
Verify Docker versions:
docker --version # Should show 20.10+
docker compose version # Should show 2.0+
Setup Part 1: Directory Structure
Create the project directory and required subdirectories:
# Create main project folder
mkdir claude-code-metrics-stack && cd claude-code-metrics-stack
# Create configuration directories
mkdir -p config/grafana/provisioning/datasources
# Create data directories for persistence
mkdir -p data/prometheus data/grafana
# Set permissions (Grafana and Prometheus run as specific users)
chmod -R 777 data/
Your directory structure should look like this:
claude-code-metrics-stack/
├── docker-compose.yml (to be created)
├── config/
│ ├── otel-collector-config.yaml (to be created)
│ ├── prometheus.yml (to be created)
│ └── grafana/
│ └── provisioning/
│ └── datasources/
│ └── datasources.yml (to be created)
└── data/
├── prometheus/ (volume mount)
└── grafana/ (volume mount)
Setup Part 2: OpenTelemetry Collector Configuration
Create the collector configuration file:
cat > config/otel-collector-config.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "*"
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
  health_check:
    endpoint: 0.0.0.0:13133
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    const_labels:
      source: otel-collector
  debug:
    verbosity: detailed
service:
  extensions: [zpages, health_check]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, debug]
EOF
What this does:
- Receivers: Accept OTLP data via gRPC (port 4317) and HTTP (port 4318) with CORS enabled
- Processors: Batch metrics (10-second timeout, max 1024 per batch) for efficient processing
- Extensions:
  - zpages: Debugging interface on port 55679
  - health_check: Health check endpoint on port 13133
- Exporters:
  - prometheus: Expose metrics on port 8889 for Prometheus to scrape, with a source label
  - debug: Log detailed information for troubleshooting
- Service: Enables extensions and routes metrics through receivers → processors → exporters
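A common mistake when editing this file is listing a component in a pipeline without defining it at the top level. The sketch below shows that wiring rule on a plain dictionary that mirrors the component names from the configuration above. The function name is hypothetical, and the dictionary is typed in by hand instead of parsed from YAML so that the example needs no third-party library.

```python
from typing import Dict, List

# Component names mirrored from the collector configuration above.
CONFIG = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"prometheus": {}, "debug": {}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["prometheus", "debug"],
            }
        }
    },
}


def undefined_components(config: Dict) -> List[str]:
    """Return 'section/name' for each pipeline entry that has no top-level definition."""
    missing = []
    for pipeline in config["service"]["pipelines"].values():
        for section in ("receivers", "processors", "exporters"):
            for name in pipeline.get(section, []):
                if name not in config.get(section, {}):
                    missing.append(f"{section}/{name}")
    return missing


if __name__ == "__main__":
    # An empty list means every pipeline entry is defined.
    print(undefined_components(CONFIG))
```

If you later remove the debug exporter definition, remove it from the pipeline's exporters list as well; this check would otherwise report exporters/debug.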
Setup Part 3: Prometheus Configuration
Create the Prometheus configuration:
cat > config/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: []
rule_files: []
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          app: "prometheus"
  - job_name: "otel-collector"
    static_configs:
      - targets: ["otel-collector:8889"]
        labels:
          app: "otel-collector"
          source: "claude-code-metrics"
    scrape_interval: 10s
    scrape_timeout: 5s
EOF
Configuration details:
- Global interval: 15 seconds between scrapes
- Evaluation interval: 15 seconds for rule evaluation
- Alerting: Empty alertmanagers config (can be configured later)
- Rule files: Empty (can add alerting rules later)
- Scrape configs:
  - Prometheus self-monitoring on localhost:9090
  - OpenTelemetry Collector on otel-collector:8889 with custom labels
  - 10-second scrape interval with 5-second timeout for collector
Retention note: Prometheus defaults to 15 days. The docker-compose.yml (next section) configures 90-day retention, balancing historical insight with disk efficiency. For longer retention, adjust to --storage.tsdb.retention.time=365d and add --storage.tsdb.retention.size=50GB to cap disk usage.
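To judge whether a retention setting fits your disk, the Prometheus storage documentation gives a rough sizing rule: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (typically 1 to 2). The sketch below applies that rule. The function name and the figure of 500 active series are assumptions for illustration, not measurements from a real Claude Code deployment.

```python
def estimate_disk_bytes(retention_days: int, active_series: int,
                        scrape_interval_s: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    # Each active series yields one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical load: 500 series scraped every 10 seconds, kept for 90 days.
    size = estimate_disk_bytes(retention_days=90, active_series=500, scrape_interval_s=10)
    print(f"{size / 1e9:.2f} GB")
```

With those assumed inputs the estimate lands a little under 1 GB, in line with the disk space listed in the prerequisites. Series count is the input to watch, since it grows with every extra label value.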
Setup Part 4: Grafana Datasource Configuration
Create the Grafana datasource provisioning file:
cat > config/grafana/provisioning/datasources/datasources.yml <<'EOF'
apiVersion: 1
prune: false
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    uid: prometheus_claude_metrics
    url: http://prometheus:9090
    basicAuth: false
    editable: false
    isDefault: true
    jsonData:
      timeInterval: "10s"
      httpMethod: "POST"
EOF
What this does:
- Auto-provisions Prometheus as a datasource when Grafana starts
- Prune: Disabled to prevent auto-removal of manually added datasources
- URL: http://prometheus:9090 (Docker service name)
- Access mode: proxy (Grafana server queries Prometheus, not the browser)
- UID: Unique identifier prometheus_claude_metrics for dashboard references
- Editable: Set to false to prevent accidental modification
- Default: Makes Prometheus the default datasource for new dashboards
- Time interval: Minimum query interval of 10 seconds
- HTTP method: Uses POST for queries (better for large queries)
Setup Part 5: Docker Compose
Create the Docker Compose file that orchestrates all services:
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector:0.99.0
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./config/otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape endpoint
      - "55679:55679" # zPages
      - "13133:13133" # Health check
    restart: unless-stopped
    networks:
      - claude-metrics-network
  prometheus:
    image: prom/prometheus:v3.8.0
    container_name: prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=90d"
      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
      - "--web.console.templates=/usr/share/prometheus/consoles"
      - "--web.enable-lifecycle"
      - "--web.enable-remote-write-receiver"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./data/prometheus:/prometheus
    ports:
      - "9090:9090"
    restart: unless-stopped
    depends_on:
      otel-collector:
        # The collector service defines no healthcheck here, so wait for
        # container start instead of service_healthy.
        condition: service_started
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - claude-metrics-network
  grafana:
    image: grafana/grafana:12.3.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - ./config/grafana/provisioning:/etc/grafana/provisioning:ro
      - ./data/grafana:/var/lib/grafana
    ports:
      - "3000:3000"
    restart: unless-stopped
    depends_on:
      prometheus:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - claude-metrics-network
networks:
  claude-metrics-network:
    driver: bridge
    name: claude-metrics-network
EOF
Key configuration details:
OpenTelemetry Collector (v0.99.0):
- Listens on ports 4317 (gRPC) and 4318 (HTTP) for Claude Code metrics
- Exposes Prometheus scrape endpoint on 8889
- zPages debugging interface on 55679
- Health check endpoint on 13133
- Read-only config mount for security
Prometheus (v3.8.0):
- 90-day data retention (--storage.tsdb.retention.time=90d)
- Persistent storage via ./data/prometheus bind mount
- Lifecycle API enabled for configuration reloads
- Remote write receiver enabled for additional data sources
- Read-only config mount
- Health checks ensure readiness before Grafana starts
Grafana (v12.3.0):
- Default credentials: admin/admin (change immediately for non-local deployments)
- Auto-installs clock and pie chart panels
- Auto-provisions Prometheus datasource via read-only config
- Persistent storage for dashboards and settings
- Health check on /api/health endpoint
Networking:
- Dedicated bridge network claude-metrics-network isolates stack
- Services communicate via container names (otel-collector, prometheus, grafana)
Dependencies: Services start in order (collector → Prometheus → Grafana), and Grafana waits until the Prometheus health check passes.
Setup Part 6: Launch the Stack
Start all services:
# Launch containers in detached mode
docker compose up -d
# Watch logs from all services
docker compose logs -f
Verify services are running:
docker compose ps
You should see three containers with “Up” status.
Test service health:
- OpenTelemetry Collector health: curl http://localhost:13133
- Prometheus UI: Open http://localhost:9090 in your browser
- Grafana UI: Open http://localhost:3000 (login: admin/admin)
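Containers can report "Up" a few seconds before they answer requests, so a scripted check usually polls the health endpoints instead of testing once. The sketch below is one way to do that with the standard library. The function names, attempt count and delay are assumptions; the URLs you would pass in are the ones used elsewhere in this tutorial.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def wait_until_healthy(url: str, attempts: int = 10, delay_s: float = 3.0,
                       probe: Callable[[str], bool] = http_ok) -> bool:
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

For example, wait_until_healthy("http://localhost:13133") checks the collector, and the same call with http://localhost:9090/-/healthy or http://localhost:3000/api/health covers Prometheus and Grafana. The probe parameter exists so the polling logic can be exercised without a running stack.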
Check logs for errors:
# Collector logs
docker compose logs otel-collector
# Prometheus logs
docker compose logs prometheus
# Grafana logs
docker compose logs grafana
Enable Claude Code Telemetry
Now that the monitoring stack is running, configure Claude Code to send metrics.
Set environment variables in your shell profile (~/.bashrc, ~/.zshrc, etc.):
# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Configure OTLP exporters
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
# Set protocol and endpoint
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
Why these settings:
- CLAUDE_CODE_ENABLE_TELEMETRY=1: Explicitly enables telemetry (opt-in)
- OTEL_METRICS_EXPORTER=otlp: Use OpenTelemetry Protocol for metrics
- OTEL_LOGS_EXPORTER=otlp: Use OTLP for events/logs
- OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf: Use HTTP protocol (port 4318)
- OTEL_EXPORTER_OTLP_ENDPOINT: Collector’s HTTP endpoint
Alternative: gRPC protocol
If you prefer gRPC (port 4317):
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
Apply configuration:
# Reload shell configuration
source ~/.bashrc # or source ~/.zshrc
# Verify variables are set
env | grep OTEL
env | grep CLAUDE_CODE_ENABLE_TELEMETRY
Important: Environment variables are only read when Claude Code starts. Restart any running Claude Code sessions for changes to take effect.
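A mismatch between protocol and port is easy to introduce when switching between HTTP and gRPC. The sketch below checks the three settings that matter most for this stack. The function name is hypothetical, and the port mapping assumes the collector ports configured in this tutorial (4318 for HTTP, 4317 for gRPC); a collector listening elsewhere would need a different mapping.

```python
import os
from typing import Dict, List
from urllib.parse import urlparse

# Ports assumed from the collector setup in this tutorial.
EXPECTED_PORT = {"http/protobuf": 4318, "grpc": 4317}


def check_otlp_env(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") != "1":
        problems.append("CLAUDE_CODE_ENABLE_TELEMETRY is not set to 1")
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "")
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    expected = EXPECTED_PORT.get(protocol)
    if expected is None:
        problems.append(f"unexpected protocol: {protocol!r}")
    elif urlparse(endpoint).port != expected:
        problems.append(f"{protocol} expects port {expected}, endpoint is {endpoint!r}")
    return problems


if __name__ == "__main__":
    # Run this in the same shell that will launch Claude Code.
    for problem in check_otlp_env(dict(os.environ)):
        print(problem)
```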
Export intervals:
- Metrics: Exported every 60 seconds (default)
- Events/Logs: Exported every 5 seconds (default)
To customize:
export OTEL_METRIC_EXPORT_INTERVAL=30000 # 30 seconds
export OTEL_LOGS_EXPORT_INTERVAL=10000 # 10 seconds
Cardinality control (optional):
Control which attributes are included to manage cardinality:
# Include session ID (default: true)
export OTEL_METRICS_INCLUDE_SESSION_ID=true
# Include app version (default: false)
export OTEL_METRICS_INCLUDE_VERSION=false
# Include account UUID (default: true)
export OTEL_METRICS_INCLUDE_ACCOUNT_UUID=true
Custom resource attributes (optional):
Add custom labels for filtering and grouping:
export OTEL_RESOURCE_ATTRIBUTES="department=engineering,team.id=platform,cost_center=eng-123"
Note: The value follows the W3C Baggage spec: no spaces are allowed in values, and special characters must be percent-encoded.
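Percent-encoding by hand is error-prone, so it can help to generate the value. The sketch below builds the comma-separated key=value string with urllib.parse.quote from the standard library. The function name and the office attribute are hypothetical, chosen only to show a value containing a space.

```python
from typing import Dict
from urllib.parse import quote


def build_resource_attributes(attributes: Dict[str, str]) -> str:
    """Join attributes as key=value pairs, percent-encoding keys and values."""
    pairs = []
    for key, value in attributes.items():
        # safe="" also encodes characters such as "/" and ",", not only spaces.
        pairs.append(f"{quote(key, safe='')}={quote(value, safe='')}")
    return ",".join(pairs)


if __name__ == "__main__":
    print(build_resource_attributes({
        "department": "engineering",
        "team.id": "platform",
        "office": "New York",  # hypothetical attribute with a space
    }))
```

The printed string can be used directly as the value of OTEL_RESOURCE_ATTRIBUTES; the space in the last value comes out as %20.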
Import Grafana Dashboard
With telemetry enabled and data flowing, set up the Grafana dashboard.
Access Grafana
- Open http://localhost:3000 in your browser
- Login with default credentials: admin/admin
- Change the password when prompted (or skip for local testing)
Import Dashboard from JSON
A community member created a comprehensive Claude Code dashboard. Import it:
- Get the dashboard JSON: Visit https://gist.github.com/mikelane/f6c3a175cd9f92410aba06b5ac24ba54
- Copy the raw JSON from the gist
- In Grafana:
  - Click Dashboards in left sidebar
  - Click New → Import
  - Paste the JSON into the JSON text box (labeled “Import via dashboard JSON model” in recent Grafana versions)
  - Click Load
  - Select Prometheus as the datasource
  - Click Import
Dashboard Panels
The imported dashboard includes these visualizations:
Usage Statistics:
- Total sessions tracked
- Total commits made via Claude Code
- Lines of code accepted from suggestions
Token Analysis:
- Input tokens consumed
- Output tokens consumed
- Cache read tokens (with notation: “saves money!”)
- Cache creation tokens
- Token distribution by type (pie chart)
- Token usage by model over time
Cost Monitoring:
- Total API expenditure in USD
- Cost per 1,000 output tokens
- Cost breakdown by model (bar gauge)
- Cost accumulation trends (time-series)
Productivity Metrics:
- Active time for CLI vs. user interaction (time-series)
- Productivity Ratio (CLI time divided by user time)
- Peak leverage measurement
- Activity patterns comparing CLI and user engagement
Visualizations:
- Time-series charts with 5-minute intervals
- Pie charts for distribution analysis
- Bar gauges for comparative metrics
- Cache efficiency percentage gauge
Dashboard settings:
- Default timeframe: 1 hour
- Auto-refresh enabled
- Tags: claude, productivity, ai, development
Wait for Data
After starting your first Claude Code session with telemetry enabled, wait 60-90 seconds for initial data export. Refresh the dashboard to see metrics appear.
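The 60-90 second figure follows from adding up the delays in the pipeline. The sketch below does that arithmetic under a simplifying assumption: the worst case is the sum of the Claude Code export interval, the collector batch timeout and the Prometheus scrape interval, ignoring network time and the Grafana refresh. The function name is hypothetical, and the defaults are the values used in this tutorial.

```python
def worst_case_delay_seconds(export_interval_ms: int = 60000,
                             batch_timeout_s: float = 10.0,
                             scrape_interval_s: float = 10.0) -> float:
    """Rough upper bound on the time before a new data point is queryable."""
    return export_interval_ms / 1000 + batch_timeout_s + scrape_interval_s


if __name__ == "__main__":
    print(worst_case_delay_seconds())                          # tutorial defaults
    print(worst_case_delay_seconds(export_interval_ms=30000))  # faster export interval
```

With the defaults this gives 80 seconds; lowering OTEL_METRIC_EXPORT_INTERVAL to 30000 brings the estimate down to 50 seconds.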
Understanding the Metrics
Here’s what each metric tells you:
Session Metrics
claude_code.session.count
- What it measures: Number of CLI sessions started
- Why it matters: Track adoption and usage frequency
- Example query: rate(claude_code_session_count_total[5m]) - Per-second session start rate, averaged over the last 5 minutes
Token Metrics
claude_code.token.usage
- What it measures: Tokens consumed (input, output, cache read, cache creation)
- Why it matters: Understand API consumption and identify optimization opportunities
- Attributes: Model, token type (input/output/cacheRead/cacheCreation)
- Cache savings note: Cache read tokens are cheaper than input tokens—high cache read percentages reduce costs
Token types:
- Input tokens: User prompts and context sent to the API
- Output tokens: Model responses (most expensive)
- Cache read tokens: Reused cached context (saves money!)
- Cache creation tokens: Initial cache population
Cost Metrics
claude_code.cost.usage
- What it measures: Estimated session cost in USD
- Why it matters: Budget tracking and ROI calculation
- Attributes: Model (different models have different pricing)
- Dashboard calculation: Cost per 1,000 output tokens for efficiency comparison
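The two derived figures mentioned above, cost per 1,000 output tokens and the share of cache reads, are simple ratios. The sketch below spells them out. Both function names are hypothetical, and the cache-share definition (cache reads divided by all prompt-side tokens) is an assumption; the community dashboard may compute its cache efficiency gauge differently. The sample numbers are made up.

```python
from typing import Dict


def cost_per_1k_output_tokens(cost_usd: float, output_tokens: int) -> float:
    """Cost in USD attributed to each 1,000 output tokens."""
    if output_tokens == 0:
        return 0.0
    return cost_usd * 1000 / output_tokens


def cache_read_share(tokens: Dict[str, int]) -> float:
    """Percentage of prompt-side tokens that were served from cache."""
    cache_read = tokens.get("cacheRead", 0)
    prompt_side = tokens.get("input", 0) + cache_read + tokens.get("cacheCreation", 0)
    if prompt_side == 0:
        return 0.0
    return 100.0 * cache_read / prompt_side


if __name__ == "__main__":
    # Made-up totals keyed by the token type attribute values.
    usage = {"input": 1000, "output": 500, "cacheRead": 8000, "cacheCreation": 1000}
    print(cache_read_share(usage))
    print(cost_per_1k_output_tokens(cost_usd=3.0, output_tokens=60000))
```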
Productivity Metrics
claude_code.lines_of_code.count
- What it measures: Lines added/removed via Claude Code
- Why it matters: Quantify output and code changes
- Use case: Calculate lines of code per session or per hour
claude_code.commit.count
- What it measures: Git commits created via Claude Code
- Why it matters: Track completed work units
- Combined metric: Commits per session = quality indicator
claude_code.pull_request.count
- What it measures: PRs created via Claude Code
- Why it matters: Measure feature/fix delivery rate
claude_code.active_time.total
- What it measures: Total active time in seconds
- Why it matters: Track actual usage time and calculate productivity metrics
- Dashboard usage: The Grafana dashboard uses this to calculate productivity ratios and compare activity patterns
Tool Decision Metrics
claude_code.code_edit_tool.decision
- What it measures: Code editing tool permission decisions (accept/reject)
- Why it matters: Understand friction points in workflow
- Attributes: Tool (Edit, Write, NotebookEdit), decision (accept/reject), language (programming language), source
Verification & Troubleshooting
Check Data is Flowing
1. Verify Claude Code is sending metrics:
Start a Claude Code session and run a simple task (e.g., ask a question). Wait 60 seconds for the export interval.
2. Check OpenTelemetry Collector logs:
docker compose logs otel-collector | grep -i metrics
You should see log entries about received metrics.
3. Query Prometheus directly:
Open http://localhost:9090, go to Graph, and query:
claude_code_session_count_total
If working, you’ll see data points. If empty, metrics aren’t reaching Prometheus.
4. Check Grafana:
Open your dashboard. If panels show “No data,” investigate using steps below.
Common Issues
Issue 1: “No Data” in Dashboard
Symptoms: Grafana panels display “No data” despite Claude Code sessions running.
Causes & Solutions:
- Environment variables not loaded
  - Check: env | grep OTEL in the terminal where you run Claude Code
  - Fix: Reload shell config (source ~/.bashrc) and restart Claude Code
- Claude Code not restarted
  - Issue: Environment variables are only read at launch
  - Fix: Exit all Claude Code sessions and start fresh
- Export interval not elapsed
  - Wait: 60-90 seconds for initial metrics export
  - Check: Run a longer Claude Code session (2+ minutes)
- Wrong protocol or endpoint
  - Check: Verify OTEL_EXPORTER_OTLP_ENDPOINT matches the protocol
    - HTTP: http://localhost:4318
    - gRPC: http://localhost:4317
  - Test endpoint: curl -v http://localhost:4318/v1/metrics
- Collector not receiving data
  - Check logs: docker compose logs otel-collector | grep -i error
  - Common error: Protocol mismatch (gRPC endpoint with HTTP protocol)
Issue 2: Connection Refused
Symptoms: “Connection refused” errors in Claude Code or collector logs.
Causes & Solutions:
- Services not running
  - Check: docker compose ps shows all services “Up”
  - Fix: docker compose up -d
- Port conflicts
  - Check: netstat -an | grep 4318 or lsof -i :4318
  - Fix: Stop conflicting service or change port in docker-compose.yml
- Firewall blocking
  - Check: Test with curl http://localhost:4318
  - Fix: Allow ports 3000, 4317, 4318, 8889, 9090 in firewall
- Wrong endpoint in environment variables
  - Inside Docker: Container names work (http://otel-collector:4318)
  - Outside Docker (Claude Code): Use localhost (http://localhost:4318)
Issue 3: Prometheus Not Scraping
Symptoms: Prometheus shows “No data” but collector logs show metrics received.
Causes & Solutions:
- Scrape target down
  - Check: Open http://localhost:9090/targets
  - Look for: otel-collector target status (should be “UP”)
  - Fix: If “DOWN,” check collector health: curl http://localhost:13133
- Wrong target configuration
  - Check: config/prometheus.yml has targets: ["otel-collector:8889"]
  - Fix: Restart Prometheus: docker compose restart prometheus
- Collector not exposing Prometheus endpoint
  - Test: curl http://localhost:8889/metrics
  - Fix: Verify config/otel-collector-config.yaml has prometheus exporter on 8889
Issue 4: High Memory Usage
Symptoms: Prometheus or collector consuming excessive memory.
Causes & Solutions:
- Too many metrics/high cardinality
  - Reduce session IDs: export OTEL_METRICS_INCLUDE_SESSION_ID=false
  - Disable version: export OTEL_METRICS_INCLUDE_VERSION=false
- Long retention period
  - Adjust: Change --storage.tsdb.retention.time=90d to 30d in docker-compose.yml
  - Cap size: Add --storage.tsdb.retention.size=50GB
- Batch size too large
  - Edit: config/otel-collector-config.yaml → reduce send_batch_size from 1024 to 512
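Cardinality is the fix with the largest effect because label values multiply. The sketch below estimates an upper bound on the number of time series for one metric as the product of distinct values per label. The function name and all counts are hypothetical, and real series counts are usually lower because not every combination occurs.

```python
from math import prod
from typing import Dict


def estimated_series(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    # Hypothetical counts: 200 sessions, 3 models, 4 token types.
    with_session = estimated_series({"session_id": 200, "model": 3, "type": 4})
    without_session = estimated_series({"model": 3, "type": 4})
    print(with_session, without_session)
```

In this made-up case, dropping the session ID label reduces the bound from 2400 series to 12, which is why OTEL_METRICS_INCLUDE_SESSION_ID=false is the first setting to try.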
Debugging Tips
Enable debug logging in collector:
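Confirm that config/otel-collector-config.yaml defines the debug exporter under exporters (the configuration from Part 2 already includes it) and that debug is listed in the metrics pipeline's exporters:
exporters:
  debug:
    verbosity: detailed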
Restart collector: docker compose restart otel-collector
Query raw Prometheus metrics:
# See all Claude Code metrics
curl http://localhost:9090/api/v1/label/__name__/values | grep claude_code
# Query specific metric
curl 'http://localhost:9090/api/v1/query?query=claude_code_session_count_total'
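The query endpoint returns JSON, which is easier to read after a little post-processing. The sketch below groups an instant-query result by one label. The function name is hypothetical, and the sample response is hand-written to match the documented response shape of the Prometheus HTTP API; it is not captured output, and the label names in your data may differ.

```python
import json
from typing import Dict

# Hand-written sample in the shape of an instant-query response (not real output).
SAMPLE = """{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "claude_code_session_count_total", "terminal_type": "vscode"},
   "value": [1700000000, "3"]},
  {"metric": {"__name__": "claude_code_session_count_total", "terminal_type": "tmux"},
   "value": [1700000000, "2"]}
]}}"""


def values_by_label(response_text: str, label: str) -> Dict[str, float]:
    """Sum instant-vector sample values, grouped by the given label."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    totals: Dict[str, float] = {}
    for sample in payload["data"]["result"]:
        key = sample["metric"].get(label, "")
        # Each sample carries [unix_timestamp, "value-as-string"].
        totals[key] = totals.get(key, 0.0) + float(sample["value"][1])
    return totals


if __name__ == "__main__":
    print(values_by_label(SAMPLE, "terminal_type"))
```

To use it against the live stack, pass in the body returned by the curl command above instead of SAMPLE.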
Test OTLP endpoint manually:
# HTTP endpoint
curl -v -X POST http://localhost:4318/v1/metrics \
-H "Content-Type: application/x-protobuf"
# Should return 200 or 400 (not connection refused)
Check Grafana datasource:
- Go to Connections → Data sources → Prometheus (Configuration → Data Sources in older Grafana versions)
- Click Save & Test
- Should show green “Data source is working”
Production Considerations
If deploying this stack beyond local development, consider these best practices:
Security
Change default credentials immediately:
Edit docker-compose.yml before first deployment:
grafana:
  environment:
    - GF_SECURITY_ADMIN_USER=your-admin-user
    - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
Use environment variables for secrets:
grafana:
  environment:
    - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER}
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
Create .env file (add to .gitignore):
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=secure-password-here
Secure endpoints:
- Don’t expose ports publicly without authentication
- Use reverse proxy (nginx, Traefik) with HTTPS
- Enable Grafana auth (LDAP, OAuth, SAML) for teams
Authentication for OTLP endpoint:
If deploying the collector remotely, add authentication headers:
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token-here"
Configure collector to validate tokens.
Data Persistence
Use named volumes instead of bind mounts for better Docker management:
volumes:
  prometheus_data:
  grafana_data:
services:
  prometheus:
    volumes:
      - prometheus_data:/prometheus
  grafana:
    volumes:
      - grafana_data:/var/lib/grafana
Backup Grafana dashboards:
# Export dashboard JSON by UID (replace YOUR_DASHBOARD_UID with the UID shown in the dashboard URL)
curl -u admin:password "http://localhost:3000/api/dashboards/uid/YOUR_DASHBOARD_UID" \
  > dashboard-backup.json
Backup Prometheus data:
Enable admin API in docker-compose.yml:
prometheus:
  command:
    - '--web.enable-admin-api'
Create snapshot (the URL is quoted so the shell does not interpret the ?):
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/snapshot?skip_head=false'
Resource Limits
Adjust based on usage:
For teams (5-10 users):
otel-collector:
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
prometheus:
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 4G
Query performance tuning:
Add to Prometheus command:
prometheus:
  command:
    - '--query.timeout=2m'
    - '--query.max-concurrency=20'
Monitoring the Monitoring Stack
Add health check endpoints to your monitoring:
# Check all services
docker compose ps
# Memory and CPU usage
docker stats
# Prometheus health
curl http://localhost:9090/-/healthy
# Collector health
curl http://localhost:13133
Set up alerts for stack health (Prometheus down, disk full, high memory).
Enterprise Configuration
For organization-wide deployment, use administrator configuration via managed settings:
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer company-token"
  }
}
Distribute via MDM with high precedence (users cannot override).
Dynamic authentication for token refresh:
{
  "otelHeadersHelper": "/bin/generate_opentelemetry_headers.sh"
}
Script outputs valid JSON:
#!/bin/bash
echo "{\"Authorization\": \"Bearer $(get-token.sh)\", \"X-API-Key\": \"$(get-api-key.sh)\"}"
Limitation: Headers fetched at startup only. Use OpenTelemetry Collector as proxy for frequent refresh scenarios.
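Building the JSON by string concatenation breaks as soon as a token contains a quote or backslash. The sketch below produces the same header object with a JSON serializer instead. It assumes the helper setting accepts any executable that prints a JSON object, which the bash example suggests but this tutorial does not confirm; the function name and the way credentials are obtained are hypothetical.

```python
import json


def build_headers(token: str, api_key: str) -> str:
    """Serialize the OTLP headers as a JSON object, escaping special characters."""
    return json.dumps({
        "Authorization": f"Bearer {token}",
        "X-API-Key": api_key,
    })
```

A helper script would fetch its credentials however your organization issues them and then print(build_headers(token, api_key)) as its only output.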
What You’ve Built
You now have a complete monitoring stack tracking:
- Usage patterns: Sessions, commits, PRs, lines of code
- Token consumption: Input, output, cache reads, cache creation
- Cost tracking: Total spend and per-model breakdown
- Productivity metrics: CLI time vs. user time, leverage ratios
- Tool decisions: What gets accepted or rejected
The stack runs entirely in Docker with persistent storage and health checks, and the production considerations above give a starting point for hardening it beyond local use.
Next steps:
- Set up alerts: Configure Grafana alerts for budget thresholds, unusual token spikes, or errors
- Customize dashboards: Add panels for team-specific metrics or custom attributes
- Analyze patterns: Use data to optimize workflows, model selection, and caching strategies
- Calculate ROI: Compare Claude Code costs against productivity gains (commits/hour, lines/hour)
Resources
Additional Guides:
- Claude Code ROI Measurement Templates (Docker, Prometheus, reporting templates)
- Claude Code Monitoring on Amazon Bedrock
Questions or issues? Share feedback in the Claude Code GitHub discussions.