Real-Time Monitoring

Dokploy includes a built-in monitoring service, written in Go, that collects system and container metrics in real time.

Overview

The monitoring service provides:

Server Metrics: CPU, memory, disk, and network statistics for your servers
Container Metrics: Resource usage for individual Docker containers and services
Alerting: Threshold-based notifications via webhook callbacks

Architecture

The monitoring stack consists of:

Go Service: Lightweight metrics collector (dokploy/monitoring)
SQLite Database: Local storage for metrics data
HTTP API: RESTful endpoints for querying metrics
Webhook Integration: Alert delivery to Dokploy control plane

Installation

Monitoring is automatically configured during server setup:

await setupMonitoring(serverId);

This deploys a monitoring container:

const settings: ContainerCreateOptions = {
  name: "dokploy-monitoring",
  Image: "dokploy/monitoring:latest",
  Env: [`METRICS_CONFIG=${JSON.stringify(metricsConfig)}`],
  HostConfig: {
    RestartPolicy: { Name: "always" },
    NetworkMode: "host",
    Binds: [
      "/var/run/docker.sock:/var/run/docker.sock:ro",
      "/sys:/host/sys:ro",
      "/proc:/host/proc:ro",
      "/etc/os-release:/etc/os-release:ro",
      "/etc/dokploy/monitoring/monitoring.db:/app/monitoring.db"
    ],
    PortBindings: {
      "3001/tcp": [{ HostPort: "3001" }]
    }
  }
};

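The monitoring service uses host networking mode to accurately collect system metrics. In host mode the service listens directly on the host's port 3001; Docker does not apply port mappings in this mode, so the PortBindings entry has no practical effect.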
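The container receives its configuration through the METRICS_CONFIG environment variable as a JSON string. The following is a minimal sketch of that round trip; the helper names `toEnvEntry` and `fromEnvEntry` are hypothetical and not part of Dokploy.

```typescript
const ENV_PREFIX = "METRICS_CONFIG=";

// Hypothetical helper: serialize a config object into the Env entry format
// used in the container settings above.
function toEnvEntry(config: object): string {
  return `${ENV_PREFIX}${JSON.stringify(config)}`;
}

// Hypothetical helper: recover the config object from an Env entry.
// Throws if the entry has a different prefix or the JSON is malformed.
function fromEnvEntry(entry: string): unknown {
  if (!entry.startsWith(ENV_PREFIX)) {
    throw new Error("Not a METRICS_CONFIG entry");
  }
  return JSON.parse(entry.slice(ENV_PREFIX.length));
}
```

Parsing the entry back before deploying is a cheap way to catch a configuration that does not serialize to valid JSON.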

Configuration

Metrics Configuration

Configure monitoring via the server’s metricsConfig object:

{
  "server": {
    "type": "Remote",
    "refreshRate": 30,
    "retentionDays": 7,
    "port": 3001,
    "token": "secure-random-token",
    "urlCallback": "https://dokploy.example.com/api/trpc/notification.receiveNotification",
    "cronJob": "0 0 * * *",
    "thresholds": {
      "cpu": 80,
      "memory": 85
    }
  },
  "containers": {
    "refreshRate": 60,
    "services": {
      "include": ["my-app-service"],
      "exclude": ["temp-container"]
    }
  }
}

Configuration Options

Server Metrics

Option	Type	Description
`type`	string	Server type: “Remote” or “Dokploy”
`refreshRate`	number	Collection interval in seconds
`retentionDays`	number	Days to retain metrics
`port`	number	HTTP API port (default: 3001)
`token`	string	Authentication token
`urlCallback`	string	Webhook URL for alerts
`cronJob`	string	Cleanup schedule (cron format)
`thresholds.cpu`	number	CPU alert threshold (0-100%)
`thresholds.memory`	number	Memory alert threshold (0-100%)

Container Metrics

Option	Type	Description
`refreshRate`	number	Collection interval in seconds
`services.include`	string[]	Container names to monitor
`services.exclude`	string[]	Container names to exclude

Setting a threshold to 0 disables alerting for that metric.
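The options above can be written down as TypeScript types, which makes it easier to catch mistakes before a configuration is deployed. This is a sketch only: the field names come from the tables, while `validateMetricsConfig` and its specific checks are assumptions rather than Dokploy's own validation.

```typescript
interface ServerMetricsConfig {
  type: "Remote" | "Dokploy";
  refreshRate: number; // seconds
  retentionDays: number;
  port: number;
  token: string;
  urlCallback: string;
  cronJob: string;
  thresholds: { cpu: number; memory: number };
}

interface ContainerMetricsConfig {
  refreshRate: number; // seconds
  services: { include?: string[]; exclude?: string[] };
}

interface MetricsConfig {
  server: ServerMetricsConfig;
  containers: ContainerMetricsConfig;
}

// Hypothetical helper: returns a list of problems, empty when nothing looks wrong.
function validateMetricsConfig(config: MetricsConfig): string[] {
  const problems: string[] = [];
  const { server, containers } = config;
  if (server.refreshRate <= 0) problems.push("server.refreshRate must be positive");
  if (containers.refreshRate <= 0) problems.push("containers.refreshRate must be positive");
  if (server.retentionDays <= 0) problems.push("server.retentionDays must be positive");
  if (!Number.isInteger(server.port) || server.port < 1 || server.port > 65535) {
    problems.push("server.port must be an integer between 1 and 65535");
  }
  if (server.token.length === 0) problems.push("server.token must not be empty");
  for (const metric of ["cpu", "memory"] as const) {
    const value = server.thresholds[metric];
    // 0 is allowed: it disables alerting for that metric.
    if (value < 0 || value > 100) problems.push(`thresholds.${metric} must be between 0 and 100`);
  }
  return problems;
}
```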

Server Metrics

Collected Metrics

The monitoring service collects comprehensive system information:

type SystemMetrics struct {
  CPU              string  // CPU usage percentage
  CPUModel         string  // Processor model
  CPUCores         int32   // Logical cores
  CPUPhysicalCores int32   // Physical cores
  CPUSpeed         float64 // MHz
  OS               string  // Operating system
  Distro           string  // Linux distribution
  Kernel           string  // Kernel version
  Arch             string  // Architecture (amd64, arm64)
  MemUsed          string  // Memory usage %
  MemUsedGB        string  // Memory used in GB
  MemTotal         string  // Total memory in GB
  Uptime           uint64  // Seconds since boot
  DiskUsed         string  // Disk usage %
  TotalDisk        string  // Total disk space GB
  NetworkIn        string  // Network received MB
  NetworkOut       string  // Network sent MB
  Timestamp        string  // ISO 8601 timestamp
}

Example Response

{
  "timestamp": "2025-01-19T21:44:54.232164Z",
  "cpu": "24.57",
  "cpuModel": "Apple M1 Pro",
  "cpuCores": 8,
  "cpuPhysicalCores": 1,
  "cpuSpeed": 3228.0,
  "os": "darwin",
  "distro": "darwin",
  "kernel": "23.4.0",
  "arch": "arm64",
  "memUsed": "81.91",
  "memUsedGB": "13.11",
  "memTotal": "16.0",
  "uptime": 752232,
  "diskUsed": "89.34",
  "totalDisk": "460.43",
  "networkIn": "54.78",
  "networkOut": "31.72"
}
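Most numeric fields in this response are encoded as strings, so a client has to parse them before doing arithmetic. The sketch below converts a subset of the fields shown above; the `ServerMetricSummary` shape and the `summarize` helper are illustrative names, not part of the API.

```typescript
// Subset of the response fields shown above.
interface ServerMetricSample {
  timestamp: string;
  cpu: string;       // percentage, as a string
  memUsed: string;   // percentage, as a string
  memUsedGB: string;
  memTotal: string;
  diskUsed: string;  // percentage, as a string
  uptime: number;    // seconds since boot
}

// Hypothetical client-side view with numeric fields.
interface ServerMetricSummary {
  timestamp: string;
  cpuPercent: number;
  memPercent: number;
  memFreeGB: number;
  diskPercent: number;
  uptimeDays: number;
}

function summarize(sample: ServerMetricSample): ServerMetricSummary {
  return {
    timestamp: sample.timestamp,
    cpuPercent: Number.parseFloat(sample.cpu),
    memPercent: Number.parseFloat(sample.memUsed),
    memFreeGB: Number.parseFloat(sample.memTotal) - Number.parseFloat(sample.memUsedGB),
    diskPercent: Number.parseFloat(sample.diskUsed),
    uptimeDays: sample.uptime / 86400,
  };
}
```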

Querying Server Metrics

Retrieve metrics via the API:

const metrics = await fetch(
  `http://server-ip:3001/metrics?limit=50`,
  {
    headers: {
      Authorization: `Bearer ${token}`
    }
  }
).then(r => r.json());
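The request above does not check the HTTP status, so a 401 would surface as a confusing JSON error. A slightly more defensive sketch follows; the function names are hypothetical, and the assumption that the endpoint returns a JSON array is not confirmed by this page.

```typescript
// Builds the URL and headers separately from the network call.
function buildServerMetricsRequest(
  baseUrl: string,
  token: string,
  limit: number | "all" = 50,
): { url: string; headers: { Authorization: string } } {
  const url = new URL("/metrics", baseUrl);
  url.searchParams.set("limit", String(limit));
  return { url: url.toString(), headers: { Authorization: `Bearer ${token}` } };
}

// Fetches server metrics and fails loudly on non-2xx responses.
async function fetchServerMetrics(
  baseUrl: string,
  token: string,
  limit: number | "all" = 50,
): Promise<unknown[]> {
  const { url, headers } = buildServerMetricsRequest(baseUrl, token, limit);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Monitoring API returned ${response.status} ${response.statusText}`);
  }
  // Assumption: the body is a JSON array of metric samples.
  return (await response.json()) as unknown[];
}
```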

Container Metrics

Supported Container Types

The monitoring service tracks all Docker container types:

Standalone Containers: Individual Docker containers
Docker Compose: Multi-container applications
Docker Swarm Stacks: Clustered service deployments

When monitoring Docker Compose or Swarm stacks, use the -p flag to properly identify all services within the stack.

Collected Metrics

{
  "timestamp": "2025-01-19T22:16:30.796129Z",
  "CPU": 83.76,
  "Memory": {
    "percentage": 0.03,
    "used": 2.262,
    "total": 7.654,
    "usedUnit": "MB",
    "totalUnit": "GB"
  },
  "Network": {
    "input": 306,
    "output": 0,
    "inputUnit": "B",
    "outputUnit": "B"
  },
  "BlockIO": {
    "read": 28.7,
    "write": 0,
    "readUnit": "kB",
    "writeUnit": "B"
  },
  "Container": "7428f5a49039",
  "ID": "7428f5a49039",
  "Name": "my-app-container"
}
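Values and units are reported separately, and the two sides of a pair can use different units (here, memory used is in MB while the total is in GB). The sketch below normalizes values to bytes before comparing them. It assumes decimal (SI) multipliers, which is not confirmed by this page; swap in 1024-based factors if the service reports binary units.

```typescript
// Assumption: decimal multipliers for the unit labels seen in the sample above.
const UNIT_FACTORS: Record<string, number> = {
  B: 1,
  kB: 1e3,
  MB: 1e6,
  GB: 1e9,
  TB: 1e12,
};

function toBytes(value: number, unit: string): number {
  const factor = UNIT_FACTORS[unit];
  if (factor === undefined) {
    throw new Error(`Unknown unit: ${unit}`);
  }
  return value * factor;
}

interface ContainerMemory {
  percentage: number;
  used: number;
  total: number;
  usedUnit: string;
  totalUnit: string;
}

// Recomputes memory usage as a percentage from the used/total pair.
function memoryPercent(memory: ContainerMemory): number {
  const used = toBytes(memory.used, memory.usedUnit);
  const total = toBytes(memory.total, memory.totalUnit);
  return (used / total) * 100;
}
```

For the sample above, 2.262 MB of 7.654 GB works out to roughly 0.03%, which agrees with the reported percentage field.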

Querying Container Metrics

const metrics = await fetch(
  `http://server-ip:3001/metrics/containers?appName=my-app&limit=50`,
  {
    headers: {
      Authorization: `Bearer ${token}`
    }
  }
).then(r => r.json());

The appName parameter is required. Without it, an empty array is returned.
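Because a missing appName yields an empty array rather than an error, it can be worth failing early on the client. A minimal sketch, using a hypothetical `containerMetricsUrl` helper:

```typescript
// Builds the container metrics URL and rejects an empty appName up front,
// since the API would otherwise silently return an empty array.
function containerMetricsUrl(
  baseUrl: string,
  appName: string,
  limit: number | "all" = 50,
): string {
  if (appName.trim() === "") {
    throw new Error("appName is required");
  }
  const url = new URL("/metrics/containers", baseUrl);
  url.searchParams.set("appName", appName); // URL-encoded automatically
  url.searchParams.set("limit", String(limit));
  return url.toString();
}
```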

Service Filtering

Include Services

Specify which containers to monitor:

{
  "containers": {
    "services": {
      "include": [
        "production-api",
        "production-web",
        "production-db"
      ]
    }
  }
}

Exclude Services

Exclude specific containers:

{
  "containers": {
    "services": {
      "include": ["*"],
      "exclude": [
        "test-container",
        "temp-debug"
      ]
    }
  }
}
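One way to reason about how the two lists combine is sketched below. The semantics here are assumptions based on the examples above (exact name matching, "*" in include matches every container, and exclude takes precedence over include); the service's actual matching rules may differ.

```typescript
interface ServiceFilter {
  include?: string[];
  exclude?: string[];
}

// Assumed semantics, not confirmed by the monitoring service's source:
// 1. A name in exclude is never monitored.
// 2. Otherwise, it is monitored if include contains "*" or the exact name.
function isMonitored(containerName: string, filter: ServiceFilter): boolean {
  const include = filter.include ?? [];
  const exclude = filter.exclude ?? [];
  if (exclude.includes(containerName)) {
    return false;
  }
  return include.includes("*") || include.includes(containerName);
}
```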

Alerting

Threshold Configuration

Set alert thresholds for CPU and memory:

thresholds: {
  cpu: 80,     // Alert when CPU exceeds 80%
  memory: 85   // Alert when memory exceeds 85%
}
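The threshold rule, including the "0 disables alerting" case from the configuration options, can be sketched as follows. The function names are hypothetical, the strict greater-than comparison is an assumption based on the word "exceeds", and the message format mirrors the example alert on this page.

```typescript
type AlertType = "CPU" | "Memory";

// A threshold of 0 disables alerting for that metric.
function exceedsThreshold(value: number, threshold: number): boolean {
  return threshold > 0 && value > threshold;
}

// Formats a message in the same style as the example alert.
function alertMessage(type: AlertType, value: number, threshold: number): string {
  return `${type} usage (${value.toFixed(2)}%) exceeded threshold (${threshold.toFixed(2)}%)`;
}
```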

Alert Payload

When thresholds are exceeded, alerts are sent to the callback URL:

interface AlertPayload {
  ServerType: "Remote" | "Dokploy";
  Type: "CPU" | "Memory";
  Value: number;        // Current value
  Threshold: number;    // Configured threshold
  Message: string;      // Human-readable message
  Timestamp: string;    // ISO 8601 timestamp
  Token: string;        // Authentication token
}

Example Alert

{
  "json": {
    "ServerType": "Remote",
    "Type": "CPU",
    "Value": 85.42,
    "Threshold": 80.0,
    "Message": "CPU usage (85.42%) exceeded threshold (80.00%)",
    "Timestamp": "2025-01-19T22:30:15.123456Z",
    "Token": "secure-random-token"
  }
}
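A receiver should validate the body before acting on it. The sketch below checks the `json` envelope shown in the example and compares the token. The `parseAlert` name is hypothetical, and a production receiver would likely want a constant-time token comparison.

```typescript
interface AlertPayload {
  ServerType: "Remote" | "Dokploy";
  Type: "CPU" | "Memory";
  Value: number;
  Threshold: number;
  Message: string;
  Timestamp: string;
  Token: string;
}

// Returns the payload when the body has the expected shape and token, else null.
function parseAlert(body: unknown, expectedToken: string): AlertPayload | null {
  if (typeof body !== "object" || body === null) return null;
  const inner = (body as { json?: unknown }).json;
  if (typeof inner !== "object" || inner === null) return null;
  const p = inner as Partial<AlertPayload>;
  const valid =
    (p.ServerType === "Remote" || p.ServerType === "Dokploy") &&
    (p.Type === "CPU" || p.Type === "Memory") &&
    typeof p.Value === "number" &&
    typeof p.Threshold === "number" &&
    typeof p.Message === "string" &&
    typeof p.Timestamp === "string" &&
    p.Token === expectedToken;
  return valid ? (p as AlertPayload) : null;
}
```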

Webhook Delivery

Alerts are sent via HTTP POST to the configured callback URL:

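resp, err := http.Post(
  callbackURL,
  "application/json",
  bytes.NewBuffer(jsonData), // Go requires the trailing comma in a multi-line call
)
if err != nil {
  return err // assumes an enclosing function that returns error
}
defer resp.Body.Close()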

Data Retention

Automatic Cleanup

Metrics are automatically deleted according to the retention policy. In this example, the cleanup job runs daily at midnight and removes metrics older than seven days:

{
  "retentionDays": 7,
  "cronJob": "0 0 * * *"
}

The cleanup process:

Runs on configured cron schedule
Deletes metrics older than retentionDays
Keeps the database size bounded
Logs cleanup operations
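The retention rule amounts to comparing each metric's timestamp against a cutoff. The sketch below shows that arithmetic on the client side; `retentionCutoff` and `isExpired` are illustrative names and do not describe how the Go service implements cleanup.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// The oldest instant that is still retained.
function retentionCutoff(now: Date, retentionDays: number): Date {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// True when a metric's ISO 8601 timestamp is older than the cutoff.
function isExpired(timestamp: string, now: Date, retentionDays: number): boolean {
  return new Date(timestamp).getTime() < retentionCutoff(now, retentionDays).getTime();
}
```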

Manual Cleanup

Metrics are stored in SQLite:

# Open the database (shell command, run on the host)
sqlite3 /etc/dokploy/monitoring/monitoring.db

-- The remaining lines are entered at the sqlite3 prompt
-- View table schema
.schema server_metrics

-- Count metrics
SELECT COUNT(*) FROM server_metrics;

-- Delete old metrics manually
DELETE FROM server_metrics
WHERE timestamp < datetime('now', '-7 days');

API Endpoints

Health Check

Verify that the monitoring service is running:

curl http://server-ip:3001/health

Response:

{
  "status": "ok"
}

The health endpoint does not require authentication.
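The same check can be made from code, for example before querying metrics. This is a sketch under stated assumptions: `checkHealth` and `isHealthyBody` are hypothetical helpers, and the 3-second default timeout is arbitrary.

```typescript
// True when the body matches the documented {"status": "ok"} response.
function isHealthyBody(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { status?: unknown }).status === "ok"
  );
}

// Calls /health without authentication and gives up after timeoutMs.
async function checkHealth(baseUrl: string, timeoutMs = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(new URL("/health", baseUrl).toString(), {
      signal: controller.signal,
    });
    return response.ok && isHealthyBody(await response.json());
  } catch {
    // Network error, timeout, or a body that is not valid JSON.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```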

Get Server Metrics

curl -H "Authorization: Bearer $TOKEN" \
  "http://server-ip:3001/metrics?limit=50"

Parameters:

limit: Number of records to return, or “all” (default: 50)

Get Container Metrics

curl -H "Authorization: Bearer $TOKEN" \
  "http://server-ip:3001/metrics/containers?appName=my-app&limit=50"

Parameters:

appName: Container/service name (required)
limit: Number of records to return, or “all” (default: 50)

Performance Considerations

Resource Usage

The monitoring service is designed to be lightweight:

Memory: ~50-100MB typical usage
CPU: Minimal impact (metrics collection runs periodically)
Disk: SQLite database grows with retention period
Network: Negligible (local collection only)

Refresh Rates

Choose appropriate collection intervals:

Interval	Use Case	Impact
15-30s	High-frequency monitoring	Higher resource usage
60s	Standard monitoring	Balanced
300s	Low-frequency checks	Minimal overhead

Very short refresh intervals (< 10s) can impact system performance.
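To get a feel for how the refresh interval and retention period drive database growth, the number of stored server samples can be estimated as retention time divided by the interval. The helper below is a back-of-the-envelope sketch; it ignores container metrics, which add further rows per monitored container.

```typescript
// Approximate number of server metric rows kept at steady state.
function estimatedRows(refreshRateSeconds: number, retentionDays: number): number {
  const SECONDS_PER_DAY = 86400;
  return Math.floor((retentionDays * SECONDS_PER_DAY) / refreshRateSeconds);
}
```

With 7 days of retention, a 30-second interval keeps about 20,160 rows, while a 300-second interval keeps about 2,016.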

Best Practices

Metric Collection

Use 30-60 second refresh rates for most use cases
Monitor only critical containers to reduce overhead
Set appropriate retention periods (7-30 days)
Configure cleanup to run during off-peak hours

Alerting

Set realistic thresholds (80-85% for CPU, 85-90% for memory)
Test webhook endpoints before enabling
Monitor alert frequency to avoid spam
Document alert response procedures

Data Management

Backup monitoring database regularly
Monitor database size growth
Adjust retention based on storage capacity
Export historical data before cleanup

Security

Use strong random tokens
Rotate tokens periodically
Secure webhook endpoints with authentication
Restrict monitoring API access by IP
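For the first recommendation, a token can be generated with Node's built-in `crypto` module. This is one common approach rather than a Dokploy requirement; the `generateToken` name and the 32-byte length are assumptions.

```typescript
import { randomBytes } from "node:crypto";

// Returns a hex-encoded token from a cryptographically secure source.
// 32 random bytes produce a 64-character string.
function generateToken(byteLength = 32): string {
  return randomBytes(byteLength).toString("hex");
}
```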

Troubleshooting

No Metrics Collected

Problem: Monitoring service not collecting data
Solutions:

Check container is running: docker ps | grep monitoring
Review container logs: docker logs dokploy-monitoring
Verify configuration is valid JSON
Ensure Docker socket is mounted
Check file permissions on monitoring.db

Container Metrics Missing

Problem: Specific containers not appearing in metrics
Solutions:

Verify container is in include list
Check container is not in exclude list
Ensure container is running
Validate container name matches exactly
Check monitoring service logs

Alerts Not Sending

Problem: Threshold exceeded but no alerts received
Solutions:

Verify threshold is set (not 0)
Check callback URL is accessible
Test webhook endpoint manually
Review monitoring service logs for errors
Ensure token is correct
Verify server can reach callback URL

High Database Size

Problem: monitoring.db consuming too much space
Solutions:

Reduce retention days
Increase cleanup frequency
Manually vacuum database: sqlite3 /etc/dokploy/monitoring/monitoring.db "VACUUM;"
Delete old metrics
Consider exporting and archiving data

Authentication Errors

Problem: API requests return 401/403
Solutions:

Verify token matches configuration
Check Authorization header format: Bearer <token>
Ensure token hasn’t been rotated
Review middleware logs
Test with health endpoint first (no auth required)
