Dokploy includes a built-in monitoring service, written in Go, that collects system and container metrics at a configurable interval.

Overview

The monitoring service provides:
  • Server Metrics: CPU, memory, disk, and network statistics for your servers
  • Container Metrics: Resource usage for individual Docker containers and services
  • Alerting: Threshold-based notifications via webhook callbacks

Architecture

The monitoring stack consists of:
  • Go Service: Lightweight metrics collector (dokploy/monitoring)
  • SQLite Database: Local storage for metrics data
  • HTTP API: RESTful endpoints for querying metrics
  • Webhook Integration: Alert delivery to the Dokploy control plane

Installation

Monitoring is automatically configured during server setup:
await setupMonitoring(serverId);
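This deploys a monitoring container with the following options:
import type { ContainerCreateOptions } from "dockerode";

const settings: ContainerCreateOptions = {
  name: "dokploy-monitoring",
  Image: "dokploy/monitoring:latest",
  // metricsConfig is the server's monitoring configuration (see Configuration)
  Env: [`METRICS_CONFIG=${JSON.stringify(metricsConfig)}`],
  HostConfig: {
    RestartPolicy: { Name: "always" },
    // Host networking gives the service direct access to host network statistics
    NetworkMode: "host",
    Binds: [
      "/var/run/docker.sock:/var/run/docker.sock:ro", // read-only Docker API access
      "/sys:/host/sys:ro",
      "/proc:/host/proc:ro",
      "/etc/os-release:/etc/os-release:ro",
      "/etc/dokploy/monitoring/monitoring.db:/app/monitoring.db" // persisted metrics database
    ],
    // Docker discards published ports in host network mode; the service
    // listens directly on the configured host port (3001)
    PortBindings: {
      "3001/tcp": [{ HostPort: "3001" }]
    }
  }
};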
The monitoring service uses host networking mode to accurately collect system metrics.

Configuration

Metrics Configuration

Configure monitoring via the server’s metricsConfig object:
{
  "server": {
    "type": "Remote",
    "refreshRate": 30,
    "retentionDays": 7,
    "port": 3001,
    "token": "secure-random-token",
    "urlCallback": "https://dokploy.example.com/api/trpc/notification.receiveNotification",
    "cronJob": "0 0 * * *",
    "thresholds": {
      "cpu": 80,
      "memory": 85
    }
  },
  "containers": {
    "refreshRate": 60,
    "services": {
      "include": ["my-app-service"],
      "exclude": ["temp-container"]
    }
  }
}

Configuration Options

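Options belong to the server object unless they are prefixed with containers.

| Option | Type | Description |
| --- | --- | --- |
| type | string | Server type: "Remote" or "Dokploy" |
| refreshRate | number | Server metrics collection interval in seconds |
| retentionDays | number | Number of days to retain metrics |
| port | number | HTTP API port (default: 3001) |
| token | string | Token that authenticates API requests and alert payloads |
| urlCallback | string | Webhook URL for alerts |
| cronJob | string | Cleanup schedule (cron format) |
| thresholds.cpu | number | CPU alert threshold (0-100%) |
| thresholds.memory | number | Memory alert threshold (0-100%) |
| containers.refreshRate | number | Container metrics collection interval in seconds |
| containers.services.include | string[] | Containers to monitor (see Service Filtering) |
| containers.services.exclude | string[] | Containers to skip (see Service Filtering) |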
Setting a threshold to 0 disables alerting for that metric.
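The config reaches the container as the METRICS_CONFIG environment variable, so a mistake is easiest to catch before deployment. The minimal TypeScript sketch below mirrors the JSON example above. The `validateMetricsConfig` helper and the specific rules it enforces are assumptions for illustration and are not part of Dokploy. Those rules are: positive intervals, a valid TCP port, thresholds between 0 and 100, and five-field cron syntax.

```typescript
// Hypothetical types mirroring the metricsConfig JSON example.
interface Thresholds {
  cpu: number;    // 0-100; 0 disables CPU alerts
  memory: number; // 0-100; 0 disables memory alerts
}

interface MetricsConfig {
  server: {
    type: "Remote" | "Dokploy";
    refreshRate: number; // seconds
    retentionDays: number;
    port: number;
    token: string;
    urlCallback: string;
    cronJob: string;
    thresholds: Thresholds;
  };
  containers: {
    refreshRate: number; // seconds
    services: { include: string[]; exclude: string[] };
  };
}

// Hypothetical helper: returns a list of problems (empty when the config looks valid).
function validateMetricsConfig(config: MetricsConfig): string[] {
  const errors: string[] = [];
  const { server, containers } = config;

  if (server.refreshRate <= 0) errors.push("server.refreshRate must be positive");
  if (containers.refreshRate <= 0) errors.push("containers.refreshRate must be positive");
  if (server.retentionDays <= 0) errors.push("server.retentionDays must be positive");
  if (!Number.isInteger(server.port) || server.port < 1 || server.port > 65535) {
    errors.push("server.port must be an integer between 1 and 65535");
  }
  if (server.token.length === 0) errors.push("server.token must not be empty");
  // ASSUMPTION: standard five-field cron syntax, as in the "0 0 * * *" example.
  if (server.cronJob.trim().split(/\s+/).length !== 5) {
    errors.push("server.cronJob must have five cron fields");
  }
  for (const [name, value] of Object.entries(server.thresholds)) {
    if (value < 0 || value > 100) errors.push(`thresholds.${name} must be between 0 and 100`);
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a UI show every issue at once.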

Server Metrics

Collected Metrics

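The monitoring service collects the following system information on each refresh. The json tags correspond to the keys in the API response:
type SystemMetrics struct {
  CPU              string  `json:"cpu"`              // CPU usage percentage
  CPUModel         string  `json:"cpuModel"`         // Processor model
  CPUCores         int32   `json:"cpuCores"`         // Logical cores
  CPUPhysicalCores int32   `json:"cpuPhysicalCores"` // Physical cores
  CPUSpeed         float64 `json:"cpuSpeed"`         // MHz
  OS               string  `json:"os"`               // Operating system
  Distro           string  `json:"distro"`           // Linux distribution
  Kernel           string  `json:"kernel"`           // Kernel version
  Arch             string  `json:"arch"`             // Architecture (amd64, arm64)
  MemUsed          string  `json:"memUsed"`          // Memory usage %
  MemUsedGB        string  `json:"memUsedGB"`        // Memory used in GB
  MemTotal         string  `json:"memTotal"`         // Total memory in GB
  Uptime           uint64  `json:"uptime"`           // Seconds since boot
  DiskUsed         string  `json:"diskUsed"`         // Disk usage %
  TotalDisk        string  `json:"totalDisk"`        // Total disk space in GB
  NetworkIn        string  `json:"networkIn"`        // Network received in MB
  NetworkOut       string  `json:"networkOut"`       // Network sent in MB
  Timestamp        string  `json:"timestamp"`        // ISO 8601 timestamp
}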

Example Response

{
  "timestamp": "2025-01-19T21:44:54.232164Z",
  "cpu": "24.57",
  "cpuModel": "Apple M1 Pro",
  "cpuCores": 8,
  "cpuPhysicalCores": 1,
  "cpuSpeed": 3228.0,
  "os": "darwin",
  "distro": "darwin",
  "kernel": "23.4.0",
  "arch": "arm64",
  "memUsed": "81.91",
  "memUsedGB": "13.11",
  "memTotal": "16.0",
  "uptime": 752232,
  "diskUsed": "89.34",
  "totalDisk": "460.43",
  "networkIn": "54.78",
  "networkOut": "31.72"
}
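Most numeric values in this response are encoded as strings (for example "cpu": "24.57"). Client code therefore has to convert them before doing arithmetic. The minimal TypeScript sketch below has two parts. The `ServerMetricsSample` interface mirrors the example response above. `ParsedUsage` and `parseServerSample` are hypothetical client-side helpers and are not part of Dokploy.

```typescript
// Hypothetical client-side type for one sample from GET /metrics.
interface ServerMetricsSample {
  timestamp: string;
  cpu: string;
  cpuModel: string;
  cpuCores: number;
  cpuPhysicalCores: number;
  cpuSpeed: number;
  os: string;
  distro: string;
  kernel: string;
  arch: string;
  memUsed: string;
  memUsedGB: string;
  memTotal: string;
  uptime: number;
  diskUsed: string;
  totalDisk: string;
  networkIn: string;
  networkOut: string;
}

interface ParsedUsage {
  timestamp: Date;
  cpuPercent: number;
  memPercent: number;
  memUsedGB: number;
  memTotalGB: number;
  diskPercent: number;
  networkInMB: number;
  networkOutMB: number;
}

// Converts the string-encoded fields to numbers, failing loudly on bad input.
function parseServerSample(sample: ServerMetricsSample): ParsedUsage {
  const num = (field: string, raw: string): number => {
    const value = Number.parseFloat(raw);
    if (Number.isNaN(value)) throw new Error(`Field ${field} is not numeric: ${raw}`);
    return value;
  };
  return {
    timestamp: new Date(sample.timestamp),
    cpuPercent: num("cpu", sample.cpu),
    memPercent: num("memUsed", sample.memUsed),
    memUsedGB: num("memUsedGB", sample.memUsedGB),
    memTotalGB: num("memTotal", sample.memTotal),
    diskPercent: num("diskUsed", sample.diskUsed),
    networkInMB: num("networkIn", sample.networkIn),
    networkOutMB: num("networkOut", sample.networkOut),
  };
}
```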

Querying Server Metrics

Retrieve metrics via the API:
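// token is the value of server.token in metricsConfig
const token = "secure-random-token";
const response = await fetch("http://server-ip:3001/metrics?limit=50", {
  headers: {
    Authorization: `Bearer ${token}`
  }
});
if (!response.ok) {
  throw new Error(`Metrics request failed with status ${response.status}`);
}
const metrics = await response.json();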
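A common next step is to summarize the returned window, for example to compare peak usage against the alert thresholds. This sketch assumes that the /metrics response is a JSON array of samples shaped like the example response. `summarizeUsage` is a hypothetical helper.

```typescript
// Minimal shape needed here; cpu and memUsed are string-encoded percentages.
interface CpuMemSample {
  cpu: string;
  memUsed: string;
}

interface UsageSummary {
  samples: number;
  avgCpu: number;
  peakCpu: number;
  avgMem: number;
  peakMem: number;
}

// Computes average and peak CPU and memory usage over a window of samples.
function summarizeUsage(samples: CpuMemSample[]): UsageSummary {
  if (samples.length === 0) {
    return { samples: 0, avgCpu: 0, peakCpu: 0, avgMem: 0, peakMem: 0 };
  }
  let cpuSum = 0;
  let memSum = 0;
  let peakCpu = -Infinity;
  let peakMem = -Infinity;
  for (const s of samples) {
    const cpu = Number.parseFloat(s.cpu);
    const mem = Number.parseFloat(s.memUsed);
    cpuSum += cpu;
    memSum += mem;
    peakCpu = Math.max(peakCpu, cpu);
    peakMem = Math.max(peakMem, mem);
  }
  return {
    samples: samples.length,
    avgCpu: cpuSum / samples.length,
    peakCpu,
    avgMem: memSum / samples.length,
    peakMem,
  };
}
```

With the fetched data, call `summarizeUsage(metrics)`.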

Container Metrics

Supported Container Types

The monitoring service tracks all Docker container types:
  • Standalone Containers: Individual Docker containers
  • Docker Compose: Multi-container applications
  • Docker Swarm Stacks: Clustered service deployments
When monitoring Docker Compose or Swarm stacks, use the -p flag to properly identify all services within the stack.

Collected Metrics

{
  "timestamp": "2025-01-19T22:16:30.796129Z",
  "CPU": 83.76,
  "Memory": {
    "percentage": 0.03,
    "used": 2.262,
    "total": 7.654,
    "usedUnit": "MB",
    "totalUnit": "GB"
  },
  "Network": {
    "input": 306,
    "output": 0,
    "inputUnit": "B",
    "outputUnit": "B"
  },
  "BlockIO": {
    "read": 28.7,
    "write": 0,
    "readUnit": "kB",
    "writeUnit": "B"
  },
  "Container": "7428f5a49039",
  "ID": "7428f5a49039",
  "Name": "my-app-container"
}
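Each container value comes with its own unit field (usedUnit, inputUnit, and so on). Values therefore have to be normalized before they can be compared or charted. The sketch below converts them to bytes, and the `ContainerMetric` interface mirrors the JSON above. The unit table is an assumption: kB/MB/GB are treated as decimal (SI) multiples and KiB/MiB/GiB as binary ones, which may not match how the service computes them.

```typescript
// Hypothetical client-side type for one container sample.
interface ContainerMetric {
  timestamp: string;
  CPU: number;
  Memory: { percentage: number; used: number; total: number; usedUnit: string; totalUnit: string };
  Network: { input: number; output: number; inputUnit: string; outputUnit: string };
  BlockIO: { read: number; write: number; readUnit: string; writeUnit: string };
  Container: string;
  ID: string;
  Name: string;
}

// ASSUMPTION: decimal factors for kB/MB/GB, binary factors for KiB/MiB/GiB.
const UNIT_FACTORS: Record<string, number> = {
  B: 1,
  kB: 1e3,
  KB: 1e3,
  MB: 1e6,
  GB: 1e9,
  TB: 1e12,
  KiB: 1024,
  MiB: 1024 ** 2,
  GiB: 1024 ** 3,
};

function toBytes(value: number, unit: string): number {
  const factor = UNIT_FACTORS[unit];
  if (factor === undefined) throw new Error(`Unknown unit: ${unit}`);
  return value * factor;
}

// Normalizes one sample so values reported in different units can be compared.
function containerUsageInBytes(m: ContainerMetric) {
  return {
    name: m.Name,
    memoryUsed: toBytes(m.Memory.used, m.Memory.usedUnit),
    memoryTotal: toBytes(m.Memory.total, m.Memory.totalUnit),
    networkIn: toBytes(m.Network.input, m.Network.inputUnit),
    networkOut: toBytes(m.Network.output, m.Network.outputUnit),
    blockRead: toBytes(m.BlockIO.read, m.BlockIO.readUnit),
    blockWrite: toBytes(m.BlockIO.write, m.BlockIO.writeUnit),
  };
}
```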

Querying Container Metrics

const metrics = await fetch(
  `http://server-ip:3001/metrics/containers?appName=my-app&limit=50`,
  {
    headers: {
      Authorization: `Bearer ${token}`
    }
  }
).then(r => r.json());
The appName parameter is required. Without it, an empty array is returned.

Service Filtering

Include Services

Specify which containers to monitor:
{
  "containers": {
    "services": {
      "include": [
        "production-api",
        "production-web",
        "production-db"
      ]
    }
  }
}

Exclude Services

Monitor all containers except specific ones:
{
  "containers": {
    "services": {
      "include": ["*"],
      "exclude": [
        "test-container",
        "temp-debug"
      ]
    }
  }
}
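The way the two lists interact can be expressed as a small predicate. This is a sketch under assumed semantics, not the Go service's code. Names must match exactly, as the troubleshooting section suggests. "*" in include matches every container. Exclude takes precedence over include. `shouldMonitor` is a hypothetical name.

```typescript
interface ServiceFilter {
  include: string[];
  exclude?: string[];
}

// ASSUMED semantics (not confirmed against the monitoring service):
// - names match exactly; "*" in include is the only wildcard
// - an entry in exclude always wins over include
function shouldMonitor(name: string, filter: ServiceFilter): boolean {
  if (filter.exclude?.includes(name)) return false;
  return filter.include.includes("*") || filter.include.includes(name);
}
```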

Alerting

Threshold Configuration

Set alert thresholds for CPU and memory:
"thresholds": {
  "cpu": 80,
  "memory": 85
}
With these values, an alert fires when CPU usage exceeds 80% or memory usage exceeds 85%.
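The documented rules are: alert when a value exceeds its threshold, and a threshold of 0 disables that metric. They can be sketched as a pure function that returns an alert payload or null. The strict greater-than comparison and the Memory message wording are assumptions. The CPU message format is copied from the example alert. `evaluateThreshold` is a hypothetical name.

```typescript
type MetricType = "CPU" | "Memory";
type ServerType = "Remote" | "Dokploy";

interface AlertPayload {
  ServerType: ServerType;
  Type: MetricType;
  Value: number;
  Threshold: number;
  Message: string;
  Timestamp: string;
  Token: string;
}

// Returns an alert when value exceeds threshold; a threshold of 0 disables alerting.
function evaluateThreshold(
  type: MetricType,
  value: number,
  threshold: number,
  serverType: ServerType,
  token: string,
  now: Date = new Date(),
): AlertPayload | null {
  if (threshold === 0 || value <= threshold) return null;
  return {
    ServerType: serverType,
    Type: type,
    Value: value,
    Threshold: threshold,
    // Same pattern as the example CPU alert; assumed to apply to Memory too.
    Message: `${type} usage (${value.toFixed(2)}%) exceeded threshold (${threshold.toFixed(2)}%)`,
    Timestamp: now.toISOString(),
    Token: token,
  };
}
```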

Alert Payload

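When a threshold is exceeded, an alert is sent to the callback URL (urlCallback). The payload has the following fields. On the wire it is wrapped in a top-level json object, as the example alert shows: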
interface AlertPayload {
  ServerType: "Remote" | "Dokploy";
  Type: "CPU" | "Memory";
  Value: number;        // Current value
  Threshold: number;    // Configured threshold
  Message: string;      // Human-readable message
  Timestamp: string;    // ISO 8601 timestamp
  Token: string;        // Authentication token
}

Example Alert

{
  "json": {
    "ServerType": "Remote",
    "Type": "CPU",
    "Value": 85.42,
    "Threshold": 80.0,
    "Message": "CPU usage (85.42%) exceeded threshold (80.00%)",
    "Timestamp": "2025-01-19T22:30:15.123456Z",
    "Token": "secure-random-token"
  }
}

Webhook Delivery

Alerts are sent via HTTP POST to the configured callback URL:
resp, err := http.Post(
  callbackURL,
  "application/json",
  bytes.NewBuffer(jsonData),
)
if err != nil {
  return err
}
defer resp.Body.Close() // release the connection when done
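You may point urlCallback at your own endpoint, for example to test it before enabling alerts. In that case the receiver should reject requests whose Token does not match. The sketch below is for Node.js and uses node:crypto's timingSafeEqual for a constant-time comparison. `parseAlert` and `tokensMatch` are hypothetical names, and the handling of the {"json": ...} envelope follows the example alert.

```typescript
import { timingSafeEqual } from "node:crypto";

interface AlertPayload {
  ServerType: "Remote" | "Dokploy";
  Type: "CPU" | "Memory";
  Value: number;
  Threshold: number;
  Message: string;
  Timestamp: string;
  Token: string;
}

// Constant-time comparison; timingSafeEqual requires equal-length buffers.
function tokensMatch(received: string, expected: string): boolean {
  const a = Buffer.from(received);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical receiver-side validation of a parsed webhook request body.
function parseAlert(body: unknown, expectedToken: string): AlertPayload {
  if (typeof body !== "object" || body === null || !("json" in body)) {
    throw new Error("missing json envelope");
  }
  const payload = (body as { json: unknown }).json as Partial<AlertPayload> | null;
  if (typeof payload !== "object" || payload === null) throw new Error("invalid payload");
  if (payload.Type !== "CPU" && payload.Type !== "Memory") throw new Error("unknown alert type");
  if (typeof payload.Value !== "number" || typeof payload.Threshold !== "number") {
    throw new Error("Value and Threshold must be numbers");
  }
  if (typeof payload.Token !== "string" || !tokensMatch(payload.Token, expectedToken)) {
    throw new Error("invalid token");
  }
  return payload as AlertPayload;
}
```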

Data Retention

Automatic Cleanup

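Metrics are automatically deleted according to the retention policy, which is set by these two server options:
{
  "retentionDays": 7,
  "cronJob": "0 0 * * *"
}
With these values, cleanup runs daily at midnight and removes metrics older than 7 days.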
The cleanup process:
  1. Runs on the configured cron schedule
  2. Deletes metrics older than retentionDays
  3. Keeps the database size bounded
  4. Logs cleanup operations
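Step 2 comes down to comparing each sample's timestamp against a cutoff of now minus retentionDays. The sketch below reproduces that comparison on the client, which is useful for exporting samples before a cleanup run deletes them. It illustrates the comparison and is not the service's implementation. It assumes the cutoff is exactly retentionDays × 24 hours before now. `retentionCutoff` and `partitionByRetention` are hypothetical names.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the instant before which samples are considered expired.
function retentionCutoff(retentionDays: number, now: Date = new Date()): Date {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// Splits samples into those that are kept and those a cleanup run would delete.
function partitionByRetention<T extends { timestamp: string }>(
  samples: T[],
  retentionDays: number,
  now?: Date,
): { keep: T[]; expired: T[] } {
  const cutoff = retentionCutoff(retentionDays, now).getTime();
  const keep: T[] = [];
  const expired: T[] = [];
  for (const s of samples) {
    (Date.parse(s.timestamp) < cutoff ? expired : keep).push(s);
  }
  return { keep, expired };
}
```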

Manual Cleanup

Metrics are stored in a SQLite database at /etc/dokploy/monitoring/monitoring.db on the host. Open it from a shell:
# Open the database
sqlite3 /etc/dokploy/monitoring/monitoring.db
Then run these commands at the sqlite> prompt:
-- View table schema
.schema server_metrics

-- Count metrics
SELECT COUNT(*) FROM server_metrics;

-- Delete metrics older than 7 days
DELETE FROM server_metrics
WHERE timestamp < datetime('now', '-7 days');

API Endpoints

Health Check

Verify that the monitoring service is running:
curl http://server-ip:3001/health
Response:
{
  "status": "ok"
}
The health endpoint does not require authentication.

Get Server Metrics

curl -H "Authorization: Bearer $TOKEN" \
  "http://server-ip:3001/metrics?limit=50"
Parameters:
  • limit: Number of metrics or “all” (default: 50)

Get Container Metrics

curl -H "Authorization: Bearer $TOKEN" \
  "http://server-ip:3001/metrics/containers?appName=my-app&limit=50"
Parameters:
  • appName: Container/service name (required)
  • limit: Number of metrics or “all” (default: 50)
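Both metrics endpoints use the same Authorization header and the same limit parameter. A client can therefore build requests for both with one small set of helpers. This sketch uses the standard URL and URLSearchParams APIs, and the helper names are hypothetical.

```typescript
type Limit = number | "all";

// Builds the URL for GET /metrics.
function serverMetricsUrl(baseUrl: string, limit: Limit = 50): string {
  const url = new URL("/metrics", baseUrl);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Builds the URL for GET /metrics/containers; appName is required.
function containerMetricsUrl(baseUrl: string, appName: string, limit: Limit = 50): string {
  // Without appName the service returns an empty array, so fail early instead.
  if (appName.trim() === "") throw new Error("appName is required");
  const url = new URL("/metrics/containers", baseUrl);
  url.searchParams.set("appName", appName);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Both endpoints expect the same bearer token header.
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}
```

Usage: `fetch(containerMetricsUrl("http://server-ip:3001", "my-app", "all"), { headers: authHeaders(token) })`.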

Performance Considerations

Resource Usage

The monitoring service is designed to be lightweight:
  • Memory: ~50-100MB typical usage
  • CPU: Minimal impact (metrics collection runs periodically)
  • Disk: SQLite database grows with retention period
  • Network: Negligible (local collection only)

Refresh Rates

Choose appropriate collection intervals:
| Interval | Use case | Impact |
| --- | --- | --- |
| 15-30s | High-frequency monitoring | Higher resource usage |
| 60s | Standard monitoring | Balanced |
| 300s | Low-frequency checks | Minimal overhead |
Very short refresh intervals (under 10 seconds) can impact system performance.
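The refresh rate and the retention period together determine how many samples the database holds. A rough estimate is 86,400 / refreshRate × retentionDays, assuming one row per sample per series (the server itself, plus each monitored container). `retainedSamples` is a hypothetical helper.

```typescript
const SECONDS_PER_DAY = 86_400;

// Steady-state number of samples kept for one series (server or one container).
function retainedSamples(refreshRateSeconds: number, retentionDays: number): number {
  if (refreshRateSeconds <= 0) throw new Error("refreshRate must be positive");
  return Math.floor(SECONDS_PER_DAY / refreshRateSeconds) * retentionDays;
}
```

For example, a 30-second interval with 7 days of retention keeps 20,160 server samples, while a 60-second interval keeps 10,080. Each monitored container adds its own series on top of that.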

Best Practices

  • Use 30-60 second refresh rates for most use cases
  • Monitor only critical containers to reduce overhead
  • Set appropriate retention periods (7-30 days)
  • Configure cleanup to run during off-peak hours
  • Set realistic thresholds (80-85% for CPU, 85-90% for memory)
  • Test webhook endpoints before enabling
  • Monitor alert frequency to avoid spam
  • Document alert response procedures
  • Back up the monitoring database regularly
  • Monitor database size growth
  • Adjust retention based on storage capacity
  • Export historical data before cleanup
  • Use strong random tokens
  • Rotate tokens periodically
  • Secure webhook endpoints with authentication
  • Restrict monitoring API access by IP

Troubleshooting

No Metrics Collected

Problem: Monitoring service not collecting data
Solutions:
  • Check that the container is running: docker ps | grep monitoring
  • Review the container logs: docker logs dokploy-monitoring
  • Verify that the METRICS_CONFIG value is valid JSON
  • Ensure the Docker socket is mounted
  • Check file permissions on /etc/dokploy/monitoring/monitoring.db

Container Metrics Missing

Problem: Specific containers not appearing in metrics
Solutions:
  • Verify the container is in the include list
  • Check that the container is not in the exclude list
  • Ensure the container is running
  • Validate that the container name matches exactly
  • Check the monitoring service logs

Alerts Not Sending

Problem: Threshold exceeded but no alerts received
Solutions:
  • Verify the threshold is set (not 0)
  • Verify the monitored server can reach the callback URL
  • Test the webhook endpoint manually
  • Review the monitoring service logs for errors
  • Ensure the token is correct

High Database Size

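Problem: monitoring.db consuming too much space
Solutions:
  • Reduce retentionDays
  • Run cleanup more often (adjust cronJob)
  • Delete old metrics manually
  • Vacuum the database to reclaim space, since SQLite does not shrink the file after deletes unless auto_vacuum is enabled: sqlite3 /etc/dokploy/monitoring/monitoring.db "VACUUM;"
  • Consider exporting and archiving data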

Authentication Errors

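Problem: API requests return 401/403
Solutions:
  • Verify the token matches server.token in the configuration
  • Check the Authorization header format: Bearer <token>
  • Ensure the token hasn't been rotated
  • Review the monitoring service logs: docker logs dokploy-monitoring
  • Test the health endpoint first; it requires no authentication, so it separates connectivity problems from token problems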