Monitoring Setup

Cloud OS includes built-in monitoring for system and application metrics. For fleet deployments, it also supports exporting metrics to external tools.


Built-in Monitoring

The Cloud OS dashboard provides real-time views of:

  • System metrics — CPU, memory, disk, and network usage
  • App metrics — per-container resource consumption
  • Disk health — S.M.A.R.T. status and partition usage
  • Audit logs — administrative actions with timestamps
  • Alert history — triggered and resolved alerts

For single-instance deployments, the built-in monitoring is often sufficient.


Alert Channels

Configure notification channels to receive alerts when metrics cross thresholds. Supported channel types:

| Type | Description |
| --- | --- |
| Email | SMTP-based notifications |
| Slack | Channel notifications via webhook |
| Telegram | Bot notifications |
| Webhook | Custom HTTP endpoint |
| PagerDuty | Incident management |

Configure alert channels from the Settings > Alerts section of the dashboard.

Configure at least two alert channels for redundancy (e.g., Slack and email). Test channels after setup to verify delivery.
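
A webhook-type channel can also be checked from outside the dashboard by posting a test message to the endpoint directly. The following is a minimal sketch, assuming a Slack-style incoming webhook that accepts a JSON body with a `text` field. The function names, the message wording, and the placeholder URL are illustrative assumptions, not part of Cloud OS.

```python
import json
import urllib.request


def build_test_payload(channel_name: str) -> bytes:
    """Build a Slack-style JSON body; the message wording is illustrative."""
    message = {"text": f"Test alert for channel '{channel_name}': delivery check"}
    return json.dumps(message).encode("utf-8")


def send_test_alert(webhook_url: str, channel_name: str, timeout: float = 10.0) -> int:
    """POST the test payload and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned value means the endpoint accepted the request.
    """
    request = urllib.request.Request(
        webhook_url,
        data=build_test_payload(channel_name),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Print the payload only; to send it, call send_test_alert() with your
    # own webhook URL (for example a hypothetical https://hooks.example.com/...).
    print(build_test_payload("ops-alerts").decode("utf-8"))
```

A successful HTTP status only shows that the endpoint accepted the request; it is still worth confirming that the message actually appears in the target channel.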


Common Alert Rules

| Condition | Severity |
| --- | --- |
| CPU usage above 90% for 5 minutes | Warning |
| Memory usage above 90% for 5 minutes | Warning |
| Disk usage above 85% | Warning |
| Disk usage above 95% | Critical |
| App container crashed | Critical |
| Backup overdue (over 12 hours) | Warning |
| Security score below 70 | Warning |

Create and manage alert rules from the dashboard.
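
To make the "above a threshold for N minutes" conditions concrete, here is a rough sketch of how such a rule could be evaluated against a series of samples. It is an illustration only, not the Cloud OS rule engine; `ThresholdRule`, `is_firing`, and the metric name are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ThresholdRule:
    metric: str        # hypothetical metric name
    threshold: float   # percent
    duration_s: int    # how long the value must stay above the threshold
    severity: str


def is_firing(rule: ThresholdRule, samples: Sequence[Tuple[float, float]]) -> bool:
    """Return True if the rule's condition holds for its full duration.

    `samples` are (unix_timestamp, value) pairs sorted by ascending time.
    The rule fires only when the history covers the whole window and every
    sample inside the window is above the threshold.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    window_start = latest_ts - rule.duration_s
    if samples[0][0] > window_start:
        return False  # not enough history to cover the window
    window = [value for ts, value in samples if ts >= window_start]
    return all(value > rule.threshold for value in window)


if __name__ == "__main__":
    cpu_rule = ThresholdRule("cpu_percent", 90.0, 300, "warning")
    # One sample per minute for five minutes, all at 95%.
    sustained = [(t, 95.0) for t in range(0, 301, 60)]
    print(is_firing(cpu_rule, sustained))
```

Requiring the whole window to be above the threshold keeps a short spike from triggering a warning, which is the point of the "for 5 minutes" qualifier. Rules without a duration, such as the disk usage ones, correspond to `duration_s=0`, where only the latest sample is checked.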


Disk Health

Cloud OS monitors disk health using S.M.A.R.T. data to provide early warning of hardware failures. It automatically raises alerts for bad sectors, overheating, and SSD wear.


External Integrations

For fleet deployments or advanced monitoring, Cloud OS can expose a Prometheus-compatible metrics endpoint. Add the Cloud OS instance as a Prometheus scrape target and use Grafana for dashboards.

| Component | Tool |
| --- | --- |
| Metrics | Prometheus + Grafana |
| Logs | Loki + Grafana |
| Alerts | Grafana Alerting or PagerDuty |
| Uptime | UptimeRobot or similar |
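
Before adding an instance as a scrape target, it can help to inspect what a Prometheus-compatible endpoint returns. The sketch below parses a simplified subset of the Prometheus text exposition format (metric name, optional labels, value). The metric names in the sample are made up for illustration; the actual endpoint path and metric names depend on your Cloud OS configuration and are not specified here.

```python
import re
from typing import Dict, List, Tuple

# Metric name, optional {labels}, then the value. Trailing timestamps are ignored.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs; escape sequences inside values are kept as-is.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    # Hypothetical sample output; real metric names will differ.
    sample_text = (
        "# TYPE example_cpu_usage_percent gauge\n"
        "example_cpu_usage_percent 42.5\n"
        'example_disk_usage_percent{mount="/data"} 71\n'
    )
    for name, labels, value in parse_metrics(sample_text):
        print(name, labels, value)
```

For production use, the official Prometheus client libraries include a full parser; this sketch is only meant for a quick look at the endpoint output.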

Metrics Retention

Built-in metrics are stored with decreasing resolution over time:

| Resolution | Retention |
| --- | --- |
| Full resolution | 24 hours |
| 1-minute averages | 7 days |
| 5-minute averages | 30 days |
| 1-hour averages | 1 year |

Retention periods can be adjusted in the configuration file. For longer retention, export metrics to an external time-series database.
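
As a rough illustration of the tiers above, the following sketch looks up the finest resolution still available for data of a given age. It uses the default retention periods from the table and treats "1 year" as 365 days, which is an assumption; `RETENTION_TIERS` and `finest_resolution` are hypothetical names, not Cloud OS configuration keys.

```python
from datetime import timedelta
from typing import Optional

# (maximum age, resolution) pairs, finest first. Defaults from the table;
# "1 year" is approximated as 365 days.
RETENTION_TIERS = [
    (timedelta(hours=24), "full resolution"),
    (timedelta(days=7), "1-minute averages"),
    (timedelta(days=30), "5-minute averages"),
    (timedelta(days=365), "1-hour averages"),
]


def finest_resolution(age: timedelta) -> Optional[str]:
    """Return the finest resolution retained for data of the given age.

    Returns None when the data is older than the longest retention period.
    """
    for max_age, resolution in RETENTION_TIERS:
        if age <= max_age:
            return resolution
    return None


if __name__ == "__main__":
    for age in (timedelta(hours=3), timedelta(days=10), timedelta(days=400)):
        print(age, "->", finest_resolution(age))
```

A query for last week's CPU usage can therefore rely on 1-minute averages, while anything older than a year has to come from an external time-series database.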

Tips

  • Use structured JSON logging for integration with log aggregation tools.
  • Test alert channels regularly to confirm that notifications are still being delivered.
  • Monitor backup success alongside system metrics.
  • Export to Prometheus for fleet-wide dashboards.
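
For the structured logging tip, one possible approach in an application running on Cloud OS is a small JSON formatter built on Python's standard `logging` module. This is a minimal sketch; the field names (`timestamp`, `level`, `logger`, `message`) are an assumed schema, so adjust them to whatever your log aggregation tool expects.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as a string when an exception is logged.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    make_logger("backup").info("backup finished in %d s", 42)
```

Emitting one JSON object per line lets tools such as Loki index fields like `level` without custom parsing rules.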