
Predictive Insights

Quazzar Cloud OS continuously monitors your node’s health and uses classical statistics to surface forward-looking signals — anomaly alerts, capacity forecasts, backup health scores, and app stability ratings — without requiring any ML runtime or cloud dependency.

Everything runs inside the Go process on your node. Data never leaves.


What you get

Four predictors run on a 1-hour scoring cycle and write calibrated results to a local predictions SQLite table. Results are available via:

  • Molly — ask “What does my server predict?” and Molly queries the table.
  • MCP tool — the get_predictions tool surfaces predictions to any MCP client (Claude Desktop, Claude Code, Cursor, …).
  • REST endpoint: GET /api/insights/predictions returns the latest scored row per predictor in JSON.
  • Dashboard card — the Predictive Insights card shows the four predictors with risk levels and summaries at a glance.

The four predictors

resource_anomaly

Detects when CPU, RAM, disk I/O, or network metrics fall outside their historical normal range for the same hour-of-day and day-of-week.

The detector keeps an exponential moving average (EMA) baseline for each (hour, weekday) slot and flags a reading as an anomaly when its z-score against that baseline satisfies |z| > 3. Baselines are computed from your node’s own system_metrics table; no external service is involved.

The detector needs at least two weeks of data for a given (hour, weekday) slot before it will emit an anomaly score. Before that threshold is reached, the predictor returns insufficient_data rather than a fabricated number.

Output fields: metric, z_score, baseline_mean, baseline_stddev, risk_level (low / medium / high), detected_at.


capacity_forecast

Projects how many days remain before disk (or RAM/CPU, when trend data is sufficient) reaches saturation, using Ordinary Least Squares regression over the 30-day daily rollup.

  • Disk capacity — always available. Generalises the live OLS model in monitoring/disk_health.go to a longer horizon using the metrics_rollup_daily table.
  • RAM / CPU saturation trend — enabled when 30 or more days of daily rollup data are present. This is a follow-up capability in v1 and may surface as insufficient_data on fresh nodes.

Honest-refusal: a monotonically-decreasing or flat trend with R² < 0.3 returns insufficient_data rather than a speculative forecast.
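As a rough sketch of the forecast and its refusal rule, the Go code below fits a least-squares line to daily usage percentages and converts the slope into days until 100%. It is not the code in monitoring/disk_health.go. The function names are hypothetical, the 14-row minimum is borrowed from the example refusal reason on this page, and refusing on either a non-positive slope or R² < 0.3 is one reading of the rule above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minRows = 14  // assumed minimum, taken from the example refusal reason
	minR2   = 0.3 // fits below this R² are refused
)

var errInsufficientData = errors.New("insufficient_data")

// fitOLS fits y = intercept + slope*x over x = 0..n-1 (one point per daily
// rollup row) and returns the coefficient of determination R².
// Callers must pass at least two points.
func fitOLS(y []float64) (slope, intercept, r2 float64) {
	n := float64(len(y))
	var sumX, sumY, sumXY, sumXX float64
	for i, v := range y {
		x := float64(i)
		sumX += x
		sumY += v
		sumXY += x * v
		sumXX += x * x
	}
	slope = (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	intercept = (sumY - slope*sumX) / n
	mean := sumY / n
	var ssRes, ssTot float64
	for i, v := range y {
		pred := intercept + slope*float64(i)
		ssRes += (v - pred) * (v - pred)
		ssTot += (v - mean) * (v - mean)
	}
	if ssTot == 0 {
		return slope, intercept, 0 // perfectly flat series: nothing to explain
	}
	return slope, intercept, 1 - ssRes/ssTot
}

// daysUntilFull projects when usage reaches 100%, or refuses to guess.
func daysUntilFull(dailyPct []float64) (float64, error) {
	if len(dailyPct) < minRows {
		return 0, errInsufficientData
	}
	slope, _, r2 := fitOLS(dailyPct)
	if slope <= 0 || r2 < minR2 {
		return 0, errInsufficientData
	}
	current := dailyPct[len(dailyPct)-1]
	if current >= 100 {
		return 0, nil
	}
	return (100 - current) / slope, nil
}

func main() {
	usage := make([]float64, 30)
	for i := range usage {
		usage[i] = 60 + 0.5*float64(i) // synthetic data: +0.5 points per day
	}
	days, err := daysUntilFull(usage)
	fmt.Println(days, err)
}
```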

Output fields: resource, days_until_full, current_pct, trend_slope, r_squared, risk_level, forecast_date.


backup_health

Scores your backup reliability using three signals:

  • Failure rate — percentage of backup runs that failed in the last 30 days.
  • Consecutive failures — how many of the most recent runs in a row have failed.
  • Duration/size trend — whether backup duration or size is growing abnormally (early signal of bloat or infrastructure problems).

Requires at least 5 backup runs in the window to emit a score.

Output fields: failure_rate_pct, consecutive_failures, last_success_at, risk_level, summary.


app_health

Scores each running app’s stability using:

  • Restart frequency — restarts per hour over the last 24h and 7d.
  • Lifecycle events — crash, OOM-kill, and abnormal-exit events captured from Docker event streams and app_events.
  • OOM / crash flag — any OOM-kill or crash in the last 24h bumps risk to high regardless of restart count.

Requires at least 24 hours of app telemetry before a stable score is emitted.
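A minimal Go sketch of the scoring rules might look like this. Only two rules come from this page: the 24-hour telemetry minimum, and the jump to high on any OOM-kill or crash. The restart threshold separating low from medium (mediumRestarts24h) is a made-up placeholder, as are the type and function names.

```go
package main

import "fmt"

const (
	minTelemetryHours = 24 // minimum telemetry before a score is emitted
	mediumRestarts24h = 3  // hypothetical threshold, not from the product
)

// appStats is a hypothetical summary of one app's recent telemetry.
type appStats struct {
	AppName        string
	TelemetryHours float64
	Restarts24h    int
	OOMKills24h    int
	Crashes24h     int
}

// appRisk returns the risk level, or ok=false when telemetry is too short.
func appRisk(s appStats) (risk string, ok bool) {
	if s.TelemetryHours < minTelemetryHours {
		return "", false
	}
	// Any OOM-kill or crash in the last 24h overrides the restart count.
	if s.OOMKills24h > 0 || s.Crashes24h > 0 {
		return "high", true
	}
	if s.Restarts24h >= mediumRestarts24h {
		return "medium", true
	}
	return "low", true
}

func main() {
	s := appStats{AppName: "example-app", TelemetryHours: 72, OOMKills24h: 1}
	risk, ok := appRisk(s)
	fmt.Println(s.AppName, risk, ok)
}
```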

Output fields: app_name, restarts_24h, restarts_7d, oom_kills_24h, crashes_24h, risk_level, summary.


Honest-refusal

Every predictor is guarded against fabricating numbers when data is insufficient. When a predictor cannot produce a reliable result it returns:

{ "predictor": "capacity_forecast", "status": "insufficient_data", "reason": "fewer than 14 daily rollup rows available" }

This means you will never see an invented “days until full” on a fresh node. Predictions only appear once the statistical confidence is real.
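A client can treat the status field as the gate before reading any numbers. The Go sketch below decodes the refusal object shown above; the struct and helper names are illustrative, and the "ok" status value is taken from the REST response example on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// predictionStatus mirrors the three fields of the refusal object.
type predictionStatus struct {
	Predictor string `json:"predictor"`
	Status    string `json:"status"`
	Reason    string `json:"reason,omitempty"`
}

func parseStatus(raw []byte) (predictionStatus, error) {
	var p predictionStatus
	err := json.Unmarshal(raw, &p)
	return p, err
}

// usable reports whether the prediction carries a real score.
func (p predictionStatus) usable() bool { return p.Status == "ok" }

func main() {
	raw := []byte(`{ "predictor": "capacity_forecast", "status": "insufficient_data", "reason": "fewer than 14 daily rollup rows available" }`)
	p, err := parseStatus(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	if !p.usable() {
		fmt.Printf("%s: no forecast yet (%s)\n", p.Predictor, p.Reason)
	}
}
```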


The get_predictions MCP tool

Any MCP client connected to your node can call:

get_predictions

Optional parameters:

Parameter | Type | Description
predictor | string | Filter to one predictor: resource_anomaly, capacity_forecast, backup_health, app_health. Omit for all.
min_risk | string | Only return predictions at or above this risk level: low, medium, high.
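The "at or above" semantics of min_risk imply an ordering low < medium < high. The following Go sketch shows one way such a filter could work; it is not the tool's actual implementation, and the names are hypothetical. It assumes that rows without a recognised risk level (for example, insufficient_data rows) are dropped whenever a filter is set.

```go
package main

import "fmt"

// riskRank orders the three documented risk levels.
var riskRank = map[string]int{"low": 1, "medium": 2, "high": 3}

type prediction struct {
	Predictor string
	RiskLevel string
}

// filterByMinRisk keeps predictions at or above minRisk.
// An empty minRisk disables filtering.
func filterByMinRisk(preds []prediction, minRisk string) ([]prediction, error) {
	if minRisk == "" {
		return preds, nil
	}
	floor, ok := riskRank[minRisk]
	if !ok {
		return nil, fmt.Errorf("unknown risk level %q", minRisk)
	}
	var out []prediction
	for _, p := range preds {
		// Unknown or empty levels rank 0 and therefore never pass the floor.
		if riskRank[p.RiskLevel] >= floor {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	preds := []prediction{
		{"resource_anomaly", "low"},
		{"capacity_forecast", "high"},
		{"backup_health", "medium"},
	}
	kept, err := filterByMinRisk(preds, "medium")
	fmt.Println(kept, err)
}
```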

Example (Claude Desktop):

“Show me any high-risk predictions for my server.”

Molly and Claude will call get_predictions automatically when the query is about server health, disk space, or app stability.


REST endpoint

GET /api/insights/predictions

Query parameters mirror the MCP tool:

Parameter | Example | Description
predictor | capacity_forecast | Filter to a single predictor.
min_risk | medium | Minimum risk level to return.

Response:

{
  "scored_at": "2026-05-22T14:00:00Z",
  "predictions": [
    {
      "predictor": "capacity_forecast",
      "status": "ok",
      "risk_level": "high",
      "days_until_full": 12,
      "resource": "disk",
      "current_pct": 87.4
    }
  ]
}

Proactive alerts (gated)

Molly can send a push notification when a high-risk prediction is scored, before you ask. This feature is off by default and must be opted in with an environment variable:

QUAZZAR_INSIGHTS_ALERTS=true

See the admin guide on Predictive Insights alerts for dedup window, notification cap, and how to enable alerts per-predictor.

When enabled, Molly sends at most one alert per predictor per hour (dedup-keyed by predictor + risk_level). A high-risk disk forecast alert at 2 pm will not re-fire at 3 pm if the risk level is unchanged.


Dashboard card

Open Dashboard. The Predictive Insights card appears below the system metrics section and shows:

  • Current risk level (colour-coded: green / amber / red) for each predictor.
  • A one-line summary from the most recent scored row.
  • A “See details” link that opens the full prediction in the Molly chat sidebar.

The card refreshes every 5 minutes (on the same poll cycle as the monitoring widgets) and shows a “Collecting data…” placeholder while a predictor is still in insufficient_data state.