AI Orchestration
The AI orchestration system coordinates AI workloads across multiple Cloud OS instances in your fleet, automatically routing requests to the best available instance based on capacity and health.
AI orchestration requires the ai_orchestration license feature (Enterprise plan).
How It Works
When you submit an orchestrated AI task, the system:
- Evaluates capacity — checks GPU and CPU utilization across all instances with AI capabilities
- Selects instances — routes the task to the least-loaded instance (or distributes across multiple instances for team tasks)
- Handles failover — if the selected instance becomes unavailable, the task is re-routed to a healthy instance
- Aggregates results — for multi-instance tasks, collects and merges outputs from all participating instances
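The selection and failover steps above can be sketched roughly as follows. This is an illustration only, not the product's actual scheduler: the `Instance` fields, the `load_score` weighting (a plain average of GPU and CPU utilization), and the use of `ConnectionError` to signal an unavailable instance are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Instance:
    # Hypothetical view of one Cloud OS instance with AI capabilities.
    name: str
    gpu_util: float       # fraction in use, 0.0 to 1.0
    cpu_util: float       # fraction in use, 0.0 to 1.0
    healthy: bool = True


def load_score(inst: Instance) -> float:
    # Assumed metric: plain average of GPU and CPU utilization.
    return (inst.gpu_util + inst.cpu_util) / 2


def select_instance(
    instances: Iterable[Instance], exclude: frozenset = frozenset()
) -> Optional[Instance]:
    # Keep only healthy instances that have not already failed this task,
    # then pick the one with the lowest load score.
    candidates = [i for i in instances if i.healthy and i.name not in exclude]
    return min(candidates, key=load_score) if candidates else None


def run_with_failover(instances: list, run_task: Callable[[Instance], str]) -> tuple:
    # Try the least-loaded instance first; on failure, re-route to the next one.
    failed = set()
    while True:
        target = select_instance(instances, frozenset(failed))
        if target is None:
            raise RuntimeError("no healthy instance available")
        try:
            return target.name, run_task(target)
        except ConnectionError:
            failed.add(target.name)


fleet = [
    Instance("node-a", gpu_util=0.9, cpu_util=0.5),
    Instance("node-b", gpu_util=0.2, cpu_util=0.3),
    Instance("node-c", gpu_util=0.1, cpu_util=0.1, healthy=False),
]


def flaky(inst: Instance) -> str:
    # Simulate node-b dropping out after it has been selected.
    if inst.name == "node-b":
        raise ConnectionError("instance unavailable")
    return f"done on {inst.name}"


print(run_with_failover(fleet, flaky))  # ('node-a', 'done on node-a')
```

In this sketch `node-c` is skipped because it is unhealthy, `node-b` is chosen first as the least-loaded healthy instance, and the task is re-routed to `node-a` once `node-b` fails.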
Submitting an Orchestrated Task
- Navigate to AI > Orchestration from the sidebar
- Click New Task
- Describe the task or select a pre-configured workflow
- The system then selects the best instance or instances automatically
- Monitor progress on the task detail page
Cluster AI Capacity
The capacity overview shows real-time AI resource availability across your fleet:
- GPU utilization per instance
- CPU availability per instance
- Number of active AI tasks
- Available model capacity
Orchestration API
| Endpoint | Method | Description |
|---|---|---|
| /api/ai/orchestrate | POST | Submit an orchestrated AI task |
| /api/ai/orchestrate/{id} | GET | Get task status with per-instance results |
| /api/ai/capacity | GET | Cluster AI capacity overview |
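A minimal sketch of calling these endpoints from Python with the standard library is shown below. Only the three paths and their methods come from the table above. The base URL, the bearer-token `Authorization` header, the `description` field in the request body, and the assumption that responses are JSON are all hypothetical, so check them against your deployment before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the address of your Cloud OS instance.
BASE_URL = "https://cloudos.example.com"


def build_request(method: str, path: str, token: str, payload=None) -> urllib.request.Request:
    # Encode the optional JSON body and attach the (assumed) bearer token.
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def submit_task(token: str, description: str) -> urllib.request.Request:
    # POST /api/ai/orchestrate; the "description" field name is an assumption.
    return build_request("POST", "/api/ai/orchestrate", token, {"description": description})


def task_status(token: str, task_id: str) -> urllib.request.Request:
    # GET /api/ai/orchestrate/{id}
    safe_id = urllib.parse.quote(task_id, safe="")
    return build_request("GET", f"/api/ai/orchestrate/{safe_id}", token)


def cluster_capacity(token: str) -> urllib.request.Request:
    # GET /api/ai/capacity
    return build_request("GET", "/api/ai/capacity", token)


def send(req: urllib.request.Request) -> dict:
    # Perform the HTTP call and decode the response, assuming a JSON body.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example easy to inspect without network access. A call such as `send(cluster_capacity(token))` would then fetch the capacity overview, provided the assumptions above hold.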