Clustering
Cloud OS supports native multi-node clustering with Raft consensus, peer-to-peer communication, and cluster-wide application deployment. You can spread applications across multiple servers and manage them from a single dashboard.
How Clustering Works
One server acts as the leader node and coordinates cluster operations via the Raft consensus protocol. Additional servers join as member nodes. All nodes communicate peer-to-peer, and leader election happens automatically if the current leader becomes unavailable.
+-------------+   peer-to-peer   +-------------+
|   Leader    |<---------------->|  Member 1   |
|    Node     |                  |             |
+------+------+                  +------+------+
       |                                |
       |          peer-to-peer          |
       |<------------------------------>|
       |                                |
       v          peer-to-peer          v
+-------------+<---------------->+-------------+
|  Member 2   |                  |  Member 3   |
|             |                  |             |
+-------------+                  +-------------+
Raft Consensus
Cloud OS uses the Raft consensus algorithm for:
- Leader election — automatic selection of a new leader if the current leader fails
- State replication — cluster configuration and app placement data replicated across nodes
- Consistency — all nodes agree on the cluster state before changes are applied
This ensures high availability — if the leader goes down, a new leader is elected and the cluster continues operating.
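As a rough illustration of the majority rule behind these guarantees (a simplified sketch, not Cloud OS's actual Raft code), a change to cluster state only takes effect once a majority of nodes have acknowledged it:

```python
def is_committed(acks: int, cluster_size: int) -> bool:
    """Raft commits a change once a strict majority of nodes have stored it."""
    return acks > cluster_size // 2

# In a 3-node cluster, the leader plus one member is enough to commit;
# a lone node (e.g. isolated by a network partition) is not.
assert is_committed(2, 3)
assert not is_committed(1, 3)
```

The same majority rule applies to leader election, which is why a partitioned minority can never elect a competing leader.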
Setting Up a Cluster
Prerequisites
- Two or more servers with Cloud OS installed
- Network connectivity between servers
- Same Cloud OS version on all nodes
Creating a Cluster
On the server that should become the initial leader:
- Navigate to Clustering from the sidebar
- Click Create Cluster
- A join token is generated
Adding Member Nodes
From the leader node UI:
- Click Add Node
- Copy the join command shown in the dialog
- Run it on the member server:
quazzar join --token <TOKEN> --primary <LEADER_IP>
Via auto-discovery (LAN only):
Nodes on the same local network can discover each other. The leader Clustering page shows discovered nodes that can be added with one click.
What Happens During Join
- The member node registers with the leader
- The Raft cluster is expanded to include the new node
- Cluster state is replicated to the new member
- Peer-to-peer communication channels are established
- The member appears on the Clustering page with its status
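The join sequence above can be sketched in a few lines (an illustrative model only; `Node`, `Cluster`, and `join` are hypothetical names, not Cloud OS APIs):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)

@dataclass
class Cluster:
    leader: Node
    members: list = field(default_factory=list)

    def join(self, node: Node) -> None:
        # 1. The member registers with the leader, expanding membership.
        self.members.append(node)
        # 2. Current cluster state is replicated to the new member.
        node.state = dict(self.leader.state)

leader = Node("node-1", state={"apps": {"postgresql": "node-1"}})
cluster = Cluster(leader)
member = Node("node-2")
cluster.join(member)
assert member.state == leader.state
```

The real join additionally expands the Raft membership and opens peer-to-peer channels; the sketch only shows the registration and state-replication steps.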
Cluster Dashboard
The Clustering page provides a visual overview:
- Topology view — nodes displayed as cards connected by lines
- Per-node status — CPU, RAM, disk usage, and app count for each node
- App distribution — which apps are running on which nodes
- Health indicators — online, warning, or offline status per node
- Leader indicator — which node is the current Raft leader
Node Detail
Click any node to see:
- Full resource metrics (CPU, RAM, disk, network)
- List of apps installed on this node
- Node role (leader or member)
- Heartbeat status and latency
Cluster-Wide App Deployment
When deploying an app in a cluster, you choose which node it runs on:
- Open the App Store and select an app
- During installation, select the target node
- Cloud OS deploys the app on the selected node
- The app is accessible from any node via the cluster routing layer
App Placement
By default, apps are installed on the node where you trigger the installation. You can move apps between nodes after deployment.
Automatic Failover
When a node becomes unreachable:
- The Raft protocol detects the failure via missed heartbeats
- If the failed node was the leader, a new leader is elected
- Apps on the failed node are marked as unavailable
- When the node recovers, its apps are automatically restarted
Automatic failover ensures cluster coordination continues. App containers on the failed node are unavailable until the node recovers. For full app-level HA, deploy replicas across multiple nodes.
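The missed-heartbeat detection can be pictured as follows (the interval and threshold are illustrative assumptions, not Cloud OS's actual tuning):

```python
HEARTBEAT_INTERVAL = 1.0  # seconds between heartbeats (assumed value)
MISSED_THRESHOLD = 3      # consecutive misses before a node is marked failed

def is_failed(last_heartbeat: float, now: float) -> bool:
    """A node is considered unreachable after several missed heartbeats."""
    return now - last_heartbeat > HEARTBEAT_INTERVAL * MISSED_THRESHOLD

now = 100.0
assert not is_failed(last_heartbeat=98.5, now=now)  # 1.5s silent: still healthy
assert is_failed(last_heartbeat=90.0, now=now)      # 10s silent: marked failed
```

Requiring several consecutive misses avoids flapping a node offline on a single dropped packet.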
HA Mode
High Availability mode enables shared state replication across nodes:
- Cluster configuration is replicated via Raft log
- App metadata and routing tables are consistent across all nodes
- Any node can serve API requests with current cluster state
Peer-to-Peer Communication
Nodes communicate directly with each other for:
- Cluster state synchronization
- App deployment commands
- Metrics aggregation
- Health checks and heartbeats
This reduces single points of failure compared to hub-and-spoke architectures.
Cross-Node App Migration
Move a running app from one node to another:
- Go to the app detail page
- Click Migrate
- Select the target node
- Cloud OS stops the app, transfers data volumes, and restarts on the target node
Migration involves transferring data between nodes. For apps with large data volumes, this may take significant time. The app is unavailable during migration.
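The stop, transfer, restart sequence can be sketched as below (all names and the sample app are hypothetical; real volume transfer happens over the peer-to-peer channel):

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.apps = {}  # app name -> data volumes

    def stop(self, app: str) -> None:    # placeholder: would stop the container
        pass

    def start(self, app: str) -> None:   # placeholder: would start the container
        pass

def migrate(app: str, source: Node, target: Node) -> None:
    """Sketch of the stop -> transfer -> restart sequence described above."""
    source.stop(app)               # the app is unavailable from this point
    data = source.apps.pop(app)    # export data volumes from the source
    target.apps[app] = data        # transfer time scales with volume size
    target.start(app)              # the app is reachable again on the target

a, b = Node("node-1"), Node("node-2")
a.apps["gitea"] = {"data": "/var/lib/gitea"}
migrate("gitea", a, b)
assert "gitea" in b.apps and "gitea" not in a.apps
```

Note the window of unavailability spans the entire transfer, which is why large volumes mean long downtime.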
Service Discovery
Apps can find each other across nodes using internal DNS. When app A on node 1 needs to connect to PostgreSQL on node 2, it uses the service name:
postgresql.quazzar.internal
Cloud OS maintains a DNS resolver that maps service names to the correct node address.
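Conceptually, the resolver is a lookup table from service names to node addresses, kept consistent via the replicated cluster state (a minimal sketch; the addresses here are made up):

```python
# Illustrative service table maintained by the cluster.
service_table = {
    "postgresql.quazzar.internal": "10.0.0.2",  # PostgreSQL runs on node 2
}

def resolve(name: str) -> str:
    """Map a cluster-internal service name to the address of its node."""
    return service_table[name]

assert resolve("postgresql.quazzar.internal") == "10.0.0.2"
```

When an app migrates to another node, the table entry is updated, so clients keep using the same service name.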
Troubleshooting
Node shows “offline” status
Verify network connectivity between nodes. Check that the Cloud OS service is running on the offline node:
systemctl status quazzar
Leader election is stuck
If no leader can be elected, check that a majority of nodes (quorum) are reachable. Raft requires a majority to elect a leader. For a 3-node cluster, at least 2 nodes must be online.
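The quorum arithmetic is simple to check by hand (a sketch of the rule, not Cloud OS code):

```python
def quorum(cluster_size: int) -> int:
    """Minimum number of reachable nodes needed to elect a leader."""
    return cluster_size // 2 + 1

def can_elect_leader(online: int, cluster_size: int) -> bool:
    return online >= quorum(cluster_size)

assert quorum(3) == 2                 # a 3-node cluster needs 2 nodes online
assert not can_elect_leader(1, 3)     # a lone surviving node cannot elect
assert can_elect_leader(3, 5)         # a 5-node cluster tolerates 2 failures
```

This is also why even-sized clusters add no fault tolerance: a 4-node cluster needs 3 nodes online, tolerating only 1 failure, the same as 3 nodes.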
App migration fails
Check available disk space on the target node. The target needs enough free space to receive the app data volumes. Also verify that peer-to-peer communication is stable between the two nodes.
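A quick way to check headroom on the target before retrying (using Python's standard `shutil.disk_usage`; the 5 GiB figure is just an example):

```python
import shutil

def has_space_for(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem at `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes

# Example: verify 5 GiB of free space before migrating an app's data volumes.
print(has_space_for("/", 5 * 1024**3))
```

Leave extra margin beyond the volume size: the transfer may stage data temporarily before the old copy is removed.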
Cross-node routing not working
Verify the leader node has correct routing information for all member nodes. Check the Clustering page to confirm app locations are correctly registered. Restart the cluster routing if routes appear stale.