
Clustering

Cloud OS supports native multi-node clustering with Raft consensus, peer-to-peer communication, and cluster-wide application deployment. You can spread applications across multiple servers and manage them from a single dashboard.

How Clustering Works

One server acts as the leader node and coordinates cluster operations via the Raft consensus protocol. Additional servers join as member nodes. All nodes communicate peer-to-peer, and leader election happens automatically if the current leader becomes unavailable.

+-------------+   peer-to-peer    +-------------+
|   Leader    |<----------------->|  Member 1   |
|    Node     |                   |             |
+------+------+                   +------+------+
       |          peer-to-peer           |
       |<------------------------------->|
       |                                 |
       v          peer-to-peer           v
+-------------+<----------------->+-------------+
|  Member 2   |                   |  Member 3   |
|             |                   |             |
+-------------+                   +-------------+

Raft Consensus

Cloud OS uses the Raft consensus algorithm for:

  • Leader election — automatic selection of a new leader if the current leader fails
  • State replication — cluster configuration and app placement data replicated across nodes
  • Consistency — all nodes agree on the cluster state before changes are applied

This ensures high availability — if the leader goes down, a new leader is elected and the cluster continues operating.
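The majority-vote rule behind leader election can be sketched in a few lines. This is an illustrative model of Raft, not Cloud OS code: the `RaftNode` class and vote counts are assumptions for the example, and the 150–300 ms timeout range is the one suggested by the Raft authors, not necessarily what Cloud OS uses.

```python
import random

# Illustrative Raft election sketch -- not Cloud OS internals.
ELECTION_TIMEOUT_MS = (150, 300)  # randomized range from the Raft paper

class RaftNode:
    def __init__(self, node_id: str, cluster: list[str]):
        self.node_id = node_id
        self.cluster = cluster  # all node ids, including this one
        self.term = 0
        self.role = "follower"

    def election_timeout(self) -> float:
        # Randomized timeouts make it unlikely that two followers become
        # candidates at the same instant, which avoids split votes.
        return random.uniform(*ELECTION_TIMEOUT_MS)

    def start_election(self, votes_received: int) -> str:
        # A follower that hears no heartbeat becomes a candidate, bumps
        # its term, and wins only with votes from a majority of the cluster.
        self.term += 1
        self.role = "candidate"
        majority = len(self.cluster) // 2 + 1
        if votes_received >= majority:
            self.role = "leader"
        return self.role
```

In a 3-node cluster a candidate needs 2 votes (it votes for itself, so one peer vote suffices); in a 5-node cluster it needs 3.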

Setting Up a Cluster

Prerequisites

  • Two or more servers with Cloud OS installed
  • Network connectivity between servers
  • Same Cloud OS version on all nodes

Creating a Cluster

On the server that should become the initial leader:

  1. Navigate to Clustering from the sidebar
  2. Click Create Cluster
  3. A join token is generated

Adding Member Nodes

From the leader node UI:

  1. Click Add Node
  2. Copy the join command shown in the dialog
  3. Run it on the member server:
quazzar join --token <TOKEN> --primary <LEADER_IP>

Via auto-discovery (LAN only):

Nodes on the same local network can discover each other automatically. The Clustering page on the leader lists discovered nodes, which can be added with one click.

What Happens During Join

  1. The member node registers with the leader
  2. The Raft cluster is expanded to include the new node
  3. Cluster state is replicated to the new member
  4. Peer-to-peer communication channels are established
  5. The member appears on the Clustering page with its status
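The steps above can be modeled roughly as follows. The `Cluster` and `Node` types here are hypothetical, for illustration only; in particular, state replication is modeled as a simple copy rather than a Raft log replay.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    address: str
    status: str = "joining"
    state: dict = field(default_factory=dict)

@dataclass
class Cluster:
    nodes: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def join(self, member: Node) -> Node:
        # 1. the member registers with the leader
        self.nodes.append(member)
        # 2-3. Raft membership is expanded and the current cluster
        #      state is replicated to the newcomer (modeled as a copy)
        member.state = dict(self.state)
        # 4-5. once peer-to-peer channels are up, the node reports online
        member.status = "online"
        return member
```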

Cluster Dashboard

The Clustering page provides a visual overview:

  • Topology view — nodes displayed as cards connected by lines
  • Per-node status — CPU, RAM, disk usage, and app count for each node
  • App distribution — which apps are running on which nodes
  • Health indicators — online, warning, or offline status per node
  • Leader indicator — which node is the current Raft leader

Node Detail

Click any node to see:

  • Full resource metrics (CPU, RAM, disk, network)
  • List of apps installed on this node
  • Node role (leader or member)
  • Heartbeat status and latency

Cluster-Wide App Deployment

When deploying an app in a cluster, you choose which node it runs on:

  1. Open the App Store and select an app
  2. During installation, select the target node
  3. Cloud OS deploys the app on the selected node
  4. The app is accessible from any node via cluster routing

App Placement

By default, apps are installed on the node where you trigger the installation. You can move apps between nodes after deployment.

Automatic Failover

When a node becomes unreachable:

  1. The Raft protocol detects the failure via missed heartbeats
  2. If the failed node was the leader, a new leader is elected
  3. Apps on the failed node are marked as unavailable
  4. When the node recovers, its apps are automatically restarted
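Failure detection via missed heartbeats can be sketched as below. The interval and miss threshold are placeholder values for the example, not Cloud OS's actual settings.

```python
HEARTBEAT_INTERVAL_S = 1.0   # assumed interval between heartbeats
MISSED_BEFORE_OFFLINE = 3    # assumed missed beats before "unreachable"

def node_reachable(last_heartbeat: float, now: float) -> bool:
    # A node counts as reachable while the time since its last heartbeat
    # is below the miss threshold; past it, failover kicks in.
    return (now - last_heartbeat) < HEARTBEAT_INTERVAL_S * MISSED_BEFORE_OFFLINE
```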

Automatic failover ensures cluster coordination continues. App containers on the failed node are unavailable until the node recovers. For full app-level HA, deploy replicas across multiple nodes.

HA Mode

High Availability mode enables shared state replication across nodes:

  • Cluster configuration is replicated via Raft log
  • App metadata and routing tables are consistent across all nodes
  • Any node can serve API requests with current cluster state

Peer-to-Peer Communication

Nodes communicate directly with each other for:

  • Cluster state synchronization
  • App deployment commands
  • Metrics aggregation
  • Health checks and heartbeats

This reduces single points of failure compared to hub-and-spoke architectures.

Cross-Node App Migration

Move a running app from one node to another:

  1. Go to the app detail page
  2. Click Migrate
  3. Select the target node
  4. Cloud OS stops the app, transfers data volumes, and restarts on the target node

Migration involves transferring data between nodes. For apps with large data volumes, this may take significant time. The app is unavailable during migration.
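As a rough back-of-the-envelope for planning the downtime window (the helper is ours, and throughput is whatever the link between the two nodes actually sustains):

```python
def estimated_downtime_s(volume_bytes: int, throughput_bytes_per_s: float) -> float:
    # The app is stopped for the whole transfer, so downtime is roughly
    # data size divided by sustained transfer rate, plus stop/start overhead.
    return volume_bytes / throughput_bytes_per_s
```

For example, a 10 GiB volume over a link sustaining 100 MiB/s takes about 100 seconds of downtime before stop/start overhead.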

Service Discovery

Apps can find each other across nodes using internal DNS. When app A on node 1 needs to connect to PostgreSQL on node 2, it uses the service name:

postgresql.quazzar.internal

Cloud OS maintains a DNS resolver that maps service names to the correct node address.
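A client on any node can address a service by that name instead of a node IP. A minimal sketch of building such an address (the helper function and port are assumptions for illustration; only the `.quazzar.internal` suffix comes from the docs):

```python
SERVICE_DOMAIN = "quazzar.internal"

def service_address(app: str, port: int) -> str:
    # The cluster DNS resolver maps <app>.quazzar.internal to whichever
    # node currently runs the app, so callers never hard-code node IPs.
    return f"{app}.{SERVICE_DOMAIN}:{port}"

# A PostgreSQL client on node 1 would connect to
# service_address("postgresql", 5432), i.e. "postgresql.quazzar.internal:5432".
```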

Troubleshooting

Node shows “offline” status

Verify network connectivity between nodes. Check that the Cloud OS service is running on the offline node:

systemctl status quazzar

Leader election is stuck

If no leader can be elected, check that a majority of nodes (quorum) are reachable. Raft requires a majority to elect a leader. For a 3-node cluster, at least 2 nodes must be online.
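The quorum arithmetic in short: a cluster of n nodes needs a majority of ⌊n/2⌋ + 1 reachable nodes to elect a leader, so it tolerates the loss of the rest.

```python
def quorum(n: int) -> int:
    # smallest majority of an n-node cluster
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    # nodes that can fail while a leader can still be elected
    return n - quorum(n)
```

Note that a 4-node cluster tolerates no more failures than a 3-node one (both tolerate 1), which is why odd cluster sizes are the usual recommendation.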

App migration fails

Check available disk space on the target node. The target needs enough free space to receive the app data volumes. Also verify that peer-to-peer communication is stable between the two nodes.

Cross-node routing not working

Verify the leader node has correct routing information for all member nodes. Check the Clustering page to confirm app locations are correctly registered. Restart the cluster routing if routes appear stale.