Control Panel CA rotation
Every OS node, at registration, receives a client certificate signed by the Control Panel’s CA plus a copy of the CA’s public cert, which it pins for verifying the CP server. After a CA rotation, those pinned copies no longer match the new CA, so every registered node fails the mTLS handshake with:
tls: failed to verify certificate: x509: certificate signed by unknown
authority (possibly because of "x509: ECDSA verification failure" while
trying to verify candidate authority certificate "Quazzar Control Center CA")
The CP refuses to auto-regenerate the CA on every boot for exactly this reason: the CA lives on a PersistentVolume at /var/lib/quazzar-cc/certs/{ca.key,ca.crt} and is meant to last 10 years. But rotations do happen (key compromise, vendor switch, accidental PVC recreation with CC_ALLOW_AUTO_CA_GEN=true). This page covers the zero-downtime procedure.
Symptoms — how to know you’ve drifted
- The CP log fills with TLS handshake error … x509: ECDSA verification failure ….
- Every OS node shows “needs re-registration” in its CC agent status, with the local log line:
  CC no longer trusts this node’s client certificate — the CA was likely rotated on the Control Panel.
- kubectl exec into the CP pod and compare:
  openssl x509 -in /var/lib/quazzar-cc/certs/ca.crt -noout -fingerprint -sha256
  with a fleet node’s:
  ssh <node> sudo openssl x509 -in /var/lib/quazzar/certs/ca.crt -noout -fingerprint -sha256
  Different fingerprints → you have CA drift.
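The fingerprint comparison above can also be scripted once both ca.crt files have been copied to one machine. This is a minimal Python sketch, not part of the Quazzar tooling: the function names are hypothetical, and it assumes each input holds exactly one PEM certificate. It hashes the DER bytes, which is the same value that openssl x509 -fingerprint -sha256 prints.

```python
import hashlib
import ssl


def pem_fingerprint_sha256(pem_text: str) -> str:
    """SHA-256 fingerprint of a single PEM certificate, as colon-separated hex.

    Assumes pem_text contains exactly one certificate and nothing else.
    """
    # PEM_cert_to_DER_cert requires the text to start with the BEGIN marker,
    # so trim surrounding whitespace first.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def has_ca_drift(cp_ca_pem: str, node_ca_pem: str) -> bool:
    """True when the CP's CA cert and the node's pinned copy differ."""
    return pem_fingerprint_sha256(cp_ca_pem) != pem_fingerprint_sha256(node_ca_pem)
```

For example, has_ca_drift(open("cp-ca.crt").read(), open("node-ca.crt").read()), where the two file names are placeholders for the copies you collected.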
Recovery — the rotation grace pool
Don’t try to back-rotate the new CA out — that would invalidate any nodes that have already picked it up. Instead, keep the previous CA trusted on the CP for a grace window while nodes naturally re-register.
- Preserve the old CA cert. Whatever you have from before the rotation (a backup of /var/lib/quazzar-cc/certs/ca.crt, a copy exported from any still-running OS node at /var/lib/quazzar/certs/ca.crt, or a cert pulled out of a registered instance’s heartbeat headers), place it on the CP host as e.g. /var/lib/quazzar-cc/certs/ca.crt.old.
- Add it to the trust pool. Set the env var on the CP deployment:
  CC_TRUST_ADDITIONAL_CA_CERTS=/var/lib/quazzar-cc/certs/ca.crt.old
  (Comma-separate if you have multiple historical CAs.) Roll the deployment. On boot the CP logs:
  loaded additional trusted CA (rotation grace)
  The active CA is still the only one that signs new certs. The legacy CAs only extend the trust pool for verifying existing client certs.
- Re-register affected nodes at your own pace. Each node, once re-registered, picks up the new CA and stops depending on the old one. Use the same flow you used to bootstrap the node originally (generate a registration token in the CP UI, POST it to the OS’s /api/cc/register).
- Decommission the legacy CA. When every active fleet node has re-registered (and any nodes pinned to the old CA are gone or have had their client certs expire naturally; client certs are valid for 90 days), remove the entry from CC_TRUST_ADDITIONAL_CA_CERTS and roll the deployment again. The pool returns to just the active CA.
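The trust-pool and decommission steps can be reasoned about with a small model. The Python sketch below is illustrative only and is not the CP's implementation: TrustPool, parse_additional_cas, build_trust_pool and latest_old_cert_expiry are hypothetical names, and tolerating spaces around the commas is an assumption. It shows the split between the single signing CA and the verify-only legacy CAs, and derives the latest date on which a client cert signed by the old CA can still be valid.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Mapping

ACTIVE_CA_PATH = "/var/lib/quazzar-cc/certs/ca.crt"  # path from this page
CLIENT_CERT_LIFETIME = timedelta(days=90)  # client cert validity from this page


@dataclass
class TrustPool:
    signing_ca: str  # the active CA: the only one that issues new certs
    legacy_cas: List[str] = field(default_factory=list)  # verify-only

    def verification_cas(self) -> List[str]:
        """Every CA whose client certs are accepted during the grace window."""
        return [self.signing_ca] + self.legacy_cas


def parse_additional_cas(value: str) -> List[str]:
    """Split a comma-separated path list, dropping blanks (an assumption)."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_trust_pool(env: Mapping[str, str]) -> TrustPool:
    """Model the pool the CP would hold for a given environment."""
    extra = parse_additional_cas(env.get("CC_TRUST_ADDITIONAL_CA_CERTS", ""))
    return TrustPool(signing_ca=ACTIVE_CA_PATH, legacy_cas=extra)


def latest_old_cert_expiry(rotated_at: datetime) -> datetime:
    """Upper bound on when the last cert signed by the old CA expires.

    Assumes the old CA issued nothing after rotated_at.
    """
    return rotated_at + CLIENT_CERT_LIFETIME
```

With an unset variable the model's pool contains only the active CA, which matches the state the procedure returns to after decommissioning. Under the stated assumption, a rotation on 1 January 2025 means no old-CA client cert can still be valid after 1 April 2025.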
Preventing the next drift
- Always mount /var/lib/quazzar-cc/certs as a PersistentVolume in production. The CP refuses to auto-generate when the dir is missing unless CC_ALLOW_AUTO_CA_GEN=true; keep that env var unset in prod manifests so a PVC mishap can’t silently mint a fresh CA.
- Back up the CA pair out-of-band. The 10-year validity makes backups easy to forget, but losing the key forces a full fleet re-registration with no graceful path.
- Monitor for the noise pattern. Genuine drift produces a steady stream of TLS handshake error … ECDSA verification failure from public source IPs (not LB internal CIDRs; those probe the mTLS port without certs as TCP health checks and are demoted to DEBUG automatically). If you see this from a public IP, suspect drift.
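The monitoring rule in the last item can be expressed as a small filter. This Python sketch is a hedged illustration, not the CP's own log classifier: is_drift_suspect is a hypothetical name, the internal CIDR list is a placeholder you would replace with your load balancer's ranges, and because the exact log format is elided on this page the function takes the source IP and the message as separate arguments instead of parsing a log line.

```python
import ipaddress
from typing import Iterable

# Substring taken from the error text quoted on this page.
DRIFT_MARKER = "ECDSA verification failure"


def is_drift_suspect(source_ip: str, message: str,
                     internal_cidrs: Iterable[str]) -> bool:
    """True for a drift-style handshake error from outside the LB ranges."""
    if DRIFT_MARKER not in message:
        return False  # some other handshake failure, e.g. a cert-less probe
    addr = ipaddress.ip_address(source_ip)
    # Errors from LB health-check ranges are treated as noise.
    return not any(addr in ipaddress.ip_network(cidr) for cidr in internal_cidrs)
```

A steady count of True results over time is the signal to look for; a single hit on its own may just be a stale node.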
Why not just rotate the active CA back?
Because any node that registered after the rotation has already pinned the new CA. Reverting the active CA cuts those nodes off instead. The grace-pool approach keeps both populations working until each node transitions on its own schedule.
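The argument can be checked with a toy model of the CP's client-cert verification. This Python sketch is purely illustrative: the node names, the CA labels and accepted_nodes are hypothetical, and it reduces verification to "is the issuing CA in the trust pool".

```python
from typing import Dict, Set


def accepted_nodes(node_issuers: Dict[str, str], trusted_cas: Set[str]) -> Set[str]:
    """Nodes whose client cert was issued by a CA in the CP's trust pool."""
    return {node for node, issuer in node_issuers.items() if issuer in trusted_cas}


# Two nodes still hold old-CA certs; one registered after the rotation.
fleet = {"node-a": "old-ca", "node-b": "old-ca", "node-c": "new-ca"}

reverted = accepted_nodes(fleet, {"old-ca"})              # rotate back
grace_pool = accepted_nodes(fleet, {"new-ca", "old-ca"})  # keep both trusted
```

In this model, reverting leaves node-c rejected, while the grace pool accepts all three nodes.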