High Availability and Disaster Recovery
How High Availability Works
Conncentric uses three layers of protection to keep your integrations running.
Layer 1: Automatic Session Failover
Connectivity sessions and consumers are assigned to an active adapter pod. If that pod fails, another pod takes over the session automatically.
Here's what happens:
- The active pod sends a heartbeat to the Orchestrator every few seconds.
- If the heartbeats stop, the Orchestrator waits for a configurable lease timeout to expire.
- After the timeout, another pod claims the session and starts it up.
- The previous pod, if it recovers, detects the conflict and shuts itself down immediately (fencing).
Time to recover: The lease timeout (configurable) plus the time to re-establish connectivity.
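The heartbeat-and-lease protocol above can be sketched in a few lines of Python. This is an illustrative model, not Conncentric's implementation: the `Orchestrator` class, `LEASE_TIMEOUT` value, and pod names are all assumptions made for the example.

```python
LEASE_TIMEOUT = 15.0  # seconds without a heartbeat before the lease expires (illustrative)


class Orchestrator:
    """Toy model of lease-based session ownership with fencing."""

    def __init__(self):
        self.leases = {}  # session_id -> (owner_pod, last_heartbeat_time)

    def heartbeat(self, session_id, pod, now):
        owner, _ = self.leases.get(session_id, (None, 0.0))
        if owner not in (None, pod):
            return False  # another pod holds the lease: caller must fence itself off
        self.leases[session_id] = (pod, now)
        return True

    def try_claim(self, session_id, pod, now):
        owner, last = self.leases.get(session_id, (None, 0.0))
        # A claim succeeds only if the session is unowned or its lease expired.
        if owner is None or now - last > LEASE_TIMEOUT:
            self.leases[session_id] = (pod, now)
            return True
        return False


orch = Orchestrator()
orch.try_claim("s1", "pod-a", 0.0)        # pod-a becomes the active pod
orch.heartbeat("s1", "pod-a", 5.0)        # normal operation

# pod-a crashes; pod-b polls and claims once the lease times out.
assert not orch.try_claim("s1", "pod-b", 10.0)  # lease still fresh, claim refused
assert orch.try_claim("s1", "pod-b", 25.0)      # timeout elapsed: pod-b takes over

# pod-a recovers, its next heartbeat is rejected, so it shuts down (fencing).
assert not orch.heartbeat("s1", "pod-a", 26.0)
```

The key property this models is that a recovering pod learns it has lost the lease from its first rejected heartbeat, so it can shut down before two pods consume the same session.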
For production environments requiring Exclusive Consumption, run at least two adapter pods so there is always a standby available.
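As a sketch, a Helm values fragment along these lines would provide the standby; the `adapter.replicaCount` key is hypothetical, so check your chart's actual values schema:

```yaml
# Hypothetical values.yaml fragment -- key names depend on the actual chart.
adapter:
  replicaCount: 2   # one active pod plus one standby ready to claim sessions
```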
Layer 2: Orchestrator Redundancy
The Orchestrator can run with multiple instances. If one goes down, the others continue serving the Portal and the adapter pods without interruption.
Adapter pods keep their active sessions running during an Orchestrator restart. They do not need the Orchestrator continuously, only often enough to renew their heartbeats.
Layer 3: Database High Availability
All platform state lives in PostgreSQL. Use your database provider's high-availability option so the database itself can survive a server failure.
| Provider | Recommended Option |
|---|---|
| AWS RDS | Multi-AZ deployment |
| AWS Aurora | Aurora PostgreSQL with read replicas |
| Google Cloud SQL | High availability configuration |
| Azure Database for PostgreSQL | Zone-redundant HA |
Disaster Recovery
What Needs to Be Backed Up
| Data | Where It Lives | Backup Method |
|---|---|---|
| All adapter configurations | PostgreSQL | Database backups |
| Artifacts (data dictionaries, schemas) | PostgreSQL | Database backups |
| Installed plugins | PostgreSQL | Database backups |
| Helm values | Your version control system | Keep in Git |
All platform state lives in the database. If you back up PostgreSQL regularly and keep your Helm values in version control, you can recover the full platform.
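If your database provider's automated backups do not cover your retention needs, a scheduled logical dump is one option. The CronJob below is a minimal sketch, not shipped configuration: the image tag, secret name, and PVC name are placeholders to replace with your own.

```yaml
# Illustrative CronJob for a nightly logical backup; all names are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: conncentric-db-backup
spec:
  schedule: "0 3 * * *"          # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pg-dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump "$DATABASE_URL" -Fc -f /backup/conncentric.dump
              envFrom:
                - secretRef:
                    name: conncentric-db-credentials   # placeholder secret with DATABASE_URL
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: conncentric-backups          # placeholder PVC
```

The custom format (`-Fc`) produces a compressed dump that `pg_restore` can restore selectively.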
Recovery Behavior
| Scenario | How It Recovers |
|---|---|
| Adapter pod crashes | Kubernetes restarts it; another pod claims the session after lease timeout |
| Entire node fails | Kubernetes reschedules pods on another node; sessions are reclaimed |
| Orchestrator pod crashes | Kubernetes restarts it; adapter pods continue running during the restart |
| Database failover | Provider-managed automatic failover; platform reconnects when the database is available |
How to Recover From a Full Cluster Loss
- Provision a new Kubernetes cluster
- Restore your PostgreSQL backup
- Deploy Conncentric using your saved Helm values
- Adapter pods start up and claim sessions automatically
No manual reconfiguration is needed, because all the configuration is in the database.
Planned Maintenance
To shut down cleanly before maintenance:
- Disable all adapters in the Portal: go to Adapters, select all, click Disable.
- Wait for all sessions to close. Watch the operational status move to INACTIVE.
- Perform the maintenance.
- Re-enable adapters when done.