Metrics & Monitoring
Conncentric exposes metrics compatible with Prometheus, Datadog, CloudWatch, and other monitoring backends.
Prometheus Scrape Endpoints
Each component exposes a Prometheus-compatible endpoint on its management port. No additional Helm configuration is required to enable scraping; the endpoints are always available.
| Component | Path | Port |
|---|---|---|
| Orchestrator | /actuator/prometheus | 8081 (management port) |
| Adapter | /actuator/prometheus | 8081 (management port) |
The management port (8081) is separate from the application port (8080) and is not exposed outside the cluster. Prometheus scrapes it over the cluster-internal pod network.
ServiceMonitor
If your cluster uses the Prometheus Operator, create a ServiceMonitor to discover Conncentric pods automatically:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: conncentric
  namespace: conncentric
  labels:
    release: prometheus # Must match your Prometheus Operator's selector
spec:
  namespaceSelector:
    matchNames:
      - conncentric
  selector:
    matchLabels:
      app.kubernetes.io/part-of: conncentric
  endpoints:
    - port: management
      path: /actuator/prometheus
      interval: 15s
If you are not using the Prometheus Operator, configure your Prometheus scrape_configs to target port 8081 and path /actuator/prometheus for pods in the conncentric namespace.
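For a plain Prometheus deployment, the scrape job might look like the sketch below. It assumes Prometheus runs inside the cluster with permission to list pods, and that each Conncentric container declares port 8081 in its pod spec; the job name is a hypothetical choice, and the 15s interval simply mirrors the ServiceMonitor example.

```yaml
scrape_configs:
  - job_name: conncentric              # hypothetical job name
    metrics_path: /actuator/prometheus
    scrape_interval: 15s
    kubernetes_sd_configs:
      - role: pod                      # discover pods through the Kubernetes API
        namespaces:
          names:
            - conncentric
    relabel_configs:
      # Keep only targets for the management port. This assumes the
      # container spec declares containerPort 8081.
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8081"
        action: keep
      # Carry the pod name over as a label for easier debugging.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

If the pods do not declare the port, the keep rule drops every target; in that case, rewriting `__address__` to the pod IP plus `:8081` is the usual alternative.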
Orchestrator Metrics
conncentric_enabled_adapters_count
Type: Gauge
The total number of adapters in the ENABLED administrative state. The Kubernetes Horizontal Pod Autoscaler (HPA) also uses this metric to scale the adapter worker pool, because more enabled adapters require more pod replicas.
Tags: none
Use: Alert if this unexpectedly drops to 0 in production.
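To illustrate how an HPA could consume this gauge, here is a rough sketch rather than the manifest Conncentric actually ships. It assumes a metrics adapter (for example, prometheus-adapter) already exposes the metric through the Kubernetes external metrics API. The Deployment name, replica bounds, and target value are all hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: conncentric-adapter            # hypothetical name
  namespace: conncentric
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: conncentric-adapter          # hypothetical: the adapter worker Deployment
  minReplicas: 1                       # illustrative bound
  maxReplicas: 10                      # illustrative bound
  metrics:
    - type: External
      external:
        metric:
          name: conncentric_enabled_adapters_count
        target:
          type: AverageValue
          averageValue: "5"            # illustrative: about one replica per 5 enabled adapters
```

With an `AverageValue` target, the HPA divides the metric by the target value to derive the desired replica count, so the replica count grows in step with the number of enabled adapters.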
adapter_lifecycle_state
Type: Gauge (state-set pattern, one gauge per adapter per state value)
Reports which lifecycle stage each adapter is currently in. The gauge for the active state reports 1; all others report 0.
Tags:
| Tag | Description |
|---|---|
| adapter_id | The adapter's logical ID |
| display_name | Human-readable name |
| state | The state this gauge represents |
State values:
| Value | Meaning |
|---|---|
| provisioning | Adapter has a lease and is downloading plugins/config |
| active | Adapter is running the message pipeline |
| standby | Adapter is healthy but waiting; a sibling node is active |
| paused | Adapter was intentionally stopped by a user |
| releasing | Adapter is gracefully stopping and releasing its lease |
| port_exhaustion | Adapter cannot start due to a port conflict on the host |
| inactive | Adapter has no lease and is not participating |
Example query (PromQL):
# Count of active adapters
sum(adapter_lifecycle_state{state="active"})
# Alert: any adapter stuck in port_exhaustion
adapter_lifecycle_state{state="port_exhaustion"} == 1
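Because each adapter reports one series per state, summing by the state tag gives a fleet-wide breakdown. The recording rule below is a minimal sketch in the standard Prometheus rule-file format; the group name and the recorded series name are hypothetical.

```yaml
groups:
  - name: conncentric-recording        # hypothetical group name
    rules:
      # Number of adapters currently in each lifecycle state.
      # Only the gauge for an adapter's current state is 1, so the sum
      # per state equals the count of adapters in that state.
      - record: state:adapter_lifecycle_state:sum   # hypothetical series name
        expr: sum by (state) (adapter_lifecycle_state)
```

A dashboard panel can then plot the recorded series directly instead of re-evaluating the aggregation on every refresh.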
adapter_operational_status
Type: Gauge (state-set pattern)
Reports the business health of each adapter. The value is only meaningful when the adapter's lifecycle state is active; otherwise the status is na.
Tags:
| Tag | Description |
|---|---|
| adapter_id | The adapter's logical ID |
| display_name | Human-readable name |
| status | The status this gauge represents |
Status values:
| Value | Meaning |
|---|---|
| healthy | All connectors and pipeline routes are operational |
| degraded | Partial failure, e.g. one of two connectors is down |
| unhealthy | Total failure or critical internal error |
| na | Not applicable (adapter is inactive or on standby) |
Example query (PromQL):
# Any adapter in unhealthy state
adapter_operational_status{status="unhealthy"} == 1
# Any active adapter that is degraded
# (status and state are both ignored so the two series match on their shared labels)
adapter_operational_status{status="degraded"} == 1
and ignoring(status, state) adapter_lifecycle_state{state="active"} == 1
Plugin Metrics
Plugins can emit their own metrics, which appear alongside the platform metrics in the same scrape endpoint. Plugin metrics are tagged automatically with adapter ID and plugin key.
The specific metric names emitted by each plugin (for example, FIX session message counters or Kafka consumer lag) are documented in each plugin's own reference material.
Dashboards & Alerting
Pre-built Grafana dashboard definitions and recommended alerting rules are being developed. They will be added to this page when available.
In the meantime, the PromQL examples above cover the most critical alert conditions:
| What to alert on | Query |
|---|---|
| Any adapter unhealthy | adapter_operational_status{status="unhealthy"} == 1 |
| Any adapter degraded | adapter_operational_status{status="degraded"} == 1 |
| Port exhaustion | adapter_lifecycle_state{state="port_exhaustion"} == 1 |
| All enabled adapters lost | conncentric_enabled_adapters_count == 0 |
| Adapter stuck provisioning | adapter_lifecycle_state{state="provisioning"} == 1, held for more than 2 minutes (for: 2m in the alerting rule) |
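
Until the official rules are published, the table above could be expressed as a PrometheusRule for clusters running the Prometheus Operator. This is a sketch, not the recommended rule set: the alert names, severities, and every `for` duration except the 2-minute provisioning window are hypothetical, and the `release: prometheus` label is assumed to match your rule selector in the same way as the ServiceMonitor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: conncentric-alerts             # hypothetical name
  namespace: conncentric
  labels:
    release: prometheus                # assumed to match your Prometheus rule selector
spec:
  groups:
    - name: conncentric.adapters
      rules:
        - alert: ConncentricAdapterUnhealthy
          expr: adapter_operational_status{status="unhealthy"} == 1
          for: 1m                      # illustrative hold time
          labels:
            severity: critical
          annotations:
            summary: "Adapter {{ $labels.adapter_id }} is unhealthy"
        - alert: ConncentricAdapterDegraded
          expr: adapter_operational_status{status="degraded"} == 1
          for: 5m                      # illustrative hold time
          labels:
            severity: warning
          annotations:
            summary: "Adapter {{ $labels.adapter_id }} is degraded"
        - alert: ConncentricAdapterPortExhaustion
          expr: adapter_lifecycle_state{state="port_exhaustion"} == 1
          labels:
            severity: critical
          annotations:
            summary: "Adapter {{ $labels.adapter_id }} cannot start: port conflict"
        - alert: ConncentricNoEnabledAdapters
          expr: conncentric_enabled_adapters_count == 0
          for: 1m                      # illustrative hold time
          labels:
            severity: critical
          annotations:
            summary: "No adapters are in the ENABLED state"
        - alert: ConncentricAdapterStuckProvisioning
          expr: adapter_lifecycle_state{state="provisioning"} == 1
          for: 2m                      # the 2-minute window from the table
          labels:
            severity: warning
          annotations:
            summary: "Adapter {{ $labels.adapter_id }} has been provisioning for over 2 minutes"
```

The `for` clause belongs to the alerting rule rather than to PromQL itself, which is why the provisioning condition is split into an expression and a separate duration.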