OpenGSLB Metrics Reference

OpenGSLB exposes Prometheus metrics for monitoring and observability. This document describes all available metrics and how to configure them.

Configuration

Enable the metrics endpoint in your configuration:

metrics:
  enabled: true
  address: ":9090"  # Default port

Metrics are served at http://<address>/metrics in Prometheus text format.

A health check endpoint is also available at http://<address>/health.
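
As a rough illustration of what a scrape returns, the following Python sketch parses samples in the Prometheus text format into name, labels, and value. It is a simplified parser, not a complete implementation of the exposition format: escape sequences in label values are left as-is and timestamps are ignored. The names `parse_metrics`, `scrape`, and `base_url` are chosen for this example only.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value" (escaped quotes tolerated).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(base_url):
    """Fetch and parse /metrics from a running instance (base_url is hypothetical)."""
    with urlopen(base_url + "/metrics", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


SAMPLE_TEXT = """\
# TYPE opengslb_healthy_servers gauge
opengslb_healthy_servers{region="us-east-1"} 3
opengslb_configured_domains 5
"""
parsed = parse_metrics(SAMPLE_TEXT)
```

Against a live instance, `scrape("http://localhost:9090")` would return the same tuple structure, assuming the default address shown above.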

Metrics Reference

DNS Metrics

`opengslb_dns_queries_total`

Type: Counter
Labels: domain, type, status

Total number of DNS queries received.

Label	Description
`domain`	The queried domain name
`type`	DNS query type (A, AAAA, etc.)
`status`	Response status: `success`, `nxdomain`, `servfail`

Example:

opengslb_dns_queries_total{domain="app.example.com",type="A",status="success"} 1542
opengslb_dns_queries_total{domain="app.example.com",type="AAAA",status="success"} 523
opengslb_dns_queries_total{domain="unknown.com",type="A",status="nxdomain"} 12

`opengslb_dns_query_duration_seconds`

Type: Histogram
Labels: domain, status

DNS query processing duration in seconds.

Buckets: 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1

Example:

opengslb_dns_query_duration_seconds_bucket{domain="app.example.com",status="success",le="0.001"} 1200
opengslb_dns_query_duration_seconds_sum{domain="app.example.com",status="success"} 0.892
opengslb_dns_query_duration_seconds_count{domain="app.example.com",status="success"} 1542
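
Histogram buckets are cumulative, so the `_bucket`, `_sum`, and `_count` series can be combined directly. The short Python sketch below uses only the example values above to derive the mean latency and the share of queries answered within 1 ms; the variable names are illustrative.

```python
# Values copied from the example above (app.example.com, status="success").
bucket_le_1ms = 1200   # cumulative count of queries that took <= 0.001 s
duration_sum = 0.892   # total seconds spent across all observed queries
query_count = 1542     # total number of observed queries

# Mean latency is the sum of all durations divided by the number of queries.
mean_seconds = duration_sum / query_count

# Because buckets are cumulative, the le="0.001" count already includes
# every faster query, so one division gives the share within 1 ms.
share_within_1ms = bucket_le_1ms / query_count

print(f"mean latency: {mean_seconds * 1000:.3f} ms")
print(f"answered within 1 ms: {share_within_1ms:.1%}")
```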

Health Check Metrics

`opengslb_health_check_results_total`

Type: Counter
Labels: region, server, result

Total number of health check results.

Label	Description
`region`	Region name
`server`	Server address and port (e.g., `10.0.1.10:80`)
`result`	Check result: `healthy`, `unhealthy`

Example:

opengslb_health_check_results_total{region="us-east-1",server="10.0.1.10:80",result="healthy"} 4521
opengslb_health_check_results_total{region="us-east-1",server="10.0.1.10:80",result="unhealthy"} 3

`opengslb_health_check_duration_seconds`

Type: Histogram
Labels: region, server

Health check duration in seconds.

Buckets: 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5

Example:

opengslb_health_check_duration_seconds_bucket{region="us-east-1",server="10.0.1.10:80",le="0.1"} 4500
opengslb_health_check_duration_seconds_sum{region="us-east-1",server="10.0.1.10:80"} 135.6
opengslb_health_check_duration_seconds_count{region="us-east-1",server="10.0.1.10:80"} 4524

`opengslb_healthy_servers`

Type: Gauge
Labels: region

Current number of healthy servers per region.

Example:

opengslb_healthy_servers{region="us-east-1"} 3
opengslb_healthy_servers{region="us-west-2"} 2

Routing Metrics

`opengslb_routing_decisions_total`

Type: Counter
Labels: domain, algorithm, server

Total number of routing decisions made.

Label	Description
`domain`	The domain being routed
`algorithm`	Routing algorithm used: `round-robin`, `weighted`, `failover`
`server`	Selected server address and port

Example:

opengslb_routing_decisions_total{domain="app.example.com",algorithm="round-robin",server="10.0.1.10:80"} 512
opengslb_routing_decisions_total{domain="app.example.com",algorithm="round-robin",server="10.0.1.11:80"} 510
opengslb_routing_decisions_total{domain="critical.example.com",algorithm="failover",server="10.0.2.10:80"} 1000

Configuration Metrics

`opengslb_config_reloads_total`

Type: Counter
Labels: result

Total number of configuration reload attempts.

Label	Description
`result`	Reload result: `success`, `failure`

Example:

opengslb_config_reloads_total{result="success"} 5
opengslb_config_reloads_total{result="failure"} 1

Use Cases:

Track reload activity
Alert on failed reloads
Correlate reloads with behavior changes

`opengslb_config_reload_timestamp_seconds`

Type: Gauge

Unix timestamp of the last successful configuration reload.

Example:

opengslb_config_reload_timestamp_seconds 1701504615

Use Cases:

Verify reload was applied
Track time since last reload
Correlate with deployment events

Application Metrics

`opengslb_app_info`

Type: Gauge
Labels: version

Application version information. Always set to 1.

Example:

opengslb_app_info{version="1.0.0"} 1

`opengslb_config_load_timestamp_seconds`

Type: Gauge

Unix timestamp of the initial configuration load at startup.

Example:

opengslb_config_load_timestamp_seconds 1701504000

`opengslb_configured_domains`

Type: Gauge

Number of configured domains.

Example:

opengslb_configured_domains 5

`opengslb_configured_servers`

Type: Gauge

Total number of configured servers across all regions.

Example:

opengslb_configured_servers 12

Prometheus Configuration

Add OpenGSLB to your Prometheus scrape configuration:

scrape_configs:
  - job_name: 'opengslb'
    static_configs:
      - targets: ['opengslb-host:9090']
    scrape_interval: 15s

Example Queries

Query Rate

rate(opengslb_dns_queries_total[5m])

Query Latency (p99)

histogram_quantile(0.99, rate(opengslb_dns_query_duration_seconds_bucket[5m]))
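
To make the p99 figure less opaque, the following Python sketch estimates a quantile from cumulative buckets by linear interpolation inside the bucket that contains the target rank. It is a simplified approximation of this kind of estimate, not Prometheus's actual code. The bucket bounds match the ones documented for `opengslb_dns_query_duration_seconds`; apart from the `le="0.001"` and total counts taken from the earlier example, the counts are invented for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Assumes observations are spread evenly inside each bucket and that the
    list ends with a +Inf bucket holding the total count.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for index, (upper_bound, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the overflow bucket: report the highest finite bound.
                return buckets[index - 1][0]
            in_bucket = cumulative - count_below
            if in_bucket == 0:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - count_below) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    return math.nan


# Hypothetical cumulative counts for one domain/status combination.
example_buckets = [
    (0.0001, 100), (0.0005, 800), (0.001, 1200), (0.005, 1500),
    (0.01, 1530), (0.025, 1540), (0.05, 1542), (0.1, 1542),
    (math.inf, 1542),
]
p99 = estimate_quantile(0.99, example_buckets)
p50 = estimate_quantile(0.50, example_buckets)
```

With these made-up counts the p99 estimate lands inside the 0.005 to 0.01 second bucket. The estimate can never be more precise than the bucket boundaries allow, which is worth keeping in mind when choosing alert thresholds.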

Error Rate

sum(rate(opengslb_dns_queries_total{status!="success"}[5m])) 
/ 
sum(rate(opengslb_dns_queries_total[5m]))
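
The same ratio can be reasoned about by hand from two scrapes of the counter. The Python sketch below is an illustration only: the counter totals are hypothetical, and the reset handling is a simplification of what `rate()` does.

```python
def counter_increase(previous, current):
    """Increase of a counter between two scrapes, treating a drop as a reset."""
    return current if current < previous else current - previous


def error_ratio(previous, current):
    """Share of non-success queries between two scrapes.

    Both arguments map a status label value to the counter total summed
    over all domains and query types.
    """
    increases = {
        status: counter_increase(previous.get(status, 0.0), value)
        for status, value in current.items()
    }
    total = sum(increases.values())
    if total == 0:
        return 0.0  # no traffic in the window; the PromQL division gives NaN here
    errors = sum(v for status, v in increases.items() if status != "success")
    return errors / total


# Hypothetical counter totals from two scrapes taken five minutes apart.
earlier = {"success": 2065.0, "nxdomain": 12.0, "servfail": 0.0}
later = {"success": 2965.0, "nxdomain": 52.0, "servfail": 10.0}
ratio = error_ratio(earlier, later)
print(f"error ratio: {ratio:.2%}")
```

Here 50 of the 950 new queries were not successful, a ratio of roughly 5.3%.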

Healthy Server Ratio

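# opengslb_healthy_servers carries a region label and opengslb_configured_servers
# does not, so aggregate the regions away before dividing.
sum without (region) (opengslb_healthy_servers) / opengslb_configured_servers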
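
Because `opengslb_healthy_servers` is reported per region while `opengslb_configured_servers` is a single total, a fleet-wide ratio needs the per-region values summed first. A small Python sketch of that arithmetic, reusing the example values from this document (which may not describe the same deployment):

```python
# Gauge values copied from the examples in this document.
healthy_by_region = {"us-east-1": 3, "us-west-2": 2}  # opengslb_healthy_servers
configured_servers = 12                               # opengslb_configured_servers

# Sum the per-region gauge, then divide by the unlabeled total.
healthy_total = sum(healthy_by_region.values())
healthy_ratio = healthy_total / configured_servers

print(f"{healthy_total}/{configured_servers} healthy ({healthy_ratio:.0%})")
```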

Health Check Failure Rate

rate(opengslb_health_check_results_total{result="unhealthy"}[5m])

Configuration Reload Success Rate

sum(rate(opengslb_config_reloads_total{result="success"}[1h]))
/
sum(rate(opengslb_config_reloads_total[1h]))

Time Since Last Reload

time() - opengslb_config_reload_timestamp_seconds
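
The timestamp gauges are plain Unix seconds, so the same subtraction works outside PromQL. The Python sketch below uses the two example values from this document; `seconds_since` is a helper name made up for the illustration.

```python
import time
from datetime import datetime, timezone


def seconds_since(timestamp, now=None):
    """Seconds elapsed since a Unix timestamp (now defaults to the current time)."""
    now = time.time() if now is None else now
    return now - timestamp


reload_ts = 1701504615  # opengslb_config_reload_timestamp_seconds example
load_ts = 1701504000    # opengslb_config_load_timestamp_seconds example

# How long after the initial load did the last successful reload happen?
reload_after_startup = seconds_since(load_ts, now=reload_ts)

# Human-readable form of the reload time, in UTC.
reload_time_utc = datetime.fromtimestamp(reload_ts, tz=timezone.utc)

print(f"last reload: {reload_time_utc.isoformat()}")
print(f"reloaded {reload_after_startup} s after startup")
```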

Routing Distribution by Algorithm

sum by (algorithm) (rate(opengslb_routing_decisions_total[5m]))

Failover Events (Traffic to Non-Primary)

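# Rate of routing decisions made by the failover algorithm, per domain and server.
# A rising rate for a non-primary server indicates traffic has failed over to it.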
rate(opengslb_routing_decisions_total{algorithm="failover"}[5m])

Alerting Examples

High Error Rate

- alert: OpenGSLBHighErrorRate
  expr: |
    sum(rate(opengslb_dns_queries_total{status!="success"}[5m])) 
    / 
    sum(rate(opengslb_dns_queries_total[5m])) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "OpenGSLB error rate above 5%"

No Healthy Servers

- alert: OpenGSLBNoHealthyServers
  expr: opengslb_healthy_servers == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "No healthy servers in region {{ $labels.region }}"

High Query Latency

- alert: OpenGSLBHighLatency
  expr: |
    histogram_quantile(0.99, rate(opengslb_dns_query_duration_seconds_bucket[5m])) > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "OpenGSLB p99 latency above 10ms"

Configuration Reload Failed

- alert: OpenGSLBConfigReloadFailed
  expr: increase(opengslb_config_reloads_total{result="failure"}[5m]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "OpenGSLB configuration reload failed"
    description: "A configuration reload attempt failed. Check logs for details."

Failover Active

- alert: OpenGSLBFailoverActive
  expr: |
    opengslb_healthy_servers{region="primary"} == 0
    and ignoring(region) opengslb_healthy_servers{region="secondary"} > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "OpenGSLB failover active - primary region has no healthy servers"

Low Healthy Server Ratio

- alert: OpenGSLBLowHealthyRatio
  expr: |
    sum without (region) (opengslb_healthy_servers) / opengslb_configured_servers < 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Less than 50% of servers are healthy"

Overwatch Metrics (ADR-015)

These metrics are only available in Overwatch mode.

`opengslb_overwatch_backends_total`

Type: Gauge

Total number of backends in the registry.

Example:

opengslb_overwatch_backends_total 24

`opengslb_overwatch_backends_healthy`

Type: Gauge

Number of backends with healthy effective status.

Example:

opengslb_overwatch_backends_healthy 22

`opengslb_overwatch_agents_registered`

Type: Gauge

Number of unique agents currently registered.

Example:

opengslb_overwatch_agents_registered 8

`opengslb_overwatch_stale_agents`

Type: Gauge

Number of backends marked as stale (no recent heartbeat).

Example:

opengslb_overwatch_stale_agents 2

`opengslb_overwatch_overrides_active`

Type: Gauge

Number of active manual overrides.

Example:

opengslb_overwatch_overrides_active 1

`opengslb_overwatch_validation_total`

Type: Counter
Labels: service, result

External validation results.

Label	Description
`service`	Service name
`result`	Validation result: `healthy`, `unhealthy`

Example:

opengslb_overwatch_validation_total{service="web-service",result="healthy"} 450
opengslb_overwatch_validation_total{service="web-service",result="unhealthy"} 12

`opengslb_overwatch_veto_total`

Type: Counter
Labels: service, reason

Veto events where Overwatch overrode agent health claims.

Label	Description
`service`	Service name
`reason`	Veto reason: `validation_unhealthy`, `validation_healthy`

Example:

opengslb_overwatch_veto_total{service="web-service",reason="validation_unhealthy"} 5

`opengslb_overwatch_backends_by_authority`

Type: Gauge
Labels: authority

Backends grouped by health authority source.

Label	Description
`authority`	Source: `agent`, `override`, `stale`

Example:

opengslb_overwatch_backends_by_authority{authority="agent"} 20
opengslb_overwatch_backends_by_authority{authority="override"} 1
opengslb_overwatch_backends_by_authority{authority="stale"} 3

Gossip Metrics

`opengslb_gossip_messages_received_total`

Type: Counter
Labels: type

Total gossip messages received by type.

Example:

opengslb_gossip_messages_received_total{type="heartbeat"} 4521
opengslb_gossip_messages_received_total{type="predictive"} 12

`opengslb_gossip_override_operations_total`

Type: Counter
Labels: operation

Override operations via API.

Label	Description
`operation`	Operation type: `set`, `clear`

Example:

opengslb_gossip_override_operations_total{operation="set"} 5
opengslb_gossip_override_operations_total{operation="clear"} 3

`opengslb_gossip_decryption_failures_total`

Type: Counter

Total gossip message decryption failures.

Example:

opengslb_gossip_decryption_failures_total 3

Use Cases:

Monitor for encryption key mismatches
Detect potential security issues with gossip communication

Geolocation Routing Metrics (Sprint 6)

`opengslb_routing_geo_decisions_total`

Type: Counter
Labels: domain, country, continent, region

Total geolocation routing decisions by location.

Label	Description
`domain`	The domain being routed
`country`	ISO country code (e.g., `US`, `GB`)
`continent`	Continent code (e.g., `NA`, `EU`)
`region`	Selected region name

Example:

opengslb_routing_geo_decisions_total{domain="app.example.com",country="US",continent="NA",region="us-east-1"} 1542
opengslb_routing_geo_decisions_total{domain="app.example.com",country="GB",continent="EU",region="eu-west-1"} 523

`opengslb_routing_geo_fallback_total`

Type: Counter
Labels: domain, reason

Total geolocation routing fallbacks by reason.

Label	Description
`domain`	The domain being routed
`reason`	Fallback reason: `no_client_ip`, `no_resolver`, `lookup_failed`, `no_servers_in_region`, `no_match`

Example:

opengslb_routing_geo_fallback_total{domain="app.example.com",reason="no_servers_in_region"} 12
opengslb_routing_geo_fallback_total{domain="app.example.com",reason="lookup_failed"} 5

`opengslb_routing_geo_custom_hits_total`

Type: Counter
Labels: domain, region, cidr

Total custom CIDR mapping matches in geolocation routing.

Label	Description
`domain`	The domain being routed
`region`	Matched region from custom mapping
`cidr`	The matched CIDR range

Example:

opengslb_routing_geo_custom_hits_total{domain="app.example.com",region="us-chicago",cidr="10.1.0.0/16"} 450
opengslb_routing_geo_custom_hits_total{domain="app.example.com",region="us-dallas",cidr="10.2.0.0/16"} 230

Latency Routing Metrics (Sprint 6)

`opengslb_routing_latency_selected_ms`

Type: Gauge
Labels: domain, server

Smoothed latency in milliseconds of the selected server for latency-based routing.

Example:

opengslb_routing_latency_selected_ms{domain="perf-critical.example.com",server="10.0.1.10:8080"} 45.5

`opengslb_routing_latency_rejected_total`

Type: Counter
Labels: domain, server, reason

Total servers rejected due to latency threshold or insufficient data.

Label	Description
`domain`	The domain being routed
`server`	Server address (host:port)
`reason`	Rejection reason: `above_threshold`, `no_data`

Example:

opengslb_routing_latency_rejected_total{domain="perf-critical.example.com",server="10.0.2.10:8080",reason="above_threshold"} 5
opengslb_routing_latency_rejected_total{domain="perf-critical.example.com",server="10.0.3.10:8080",reason="no_data"} 12

`opengslb_routing_latency_fallback_total`

Type: Counter
Labels: domain, reason

Total fallbacks to round-robin when latency data is unavailable.

Label	Description
`domain`	The domain being routed
`reason`	Fallback reason: `no_provider`, `no_latency_data`

Example:

opengslb_routing_latency_fallback_total{domain="perf-critical.example.com",reason="no_latency_data"} 3

`opengslb_backend_smoothed_latency_ms`

Type: Gauge
Labels: service, address

Current smoothed (EMA) latency in milliseconds for each backend.

Example:

opengslb_backend_smoothed_latency_ms{service="myapp",address="10.0.1.10:8080"} 45.5
opengslb_backend_smoothed_latency_ms{service="myapp",address="10.0.1.11:8080"} 52.3
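
For readers unfamiliar with exponential moving averages, the Python sketch below shows the general technique behind a smoothed latency value. The smoothing factor `alpha` and the seeding behavior are assumptions made for illustration; this document does not state which parameters OpenGSLB uses.

```python
def update_ema(previous, sample_ms, alpha=0.2):
    """Blend a new latency sample into the running average.

    alpha is a hypothetical smoothing factor: higher values react faster
    to new samples, lower values smooth out spikes more strongly.
    """
    if previous is None:
        return sample_ms  # the first sample seeds the average
    return alpha * sample_ms + (1 - alpha) * previous


smoothed = None
for sample in [40.0, 50.0, 45.0, 120.0]:  # made-up latency samples in ms
    smoothed = update_ema(smoothed, sample)

print(f"smoothed latency: {smoothed:.2f} ms")
```

The final 120 ms spike only lifts the average to about 58 ms, which is why a smoothed value is a steadier basis for routing decisions than the latest raw sample.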

`opengslb_backend_latency_samples`

Type: Gauge
Labels: service, address

Number of latency samples collected for each backend.

Example:

opengslb_backend_latency_samples{service="myapp",address="10.0.1.10:8080"} 150

Per-Agent Connectivity Metrics (Sprint 6)

`opengslb_agent_connected`

Type: Gauge
Labels: agent_id, region

Agent connection status (1=connected, 0=disconnected).

Example:

opengslb_agent_connected{agent_id="agent-1",region="us-east-1"} 1
opengslb_agent_connected{agent_id="agent-2",region="eu-west-1"} 0

`opengslb_agent_heartbeat_age_seconds`

Type: Gauge
Labels: agent_id

Seconds since last heartbeat per agent.

Example:

opengslb_agent_heartbeat_age_seconds{agent_id="agent-1"} 5.2
opengslb_agent_heartbeat_age_seconds{agent_id="agent-2"} 45.8

`opengslb_agent_backends_registered_per_agent`

Type: Gauge
Labels: agent_id

Number of backends registered by each agent.

Example:

opengslb_agent_backends_registered_per_agent{agent_id="agent-1"} 4
opengslb_agent_backends_registered_per_agent{agent_id="agent-2"} 2

`opengslb_agent_stale_events_total`

Type: Counter
Labels: agent_id

Total stale events per agent.

Example:

opengslb_agent_stale_events_total{agent_id="agent-1"} 2

Override Metrics with Service Granularity (Sprint 6)

`opengslb_overrides_active`

Type: Gauge
Labels: service

Number of active overrides per service.

Example:

opengslb_overrides_active{service="myapp"} 1
opengslb_overrides_active{service="otherapp"} 0

`opengslb_overrides_changes_total`

Type: Counter
Labels: service, action

Total override changes by service and action.

Label	Description
`service`	Service name
`action`	Action type: `set`, `clear`

Example:

opengslb_overrides_changes_total{service="myapp",action="set"} 5
opengslb_overrides_changes_total{service="myapp",action="clear"} 3

Enhanced DNSSEC Metrics (Sprint 6)

`opengslb_dnssec_signatures_total`

Type: Counter
Labels: zone

Total DNSSEC signatures generated per zone.

Example:

opengslb_dnssec_signatures_total{zone="gslb.example.com"} 15420

`opengslb_dnssec_key_age_by_zone_seconds`

Type: Gauge
Labels: zone, key_tag

Age of DNSSEC signing keys in seconds, per zone and key tag.

Label	Description
`zone`	DNS zone name
`key_tag`	DNSSEC key tag identifier

Example:

opengslb_dnssec_key_age_by_zone_seconds{zone="gslb.example.com",key_tag="12345"} 86400

Overwatch Alerting Examples

No Registered Agents

- alert: OpenGSLBNoAgents
  expr: opengslb_overwatch_agents_registered == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "No agents registered with Overwatch"

High Stale Backend Count

- alert: OpenGSLBHighStaleBackends
  expr: |
    opengslb_overwatch_stale_agents / opengslb_overwatch_backends_total > 0.2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "More than 20% of backends are stale"

Validation Disagreement Rate

- alert: OpenGSLBHighVetoRate
  expr: |
    rate(opengslb_overwatch_veto_total[5m]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High rate of validation vetoes - agent health claims being overridden"

Sprint 6 Example Queries

Geolocation Traffic Distribution

# Traffic distribution by region
sum by (region) (rate(opengslb_routing_geo_decisions_total[5m]))

# Traffic from custom CIDR mappings
sum by (region, cidr) (rate(opengslb_routing_geo_custom_hits_total[5m]))

# Geolocation fallback rate
sum(rate(opengslb_routing_geo_fallback_total[5m])) / sum(rate(opengslb_routing_geo_decisions_total[5m]))

Latency Routing Analysis

# Average selected latency by domain
avg by (domain) (opengslb_routing_latency_selected_ms)

# Servers frequently rejected due to high latency
topk(5, sum by (server) (rate(opengslb_routing_latency_rejected_total{reason="above_threshold"}[1h])))

# Latency routing fallback rate
sum by (domain) (rate(opengslb_routing_latency_fallback_total[5m])) / sum by (domain) (rate(opengslb_routing_decisions_total{algorithm="latency"}[5m]))

Agent Health Monitoring

# Agents not connected
opengslb_agent_connected == 0

# Agents with stale heartbeats (>30s)
opengslb_agent_heartbeat_age_seconds > 30

# Stale events by agent
rate(opengslb_agent_stale_events_total[1h])

Override Activity

# Current override count by service
opengslb_overrides_active

# Override change rate
sum by (service, action) (rate(opengslb_overrides_changes_total[1h]))

Sprint 6 Alerting Examples

High Geolocation Fallback Rate

- alert: OpenGSLBHighGeoFallbackRate
  expr: |
    sum(rate(opengslb_routing_geo_fallback_total[5m])) /
    sum(rate(opengslb_routing_geo_decisions_total[5m])) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Geolocation routing fallback rate above 10%"
    description: "Many geolocation lookups are failing or falling back to default."

Agent Heartbeat Stale

- alert: OpenGSLBAgentStale
  expr: opengslb_agent_heartbeat_age_seconds > 60
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Agent {{ $labels.agent_id }} heartbeat stale"
    description: "No heartbeat received from agent for over 60 seconds."

High Latency Server Selection

- alert: OpenGSLBHighLatencySelected
  expr: opengslb_routing_latency_selected_ms > 200
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Selected servers have high latency for {{ $labels.domain }}"
    description: "Latency-based routing is selecting servers with >200ms latency."

Gossip Decryption Failures

- alert: OpenGSLBGossipDecryptionFailures
  expr: increase(opengslb_gossip_decryption_failures_total[5m]) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Gossip message decryption failures detected"
    description: "This may indicate encryption key mismatches between nodes."

Metric Cardinality

Be aware of metric cardinality when configuring monitoring:

Metric	Cardinality Factors
`opengslb_dns_queries_total`	domains × query_types × status
`opengslb_routing_decisions_total`	domains × algorithms × servers
`opengslb_health_check_results_total`	regions × servers × results
`opengslb_routing_geo_decisions_total`	domains × countries × continents × regions
`opengslb_routing_geo_custom_hits_total`	domains × regions × cidrs
`opengslb_routing_latency_rejected_total`	domains × servers × reasons
`opengslb_agent_connected`	agents × regions
`opengslb_overrides_changes_total`	services × actions
`opengslb_dnssec_key_age_by_zone_seconds`	zones × key_tags

For large deployments with many domains or servers, consider:

Aggregating by region instead of individual servers
Using recording rules to pre-aggregate high-cardinality metrics
Limiting label values in Prometheus configuration
Monitoring the cardinality of the geolocation metrics, which grows with the number of country/continent combinations
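
The cardinality factors in the table multiply, which a few lines of Python make concrete. The label value counts below are hypothetical, and the result is an upper bound: combinations that never occur (for example, a country paired with the wrong continent) produce no series.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on time series: the product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical deployment: 5 domains, A and AAAA queries, 3 status values.
dns_queries = max_series({"domain": 5, "type": 2, "status": 3})

# Hypothetical geolocation spread: 50 countries, 6 continents, 4 regions.
geo_decisions = max_series(
    {"domain": 5, "country": 50, "continent": 6, "region": 4}
)

print(f"opengslb_dns_queries_total: up to {dns_queries} series")
print(f"opengslb_routing_geo_decisions_total: up to {geo_decisions} series")
```

Adding one more label with many values multiplies the bound rather than adding to it, which is why the geolocation metrics deserve the closest watch.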
