Agent Deployment Runbook

This runbook provides step-by-step instructions for deploying OpenGSLB agents on application servers.

Overview

OpenGSLB agents run alongside your applications to:

  • Monitor local backend health

  • Report health status to Overwatch nodes via gossip

  • Send predictive signals when resource thresholds are exceeded

  • Support multiple backends per agent

Prerequisites

System Requirements

Resource  Minimum  Recommended
CPU       1 core   2 cores
Memory    64 MB    128 MB
Disk      50 MB    100 MB
Network   Outbound to Overwatch gossip port (7946)

Network Requirements

Direction  Port  Protocol  Purpose
Outbound   7946  TCP/UDP   Gossip to Overwatch nodes
Outbound   8080  TCP       API calls to Overwatch (optional)
Inbound    9100  TCP       Metrics endpoint (optional)

Information Needed

Before starting, gather the following from your Overwatch administrator:

  • Gossip encryption key (32 bytes, base64-encoded)

  • Service token for your application(s)

  • Overwatch node addresses and gossip ports

  • Region identifier for this deployment
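
A malformed gossip key is easy to rule out before it reaches the agent config. The sketch below is a minimal check, assuming the key must decode from base64 to exactly 32 bytes; the function names are illustrative and not part of OpenGSLB.

```python
import base64
import binascii


def gossip_key_length(key_b64: str) -> int:
    """Return the decoded key length in bytes, or -1 if the input is not valid base64."""
    try:
        # Trim surrounding whitespace, then reject any non-base64 characters.
        raw = base64.b64decode(key_b64.strip(), validate=True)
    except (binascii.Error, ValueError):
        return -1
    return len(raw)


def is_valid_gossip_key(key_b64: str) -> bool:
    # Assumption: the key must decode to exactly 32 bytes.
    return gossip_key_length(key_b64) == 32
```

A key that fails this check is worth re-requesting from the Overwatch administrator rather than editing by hand.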

Installation Methods

Method 2: Docker Installation

docker run -d \
  --name opengslb-agent \
  --restart unless-stopped \
  --network host \
  -v /etc/opengslb/agent.yaml:/etc/opengslb/config.yaml:ro \
  -v /var/lib/opengslb:/var/lib/opengslb \
  ghcr.io/loganrossus/opengslb:latest \
  --config=/etc/opengslb/config.yaml

Method 3: Build from Source

# Clone repository
git clone https://github.com/loganrossus/OpenGSLB.git
cd OpenGSLB

# Build
make build

# Install
sudo mv opengslb /usr/local/bin/

First Start and TOFU Certificate Generation

On first start, the agent will:

  1. Generate a self-signed certificate and private key

  2. Connect to Overwatch nodes via gossip

  3. Present the service token for initial authentication

  4. Have its certificate fingerprint pinned by Overwatch (Trust On First Use)
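
The Trust On First Use idea behind these steps can be sketched as a small pin store. This is a toy model, not OpenGSLB's implementation: the class name, the use of a SHA-256 fingerprint over the DER-encoded certificate, and the rule that a pinned certificate alone is enough on later contacts are all assumptions made for illustration.

```python
import hashlib


class TofuPinStore:
    """Toy in-memory pin store keyed by agent ID (hypothetical, not OpenGSLB code)."""

    def __init__(self) -> None:
        self._pins = {}  # agent_id -> hex fingerprint

    @staticmethod
    def fingerprint(cert_der: bytes) -> str:
        # Assumption: fingerprint is SHA-256 over the certificate bytes, hex-encoded.
        return hashlib.sha256(cert_der).hexdigest()

    def verify(self, agent_id: str, cert_der: bytes, token_ok: bool) -> bool:
        fp = self.fingerprint(cert_der)
        pinned = self._pins.get(agent_id)
        if pinned is None:
            # First contact: pin only if the service token authenticated.
            if not token_ok:
                return False
            self._pins[agent_id] = fp
            return True
        # Later contacts: the certificate must match the pinned fingerprint.
        return pinned == fp
```

Under this model, an agent whose certificate files are regenerated no longer matches its pin, which is why the certificate is worth inspecting when registration fails.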

Verify Certificate Generation

# Check certificate was created
ls -la /var/lib/opengslb/
# Should show: agent.crt, agent.key

# View certificate details
openssl x509 -in /var/lib/opengslb/agent.crt -noout -text

Verify Registration with Overwatch

# Using opengslb-cli
opengslb-cli servers --api http://overwatch-1.internal:8080

# Or using curl
curl http://overwatch-1.internal:8080/api/v1/overwatch/backends | jq .

The output should list your backend as registered:

{
  "backends": [
    {
      "service": "myapp",
      "address": "127.0.0.1",
      "port": 8080,
      "agent_id": "agent-abc123",
      "region": "us-east",
      "effective_status": "healthy",
      "agent_healthy": true
    }
  ]
}
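
For scripted verification, the same response can be checked programmatically. The sketch below assumes the response has the shape shown above; the helper names are illustrative.

```python
import json


def find_backend(payload: str, service: str, port: int):
    """Return the first backend entry matching service and port, or None."""
    data = json.loads(payload)
    for backend in data.get("backends", []):
        if backend.get("service") == service and backend.get("port") == port:
            return backend
    return None


def is_registered_and_healthy(payload: str, service: str, port: int) -> bool:
    backend = find_backend(payload, service, port)
    return (
        backend is not None
        and backend.get("effective_status") == "healthy"
        and backend.get("agent_healthy") is True
    )


# Sample payload copied from the expected output above.
SAMPLE = """
{"backends": [{"service": "myapp", "address": "127.0.0.1", "port": 8080,
  "agent_id": "agent-abc123", "region": "us-east",
  "effective_status": "healthy", "agent_healthy": true}]}
"""
```

Calling is_registered_and_healthy(SAMPLE, "myapp", 8080) returns True; in practice the payload would be the body returned by the curl command.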

Multi-Backend Configuration

A single agent can monitor multiple backends:

agent:
  backends:
    - service: webapp
      address: 127.0.0.1
      port: 8080
      weight: 100
      health_check:
        type: http
        path: /health
        interval: 5s
        timeout: 2s
        failure_threshold: 3
        success_threshold: 2

    - service: api
      address: 127.0.0.1
      port: 9000
      weight: 100
      health_check:
        type: http
        path: /api/health
        interval: 5s
        timeout: 2s
        failure_threshold: 3
        success_threshold: 2

    - service: grpc
      address: 127.0.0.1
      port: 50051
      weight: 100
      health_check:
        type: tcp
        interval: 10s
        timeout: 3s
        failure_threshold: 2
        success_threshold: 1

Health Check Configuration

HTTP Health Check

health_check:
  type: http
  path: /health           # Required: health endpoint path
  interval: 5s            # How often to check
  timeout: 2s             # Timeout for each check
  failure_threshold: 3    # Mark unhealthy after N failures
  success_threshold: 2    # Mark healthy after N successes
  host: "custom-host"     # Optional: Host header override
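
How the two thresholds interact is easiest to see as a small state machine. The sketch below assumes the thresholds count consecutive results and that a contradicting result resets the streak; the agent's actual counting rules may differ, and the class name is illustrative.

```python
class ThresholdTracker:
    """Flip health state only after N consecutive contradicting check results."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2,
                 healthy: bool = True) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            # Result agrees with the current state: any pending streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.success_threshold if check_passed else self.failure_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy
```

With the values above (interval 5s, failure_threshold 3), a backend that stops responding would be marked unhealthy after the third failed check, roughly 10 to 15 seconds after it went down.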

TCP Health Check

health_check:
  type: tcp
  interval: 10s
  timeout: 3s
  failure_threshold: 2
  success_threshold: 1
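
A TCP check of this kind amounts to a connection attempt with a timeout. The sketch below shows the idea with the Python standard library; it is an illustration of the technique, not the agent's own probe.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tcp_check("127.0.0.1", 50051) mirrors the grpc backend check from the multi-backend configuration. Note that a successful connect only proves the port is accepting connections, not that the application behind it is working.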

Predictive Health Tuning

Predictive health sends early warning signals before failures occur:

predictive:
  enabled: true
  check_interval: 10s    # How often to check metrics

  cpu:
    threshold: 85        # Start bleeding at 85% CPU
    bleed_duration: 30s  # Gradually reduce traffic over 30s

  memory:
    threshold: 90        # Start bleeding at 90% memory
    bleed_duration: 30s

  error_rate:
    threshold: 10        # Errors per minute
    window: 60s          # Measurement window
    bleed_duration: 60s
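
To build intuition for bleed_duration, the sketch below models bleeding as a linear ramp from the configured weight down to zero. Both the linear shape and the zero floor are assumptions made for illustration; the source only states that traffic is reduced gradually, and the function name is hypothetical.

```python
def bled_weight(base_weight: int, seconds_over_threshold: float,
                bleed_duration: float) -> int:
    """Linearly scale base_weight down to 0 across bleed_duration seconds."""
    if bleed_duration <= 0 or seconds_over_threshold >= bleed_duration:
        return 0  # bleed finished (or no ramp configured)
    if seconds_over_threshold <= 0:
        return base_weight  # threshold not yet exceeded
    remaining = 1.0 - seconds_over_threshold / bleed_duration
    return round(base_weight * remaining)
```

Under this model, a backend with weight 100 and bleed_duration 30s would be at weight 50 after 15 seconds over the CPU threshold. A longer bleed_duration gives other backends more time to absorb the shifted traffic.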

Tuning Guidelines:

Scenario           CPU Threshold  Memory Threshold  Error Rate (errors/min)
CPU-bound app      75%            90%               5
Memory-bound app   90%            80%               5
High-traffic API   80%            85%               10
Background worker  90%            90%               3

Log Rotation Setup

Using logrotate

sudo tee /etc/logrotate.d/opengslb << 'EOF'
/var/log/opengslb/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0640 opengslb opengslb
    postrotate
        systemctl reload opengslb-agent 2>/dev/null || true
    endscript
}
EOF

Verification Steps

1. Check Agent is Running

sudo systemctl status opengslb-agent

2. Check Gossip Connectivity

# Test connectivity to Overwatch
nc -zv overwatch-1.internal 7946
nc -zv overwatch-2.internal 7946

3. Check Metrics Endpoint

curl -s http://localhost:9100/metrics | grep opengslb

Expected metrics (counter values will vary):

opengslb_agent_backends_registered 1
opengslb_agent_heartbeats_sent_total 150
opengslb_agent_heartbeat_failures_total 0
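
If this check is automated, the three metrics can be read and evaluated in a few lines. The sketch below parses simple "name value" lines and applies pass criteria that are assumptions (at least one backend registered, at least one heartbeat sent, zero heartbeat failures); adjust them to your environment.

```python
def parse_metrics(text: str) -> dict:
    """Parse 'name value' sample lines from Prometheus-style text output."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) != 2:
            continue  # lines that are not exactly two fields are skipped
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore samples whose value is not numeric
    return metrics


def heartbeats_ok(metrics: dict) -> bool:
    # Assumed pass criteria; not defined by OpenGSLB.
    return (
        metrics.get("opengslb_agent_backends_registered", 0) >= 1
        and metrics.get("opengslb_agent_heartbeats_sent_total", 0) > 0
        and metrics.get("opengslb_agent_heartbeat_failures_total", 0) == 0
    )
```

A failures counter that keeps growing between two scrapes usually points back to gossip connectivity or key mismatch, both covered under Troubleshooting.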

4. Check Backend Registration

# Query Overwatch API
curl -s "http://overwatch-1.internal:8080/api/v1/overwatch/backends?service=myapp" | jq .

Troubleshooting

Agent Not Starting

  1. Check configuration syntax:

    opengslb --config=/etc/opengslb/agent.yaml --validate
    
  2. Check file permissions:

    ls -la /etc/opengslb/agent.yaml
    # Should be: -rw-r----- root opengslb
    
  3. Check logs:

    journalctl -u opengslb-agent -n 50 --no-pager
    

Agent Not Registering

  1. Verify gossip connectivity:

    nc -zv overwatch-1.internal 7946
    
  2. Check encryption key matches:

    • Agent and Overwatch must use identical gossip encryption keys

    • Check for extra whitespace or encoding issues

  3. Check service token:

    • Token must match agent_tokens in Overwatch config

    • Tokens are case-sensitive

  4. Check certificate issues:

    # View agent certificate
    openssl x509 -in /var/lib/opengslb/agent.crt -noout -text
    

Backend Marked Unhealthy

  1. Check backend is running:

    curl http://localhost:8080/health
    
  2. Check health check configuration:

    • Verify path, port, and timeout settings

    • Ensure health endpoint returns 2xx status

  3. Check agent logs for health check failures:

    journalctl -u opengslb-agent | grep "health check"
    

Metrics Not Available

  1. Check metrics endpoint is enabled:

    metrics:
      enabled: true
      address: "127.0.0.1:9100"
    
  2. Check firewall rules:

    sudo iptables -L -n | grep 9100
    

Configuration Reference

Environment Variables

Variable         Description       Default
OPENGSLB_CONFIG  Config file path  /etc/opengslb/config.yaml
GOMAXPROCS       Max CPU cores     All available

Configuration Options

See the Configuration Reference for the complete list of configuration options.

Security Considerations

  • Store the gossip encryption key securely (never in version control)

  • Use strong service tokens (minimum 32 characters)

  • Restrict metrics endpoint to localhost or monitoring network

  • Set appropriate file permissions (640 for config, 700 for data dir)

  • Use systemd security hardening options
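
Two of these points lend themselves to quick scripted checks. The sketch below generates a token of at least 32 characters and tests whether a path grants permissions beyond an allowed mode; the helper names are illustrative, and the token format OpenGSLB accepts is assumed to be any URL-safe string.

```python
import os
import secrets
import stat


def mode_at_most(path: str, max_mode: int) -> bool:
    """True if the path grants no permission bits beyond max_mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & ~max_mode) == 0


def new_service_token(min_chars: int = 32) -> str:
    # token_urlsafe(n) encodes n random bytes as base64url, which always
    # yields at least n characters.
    return secrets.token_urlsafe(min_chars)
```

For example, mode_at_most("/etc/opengslb/agent.yaml", 0o640) and mode_at_most("/var/lib/opengslb", 0o700) correspond to the permissions recommended above.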