Demo 2: Agent-Overwatch Architecture
This demo showcases OpenGSLB’s distributed architecture where agents run alongside applications and can PROACTIVELY signal health issues.
What You’ll Learn
Agent-Overwatch architecture
Proactive health signaling (drain mode)
Zero-downtime traffic management
Gossip protocol communication
The Key Insight
Traditional GSLB |
Agent-based GSLB |
|---|---|
External health check fails |
Agent knows problem is coming |
DNS removes backend |
Gossips “I’m unhealthy” |
Some requests already failed |
DNS removes backend first |
Reactive |
Zero failed requests |
Architecture
+-----------------------------------------------------------------------------+
| webapp1 (10.20.0.21) webapp2 (10.20.0.22) webapp3 (10.20.0.23) |
| +-----------------+ +-----------------+ +-----------------+ |
| | nginx + agent | | nginx + agent | | nginx + agent | |
| +--------+--------+ +--------+--------+ +--------+--------+ |
| +----------------------+----------------------+ |
| | gossip (:7946) |
| v |
| overwatch (10.20.0.10) |
| +-----------------------+ |
| | DNS :53 | API :8080 | |
| +-----------------------+ |
| ^ |
| | dig queries |
| client (10.20.0.100) |
| +-- SSH port 2222 |
+-----------------------------------------------------------------------------+
Quick Start
1. Build the Binary
# From the repository root
CGO_ENABLED=0 GOOS=linux go build -o demos/demo-2-agent-overwatch/bin/opengslb ./cmd/opengslb
2. Start the Demo
cd demos/demo-2-agent-overwatch
docker-compose up -d --build
# SSH into client (password: demo)
ssh -p 2222 root@localhost
# Run the guided demo
./demo.sh
Demo Scenarios
Scenario 1: Traditional Reactive Failure
# Stop nginx - external health check eventually catches it
docker exec webapp2 nginx -s stop
# Wait ~10s, then check DNS - webapp2 removed
dig @10.20.0.10 app.demo.local +short
# Restart nginx
docker exec webapp2 nginx
Scenario 2: Proactive Drain (The Key Feature)
This is what makes agent-based GSLB special:
# Trigger drain - agent reports unhealthy, nginx STILL RUNNING
./scripts/drain.sh webapp2 on
# Prove nginx is still healthy
curl http://10.20.0.22/health # Returns 200!
# But it's out of DNS rotation
dig @10.20.0.10 app.demo.local +short # No 10.20.0.22!
# Recover
./scripts/drain.sh webapp2 off
Important
Notice that the health check still passes, but the server is removed from DNS. This is proactive draining - the agent signals “I’m about to have problems” before any actual failure occurs.
Why Agents?
Traditional GSLB relies on external health checks. The problem:
External check runs every N seconds
Server fails between checks
Requests hit failed server until next check
Result: Failed requests during the gap
Agent-based architecture:
Agent runs ON the server with inside knowledge
Agent sees: CPU spike, memory pressure, error rates, maintenance mode
Agent signals “I’m about to fail” BEFORE external check would catch it
Result: Zero failed requests
The Drain File
Each agent’s startup script watches for /tmp/drain. When this file exists:
Agent stops running (stops sending gossip)
Overwatch removes backend from DNS (no gossip = unhealthy)
nginx is still running and serving traffic
Perfect for maintenance drains
This simulates any “inside knowledge” the agent might have:
High CPU/memory (predictive)
Increasing error rates (trending)
Upcoming maintenance (planned)
Application-specific signals
Components
Overwatch (10.20.0.10)
Receives gossip from agents on port 7946
Serves DNS queries on port 53
Exposes API on port 8080
Exposes metrics on port 9090
Webapp Containers (10.20.0.21-23)
Run nginx serving HTTP on port 80
Run OpenGSLB agent as sidecar
Agent monitors localhost:80 and gossips to Overwatch
Agent respects /tmp/drain for proactive health signaling
Client (10.20.0.100)
SSH access on host port 2222
Has dig, curl, jq for testing
Contains demo.sh guided walkthrough
Commands Cheat Sheet
Action |
Command |
|---|---|
SSH to client |
|
Query DNS |
|
Check health API |
|
Enable drain |
|
Disable drain |
|
View agent logs |
|
View overwatch logs |
|
Reset demo |
|
Troubleshooting
Agent not connecting to Overwatch
# Check agent logs
docker logs webapp1
# Verify gossip port is open
docker exec overwatch netstat -ln | grep 7946
DNS not returning backends
# Check if agents registered
curl http://10.20.0.10:8080/api/v1/health/servers | jq
# View overwatch logs
docker logs overwatch
Drain not working
# Verify drain file exists
docker exec webapp2 ls -la /tmp/drain
# Check agent process
docker exec webapp2 pgrep -la opengslb
Cleanup
docker-compose down -v
Next Steps
Continue to Demo 3: Latency-Based Routing to learn about automatic performance optimization.