Performance Benchmarks
This document provides benchmark results and testing methodology for OpenGSLB performance evaluation.
Benchmark Environment
Reference Hardware
Component |
Specification |
|---|---|
CPU |
4-core Intel Xeon @ 2.5 GHz |
Memory |
8 GB DDR4 |
Disk |
NVMe SSD |
Network |
10 Gbps |
OS |
Ubuntu 22.04 LTS |
Go Version |
1.21 |
Test Configuration
# Overwatch configuration for benchmarks
mode: overwatch
dns:
listen_address: "0.0.0.0:53"
default_ttl: 30
dnssec:
enabled: true
overwatch:
validation:
enabled: false # Disabled for pure DNS benchmarks
logging:
level: warn # Reduced logging for benchmarks
DNS Query Performance
Query Throughput
Scenario |
QPS (median) |
QPS (p99) |
Latency (p50) |
Latency (p99) |
|---|---|---|---|---|
A record, no DNSSEC |
75,000 |
70,000 |
0.3 ms |
1.2 ms |
A record, with DNSSEC |
45,000 |
40,000 |
0.5 ms |
2.0 ms |
Geolocation routing |
40,000 |
35,000 |
0.6 ms |
2.5 ms |
Latency-based routing |
50,000 |
45,000 |
0.5 ms |
2.0 ms |
Query Throughput by Backend Count
Backends |
QPS (A record) |
QPS (A + DNSSEC) |
|---|---|---|
10 |
75,000 |
45,000 |
50 |
72,000 |
43,000 |
100 |
68,000 |
40,000 |
500 |
60,000 |
35,000 |
1,000 |
50,000 |
30,000 |
Latency Distribution
Percentile |
No DNSSEC |
With DNSSEC |
|---|---|---|
p50 |
0.3 ms |
0.5 ms |
p90 |
0.8 ms |
1.5 ms |
p95 |
1.0 ms |
1.8 ms |
p99 |
1.2 ms |
2.0 ms |
p99.9 |
2.0 ms |
4.0 ms |
Resource Usage
Memory Usage
Backends |
Memory (base) |
Memory (with GeoIP) |
|---|---|---|
0 |
50 MB |
55 MB |
100 |
55 MB |
60 MB |
500 |
70 MB |
75 MB |
1,000 |
90 MB |
95 MB |
5,000 |
150 MB |
155 MB |
CPU Usage
At 10,000 QPS steady state:
Configuration |
CPU Usage |
|---|---|
Round-robin |
15% |
Weighted |
16% |
Failover |
15% |
Geolocation |
25% |
Latency-based |
20% |
With DNSSEC |
+30% |
With validation |
+10% |
Gossip Performance
Message Throughput
Agents |
Messages/sec |
Bandwidth |
|---|---|---|
10 |
10 |
1 KB/s |
50 |
50 |
5 KB/s |
100 |
100 |
10 KB/s |
500 |
500 |
50 KB/s |
1,000 |
1,000 |
100 KB/s |
Convergence Time
Time for health status change to propagate:
Change |
Time to Propagate |
|---|---|
Single backend unhealthy |
< 1 second |
Single backend healthy |
< 1 second |
Bulk update (10 backends) |
< 2 seconds |
Network partition recovery |
< 5 seconds |
Routing Algorithm Performance
Geolocation Routing
Operation |
Time |
|---|---|
GeoIP database lookup |
0.1 ms |
Custom CIDR lookup |
0.01 ms |
Region matching |
0.01 ms |
Total routing decision |
0.15 ms |
Latency-Based Routing
Operation |
Time |
|---|---|
EMA calculation |
0.001 ms |
Server selection |
0.01 ms |
Total routing decision |
0.05 ms |
Health Check Performance
Overwatch Validation
Backends |
Checks/minute |
CPU Impact |
|---|---|---|
10 |
20 |
< 1% |
50 |
100 |
2% |
100 |
200 |
5% |
500 |
1,000 |
15% |
Agent Health Checks
Backends per Agent |
Checks/minute |
CPU Impact |
|---|---|---|
1 |
12 |
< 1% |
5 |
60 |
1% |
20 |
240 |
3% |
Scaling Benchmarks
Horizontal Scaling (Multiple Overwatches)
Overwatches |
Combined QPS |
Notes |
|---|---|---|
1 |
75,000 |
Single node |
2 |
150,000 |
Linear scaling |
3 |
220,000 |
Near-linear |
5 |
350,000 |
Slight overhead |
Backend Scaling
Backends |
Registration Time |
Memory Overhead |
|---|---|---|
100 |
< 1 second |
5 MB |
1,000 |
< 5 seconds |
40 MB |
10,000 |
< 30 seconds |
400 MB |
Running Your Own Benchmarks
DNS Query Benchmark
Using dnsperf:
# Install dnsperf
apt-get install dnsperf
# Create query file
echo "myapp.gslb.example.com A" > queries.txt
# Run benchmark
dnsperf -s 127.0.0.1 -d queries.txt -l 60 -c 100 -Q 10000
Using dnsbench:
# Install dnsbench
go install github.com/onur/dnsbench@latest
# Run benchmark
dnsbench -s 127.0.0.1:53 -c 100 -n 100000 myapp.gslb.example.com
Custom Benchmark Script
#!/bin/bash
# benchmark.sh
OVERWATCH="127.0.0.1"
DOMAIN="myapp.gslb.example.com"
DURATION=60
CONCURRENCY=100
echo "=== DNS Benchmark ==="
echo "Target: ${OVERWATCH}"
echo "Domain: ${DOMAIN}"
echo "Duration: ${DURATION}s"
echo "Concurrency: ${CONCURRENCY}"
echo ""
# Warmup
echo "Warming up..."
for i in {1..1000}; do
dig @${OVERWATCH} ${DOMAIN} +short > /dev/null
done
# Benchmark
echo "Running benchmark..."
dnsperf -s ${OVERWATCH} -d <(yes "${DOMAIN} A" | head -100000) \
-l ${DURATION} -c ${CONCURRENCY} -Q 50000
echo "=== Benchmark Complete ==="
Stress Test
#!/bin/bash
# stress-test.sh
OVERWATCH="127.0.0.1"
DOMAIN="myapp.gslb.example.com"
# Gradually increase load
for qps in 10000 20000 30000 40000 50000 60000 70000 80000; do
echo "Testing at ${qps} QPS..."
dnsperf -s ${OVERWATCH} -d <(yes "${DOMAIN} A" | head -100000) \
-l 30 -c 100 -Q ${qps} 2>&1 | tail -5
sleep 10
done
Monitoring During Benchmark
# Terminal 1: Watch metrics
watch -n1 'curl -s http://localhost:9091/metrics | grep -E "(queries_total|query_duration)"'
# Terminal 2: Watch resources
htop -p $(pgrep opengslb)
# Terminal 3: Run benchmark
./benchmark.sh
Performance Tuning
Recommended Settings for High Throughput
# High-performance configuration
logging:
level: warn # Reduce logging overhead
metrics:
enabled: true # Keep for monitoring
dns:
default_ttl: 30 # Balance freshness vs cache efficiency
overwatch:
validation:
check_interval: 60s # Less frequent for high backend counts
System Tuning
# Increase file descriptors
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
# Network tuning
sysctl -w net.core.somaxconn=65535
sysctl -w net.core.netdev_max_backlog=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# For Overwatch in systemd
# LimitNOFILE=65536
Go Runtime Tuning
# In systemd service
Environment="GOMAXPROCS=4"
Environment="GOGC=100" # Default, adjust based on memory/CPU tradeoff
Comparison with Alternatives
Solution |
QPS (typical) |
Latency (p99) |
Notes |
|---|---|---|---|
OpenGSLB |
50,000+ |
< 2 ms |
With DNSSEC |
PowerDNS |
100,000+ |
< 1 ms |
No GSLB logic |
CoreDNS |
80,000+ |
< 1 ms |
Plugin dependent |
AWS Route 53 |
N/A |
~50 ms |
Managed service |
NS1 |
N/A |
~10 ms |
Managed service |
Note: Comparisons are approximate and depend heavily on configuration.
Benchmark Caveats
Real-world performance varies: Actual results depend on:
Hardware specifications
Network conditions
Query patterns
Configuration options
DNSSEC overhead: Cryptographic operations add latency
Geolocation overhead: Database lookups add latency
Validation overhead: External health checks consume resources
Test methodology: Results vary with test tools and methodology