Load Balancing Explained

Introduction
When your application starts receiving more traffic than a single server can handle, you have two choices: get a bigger server (vertical scaling) or add more servers (horizontal scaling). Load balancing is what makes horizontal scaling possible.
A load balancer distributes incoming network traffic across multiple servers, ensuring no single server bears too much load. It's one of the most critical pieces of infrastructure for any production application — from small web apps to systems serving millions of users.
This guide covers everything you need to know about load balancing: how it works, the algorithms behind it, the different types, and how to implement it with real-world tools.
What You'll Learn
✅ Understand what load balancing is and why it's essential
✅ Master load balancing algorithms (Round Robin, Least Connections, IP Hash, and more)
✅ Distinguish between Layer 4 and Layer 7 load balancing
✅ Implement health checks and failover for high availability
✅ Configure session persistence and SSL termination
✅ Set up load balancing with Nginx, HAProxy, and cloud providers
✅ Learn patterns like Active-Active, Active-Passive, and Global Server Load Balancing
What is Load Balancing?
Load balancing is the process of distributing incoming network traffic across multiple backend servers (also called a server pool or server farm) to ensure:
- No single server is overwhelmed with too many requests
- Application availability remains high even if a server fails
- Response times stay consistently fast for all users
Without a load balancer, all traffic goes to a single server:
1000 requests/sec → [Single Server] → ❌ Overloaded, slow, crashes
With a load balancer:
1000 requests/sec → [Load Balancer] → [Server 1] ~333 req/s ✅
→ [Server 2] ~333 req/s ✅
→ [Server 3] ~333 req/s ✅
Why Use Load Balancing?
Load balancing solves several critical problems:
| Benefit | Without Load Balancer | With Load Balancer |
|---|---|---|
| Availability | Single point of failure | Survives server failures |
| Scalability | Limited to one server | Add servers as traffic grows |
| Performance | Degrades under load | Consistent response times |
| Maintenance | Downtime for updates | Rolling updates with zero downtime |
| Efficiency | One server may be underused or overloaded | Even distribution of work |
Load Balancing Algorithms
The algorithm determines how the load balancer decides which server should handle each incoming request. Different algorithms suit different scenarios.
1. Round Robin
The simplest algorithm. Requests are distributed sequentially across servers in order.
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (cycle repeats)
Request 5 → Server 2
Request 6 → Server 3
Nginx configuration:
upstream backend {
server 192.168.1.101;
server 192.168.1.102;
server 192.168.1.103;
}
server {
location / {
proxy_pass http://backend;
}
}
Best for: Servers with identical specifications and stateless applications.
Drawback: Doesn't account for server load — a slow-processing request on Server 1 doesn't prevent more requests from being sent to it.
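The cycling above can be sketched in a few lines of JavaScript (a toy picker to show the idea, not tied to any real load balancer):

```javascript
// Toy round-robin: hand out servers in order, wrapping around.
function roundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

const next = roundRobin(['Server 1', 'Server 2', 'Server 3']);
console.log(next()); // Server 1
console.log(next()); // Server 2
console.log(next()); // Server 3
console.log(next()); // Server 1 — cycle repeats
```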
2. Weighted Round Robin
Like Round Robin but assigns more requests to more powerful servers.
Server 1 (weight=5): receives 5 out of every 8 requests
Server 2 (weight=2): receives 2 out of every 8 requests
Server 3 (weight=1): receives 1 out of every 8 requests
Nginx configuration:
upstream backend {
server 192.168.1.101 weight=5; # 8-core, 32GB RAM
server 192.168.1.102 weight=2; # 4-core, 16GB RAM
server 192.168.1.103 weight=1; # 2-core, 8GB RAM
}
Best for: Environments with servers of different capacities.
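To make the 5:2:1 split concrete, here is a naive JavaScript sketch that simply repeats each server `weight` times per cycle (Nginx itself uses a smoother interleaved variant, but the proportions come out the same):

```javascript
// Naive weighted round-robin: repeat each server `weight` times per cycle.
function weightedRoundRobin(servers) {
  const schedule = servers.flatMap(({ host, weight }) => Array(weight).fill(host));
  let i = 0;
  return () => schedule[i++ % schedule.length];
}

const next = weightedRoundRobin([
  { host: '192.168.1.101', weight: 5 },
  { host: '192.168.1.102', weight: 2 },
  { host: '192.168.1.103', weight: 1 },
]);

// Count picks over one full cycle of 8 requests.
const counts = {};
for (let n = 0; n < 8; n++) {
  const host = next();
  counts[host] = (counts[host] ?? 0) + 1;
}
console.log(counts); // 5, 2 and 1 picks respectively
```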
3. Least Connections
Sends requests to the server with the fewest active connections.
For example, if Server 2 currently has the fewest active connections in the pool, the load balancer sends it the next request.
Nginx configuration:
upstream backend {
least_conn;
server 192.168.1.101;
server 192.168.1.102;
server 192.168.1.103;
}
Best for: Applications where requests have varying processing times (e.g., some requests take 50ms, others take 5 seconds).
4. Weighted Least Connections
Combines Least Connections with server weights. The server with the lowest ratio of (active connections / weight) gets the next request.
Server 1: 10 connections, weight=5 → ratio = 2.0
Server 2: 3 connections, weight=2 → ratio = 1.5 ✅ (selected)
Server 3: 4 connections, weight=1 → ratio = 4.0
Best for: Mixed server capacities with variable request durations.
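The selection rule is just "minimize connections / weight"; with every weight set to 1 it degenerates to plain Least Connections. A sketch of the calculation above:

```javascript
// Weighted Least Connections: pick the lowest connections/weight ratio.
function pickWeightedLeastConn(servers) {
  return servers.reduce((best, s) =>
    s.connections / s.weight < best.connections / best.weight ? s : best
  );
}

const picked = pickWeightedLeastConn([
  { host: 'Server 1', connections: 10, weight: 5 }, // ratio 2.0
  { host: 'Server 2', connections: 3,  weight: 2 }, // ratio 1.5
  { host: 'Server 3', connections: 4,  weight: 1 }, // ratio 4.0
]);
console.log(picked.host); // Server 2
```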
5. IP Hash
Uses the client's IP address to determine which server receives the request. The same client IP always goes to the same server.
Client 203.0.113.10 → hash → Server 2 (always)
Client 198.51.100.5 → hash → Server 1 (always)
Client 192.0.2.100 → hash → Server 3 (always)
Nginx configuration:
upstream backend {
ip_hash;
server 192.168.1.101;
server 192.168.1.102;
server 192.168.1.103;
}
Best for: Applications that need basic session persistence without cookies or application-level session management.
Drawback: If a server goes down, all clients hashed to that server must be reassigned, breaking their sessions.
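A toy version of the idea (note: Nginx's real `ip_hash` hashes only the first three octets of an IPv4 address; this sketch hashes the whole string):

```javascript
// Toy IP hash: a deterministic hash of the client IP, modulo pool size,
// so the same client always lands on the same server.
function ipHash(clientIp, servers) {
  let hash = 0;
  for (const ch of clientIp) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return servers[hash % servers.length];
}

const pool = ['Server 1', 'Server 2', 'Server 3'];
// Repeated requests from the same IP hit the same server:
console.log(ipHash('203.0.113.10', pool) === ipHash('203.0.113.10', pool)); // true
// But removing a server from the pool remaps many clients — the drawback above.
```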
6. Least Response Time
Routes to the server with the fastest response time and fewest active connections. Requires the load balancer to actively measure response times.
Best for: Performance-critical applications where you want users routed to the fastest available server.
7. Random
Picks a random server for each request. Statistically distributes evenly over time.
Best for: Large server pools where simplicity is preferred and statistical distribution is sufficient.
Algorithm Comparison
| Algorithm | Considers Server Load | Session Persistence | Complexity | Best Use Case |
|---|---|---|---|---|
| Round Robin | ❌ | ❌ | Low | Equal servers, stateless apps |
| Weighted Round Robin | ❌ (manual weights) | ❌ | Low | Mixed server capacities |
| Least Connections | ✅ | ❌ | Medium | Variable request durations |
| IP Hash | ❌ | ✅ | Low | Basic session affinity |
| Least Response Time | ✅ | ❌ | High | Performance-critical apps |
| Random | ❌ | ❌ | Low | Large pools, simplicity |
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the OSI model. The two most common are Layer 4 (Transport) and Layer 7 (Application).
Layer 4 Load Balancing (Transport Layer)
Operates at the TCP/UDP level. Makes routing decisions based on:
- Source IP and port
- Destination IP and port
- Protocol (TCP or UDP)
The load balancer does not inspect the content of the request (no HTTP headers, URLs, or cookies).
How it works:
- Client establishes a TCP connection to the load balancer
- Load balancer selects a backend server (using chosen algorithm)
- Load balancer forwards raw TCP/UDP packets to the selected server
- All packets from the same connection go to the same server
Advantages:
- Very fast (no packet inspection)
- Low overhead and latency
- Protocol-agnostic (works with HTTP, SMTP, FTP, databases, etc.)
Disadvantages:
- Cannot route based on URL, headers, or content
- Cannot modify requests or responses
- Limited health check capabilities
Layer 7 Load Balancing (Application Layer)
Operates at the HTTP/HTTPS level. Makes routing decisions based on:
- URL path (e.g., /api/* vs /static/*)
- HTTP headers (e.g., Host, User-Agent, Cookie)
- HTTP method (GET, POST, etc.)
- Query parameters
- Request body content
Content-based routing example with Nginx:
upstream api_servers {
server 192.168.1.101;
server 192.168.1.102;
}
upstream static_servers {
server 192.168.1.201;
server 192.168.1.202;
}
server {
listen 80;
location /api/ {
proxy_pass http://api_servers;
}
location /static/ {
proxy_pass http://static_servers;
}
location /images/ {
proxy_pass http://static_servers;
}
}
Advantages:
- Content-aware routing decisions
- Can modify requests/responses (add headers, rewrite URLs)
- Advanced health checks (HTTP status codes, response body)
- SSL termination
- Caching, compression, and rate limiting
Disadvantages:
- Higher latency (must parse HTTP)
- More resource-intensive
- HTTP/HTTPS only (not suitable for arbitrary TCP protocols)
Comparison Table
| Feature | Layer 4 | Layer 7 |
|---|---|---|
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Based On | IP, port, protocol | URL, headers, cookies, content |
| Performance | Faster (no inspection) | Slower (parses HTTP) |
| Content Awareness | ❌ | ✅ |
| SSL Termination | Pass-through only | ✅ Full termination |
| URL Routing | ❌ | ✅ |
| Header Manipulation | ❌ | ✅ |
| Caching | ❌ | ✅ |
| Protocols | Any TCP/UDP | HTTP/HTTPS |
| Use Case | High-throughput, database LB | Web apps, APIs, microservices |
When to Use Each
Choose Layer 4 when:
- You need maximum performance with minimal latency
- You're load balancing non-HTTP protocols (databases, SMTP, gaming servers)
- You don't need content-based routing
Choose Layer 7 when:
- You need to route based on URL paths, headers, or cookies
- You want SSL termination at the load balancer
- You need advanced features like caching, compression, or rate limiting
- You're building microservices with different backends for different paths
Health Checks and Failover
A load balancer must know which servers are healthy and able to handle requests. Health checks are periodic probes sent to backend servers to verify they're alive and functioning correctly.
Types of Health Checks
1. TCP Health Check (Layer 4)
Simply checks if the server accepts TCP connections on the specified port.
Load Balancer → TCP SYN → Server:8080
Server:8080 → TCP SYN-ACK → ✅ Healthy
2. HTTP Health Check (Layer 7)
Sends an HTTP request to a health endpoint and checks the response status code.
Load Balancer → GET /health → Server:8080
Server:8080 → 200 OK → ✅ Healthy
Server:8080 → 503 Service Unavailable → ❌ Unhealthy
Common health endpoint implementation:
// Express.js health check endpoint
app.get('/health', async (req, res) => {
try {
// Check database connection
await db.query('SELECT 1');
// Check Redis connection
await redis.ping();
res.status(200).json({
status: 'healthy',
uptime: process.uptime(),
timestamp: new Date().toISOString(),
});
} catch (error) {
res.status(503).json({
status: 'unhealthy',
error: error.message,
});
}
});
// Spring Boot Actuator (auto-configured health endpoint)
// GET /actuator/health → { "status": "UP" }
@Component
public class DatabaseHealthIndicator implements HealthIndicator {
@Override
public Health health() {
if (isDatabaseReachable()) {
return Health.up()
.withDetail("database", "PostgreSQL")
.build();
}
return Health.down()
.withDetail("error", "Cannot reach database")
.build();
}
}
3. Script-based Health Check
Runs a custom script that performs complex checks (disk space, memory, application logic).
Nginx Health Check Configuration
upstream backend {
# Passive health checks (Nginx OSS):
# mark a server as down after 3 failed attempts, retry after 30 seconds
server 192.168.1.101 max_fails=3 fail_timeout=30s;
server 192.168.1.102 max_fails=3 fail_timeout=30s;
server 192.168.1.103 max_fails=3 fail_timeout=30s;
}
server {
location / {
proxy_pass http://backend;
proxy_next_upstream error timeout http_500 http_502 http_503;
proxy_connect_timeout 5s;
proxy_read_timeout 10s;
}
}
HAProxy Health Check Configuration
backend web_servers
balance roundrobin
option httpchk GET /health
http-check expect status 200
server srv1 192.168.1.101:8080 check inter 5s fall 3 rise 2
server srv2 192.168.1.102:8080 check inter 5s fall 3 rise 2
server srv3 192.168.1.103:8080 check inter 5s fall 3 rise 2
Parameters explained:
- inter 5s: Check every 5 seconds
- fall 3: Mark unhealthy after 3 consecutive failures
- rise 2: Mark healthy again after 2 consecutive successes
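The fall/rise counters amount to a small state machine per server. A JavaScript sketch of that logic (assuming strictly consecutive checks, which is how HAProxy counts them):

```javascript
// fall/rise health tracking: unhealthy after `fall` consecutive failures,
// healthy again after `rise` consecutive successes.
class HealthState {
  constructor(fall = 3, rise = 2) {
    this.fall = fall;
    this.rise = rise;
    this.healthy = true;
    this.failures = 0;
    this.successes = 0;
  }
  record(checkPassed) {
    if (checkPassed) {
      this.failures = 0;
      if (!this.healthy && ++this.successes >= this.rise) this.healthy = true;
    } else {
      this.successes = 0;
      if (this.healthy && ++this.failures >= this.fall) this.healthy = false;
    }
    return this.healthy;
  }
}

const srv = new HealthState(3, 2);
srv.record(false); srv.record(false);
console.log(srv.healthy); // true — only 2 consecutive failures
srv.record(false);
console.log(srv.healthy); // false — 3rd failure trips it
srv.record(true);
console.log(srv.healthy); // false — needs 2 consecutive successes
srv.record(true);
console.log(srv.healthy); // true — back in rotation
```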
Failover Strategies
When a server fails a health check, the load balancer must handle it:
Graceful degradation: When servers fail, remaining servers absorb the extra load:
Normal: Server A (33%) | Server B (33%) | Server C (33%)
Server C ❌: Server A (50%) | Server B (50%) | Server C (down)
Server B ❌: Server A (100%) | Server B (down) | Server C (down)
Session Persistence (Sticky Sessions)
By default, a load balancer may route consecutive requests from the same user to different servers. This is a problem when the application stores session data in server memory.
The Problem
Request 1: User logs in → Server A (session stored in Server A memory)
Request 2: User loads dashboard → Server B (no session found → ❌ redirected to login)
Solutions
1. Sticky Sessions (Cookie-based)
The load balancer inserts a cookie to pin the user to a specific server.
upstream backend {
server 192.168.1.101;
server 192.168.1.102;
# Nginx Plus (commercial) sticky cookie
sticky cookie srv_id expires=1h domain=.example.com path=/;
}
HAProxy sticky sessions:
backend web_servers
balance roundrobin
cookie SERVERID insert indirect nocache
server srv1 192.168.1.101:8080 cookie s1 check
server srv2 192.168.1.102:8080 cookie s2 check
2. IP Hash (as discussed above)
Route by client IP — same IP always goes to same server.
3. Externalized Sessions (Recommended)
Instead of relying on sticky sessions, store sessions externally so any server can handle any request:
// Express.js with Redis session store
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';
const redisClient = createClient({ url: 'redis://redis-host:6379' });
await redisClient.connect();
app.use(session({
store: new RedisStore({ client: redisClient }),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false,
cookie: { secure: true, maxAge: 86400000 }, // 24 hours
}));
Why externalized sessions are better:
- Any server can handle any request (true statelessness)
- Server failures don't lose user sessions
- Easy to scale horizontally
- No sticky session complexity
Session Persistence Comparison
| Approach | Server Failure Impact | Scalability | Complexity |
|---|---|---|---|
| Sticky Sessions (Cookie) | Sessions lost when server dies | Limited | Low |
| IP Hash | Sessions lost when server dies | Limited | Low |
| External Store (Redis) | No impact — sessions survive | Excellent | Medium |
| JWT Tokens (Stateless) | No impact — no server state | Excellent | Medium |
SSL/TLS Termination
SSL termination means the load balancer handles the encryption/decryption of HTTPS traffic, so backend servers receive plain HTTP.
Why Terminate SSL at the Load Balancer?
Benefits:
- Reduced server load: Encryption/decryption is CPU-intensive. Offloading it to the load balancer frees backend servers for application logic.
- Centralized certificate management: Manage SSL certificates in one place instead of every server.
- Simplified backend: Servers don't need SSL configuration.
Nginx SSL termination:
server {
listen 443 ssl;
server_name example.com;
ssl_certificate /etc/ssl/certs/example.com.crt;
ssl_certificate_key /etc/ssl/private/example.com.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
location / {
proxy_pass http://backend; # Plain HTTP to backend
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
}
}
SSL Pass-through vs Termination vs Re-encryption
| Mode | Description | Backend Sees | Use Case |
|---|---|---|---|
| Termination | LB decrypts, sends HTTP to backend | Plain HTTP | Most web applications |
| Pass-through | LB forwards encrypted traffic as-is | HTTPS (must handle SSL) | End-to-end encryption required |
| Re-encryption | LB decrypts, then re-encrypts to backend | HTTPS (re-encrypted) | Compliance requirements |
Popular Load Balancers
Software Load Balancers
1. Nginx
The most popular web server and reverse proxy, also widely used as a load balancer.
# Complete Nginx load balancer configuration
upstream api_backend {
least_conn;
server 10.0.1.101:8080 weight=3;
server 10.0.1.102:8080 weight=2;
server 10.0.1.103:8080 weight=1;
server 10.0.1.104:8080 backup; # Only used when others are down
}
server {
listen 443 ssl http2;
server_name api.example.com;
ssl_certificate /etc/ssl/certs/api.example.com.crt;
ssl_certificate_key /etc/ssl/private/api.example.com.key;
location / {
proxy_pass http://api_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 5s;
proxy_read_timeout 30s;
proxy_send_timeout 30s;
# Retry on failure
proxy_next_upstream error timeout http_502 http_503;
proxy_next_upstream_tries 3;
}
}
Strengths: Lightweight, high performance, rich ecosystem, excellent documentation.
2. HAProxy
Purpose-built, high-performance TCP/HTTP load balancer.
# HAProxy configuration
global
log stdout format raw local0
maxconn 4096
defaults
mode http
timeout connect 5s
timeout client 30s
timeout server 30s
option httplog
frontend http_front
bind *:80
bind *:443 ssl crt /etc/ssl/certs/example.pem
redirect scheme https if !{ ssl_fc }
# Route based on URL path
acl is_api path_beg /api
acl is_static path_beg /static /images /css /js
use_backend api_servers if is_api
use_backend static_servers if is_static
default_backend web_servers
backend web_servers
balance roundrobin
option httpchk GET /health
http-check expect status 200
server web1 10.0.1.101:8080 check inter 5s fall 3 rise 2
server web2 10.0.1.102:8080 check inter 5s fall 3 rise 2
backend api_servers
balance leastconn
option httpchk GET /api/health
server api1 10.0.2.101:3000 check inter 5s fall 3 rise 2
server api2 10.0.2.102:3000 check inter 5s fall 3 rise 2
backend static_servers
balance roundrobin
server static1 10.0.3.101:80 check
server static2 10.0.3.101:80 check
Strengths: Built for load balancing, excellent stats dashboard, very low latency, advanced health checks, used by GitHub, Stack Overflow, and Reddit.
3. Traefik
Modern, cloud-native reverse proxy and load balancer with automatic service discovery.
# docker-compose.yml with Traefik
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - letsencrypt:/letsencrypt
  api:
    image: my-api:latest
    deploy:
      replicas: 3
    labels:
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.tls.certresolver=letsencrypt"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
volumes:
  letsencrypt:
Strengths: Automatic service discovery with Docker/Kubernetes, built-in Let's Encrypt, dynamic configuration without restarts.
Software Load Balancer Comparison
| Feature | Nginx | HAProxy | Traefik |
|---|---|---|---|
| Primary Use | Web server + LB | Dedicated LB | Cloud-native LB |
| Layer 4 | ✅ | ✅ | ✅ |
| Layer 7 | ✅ | ✅ | ✅ |
| Configuration | File-based | File-based | File, Docker labels, API |
| Auto-discovery | ❌ (manual) | ❌ (manual) | ✅ (Docker, K8s) |
| Let's Encrypt | Via certbot | Via ACME | ✅ Built-in |
| Dashboard | Nginx Plus only | ✅ Stats page | ✅ Web UI |
| Hot Reload | ✅ nginx -s reload | ✅ haproxy -sf | ✅ Automatic |
| Used By | ~34% of all websites | GitHub, Reddit | Docker-heavy orgs |
Cloud Load Balancers
Cloud providers offer managed load balancers that require zero server management:
AWS:
- Application Load Balancer (ALB): Layer 7, content-based routing, WebSocket support
- Network Load Balancer (NLB): Layer 4, ultra-low latency, millions of requests per second
- Classic Load Balancer (CLB): Legacy, supports both L4 and L7
Google Cloud:
- Cloud Load Balancing: Global, anycast-based, auto-scaling, supports HTTP(S), TCP, UDP
Azure:
- Azure Load Balancer: Layer 4
- Application Gateway: Layer 7 with WAF (Web Application Firewall)
When to use cloud vs software load balancers:
| Scenario | Recommendation |
|---|---|
| Running on a single cloud provider | Use cloud LB (managed, auto-scaling) |
| Multi-cloud or hybrid deployment | Software LB (Nginx, HAProxy) |
| Kubernetes environment | Cloud LB + Ingress Controller (Nginx, Traefik) |
| Budget-conscious, self-hosted | Nginx or HAProxy on a VPS |
| Need fine-grained control | Software LB with custom configuration |
Load Balancing in Microservices
In a microservices architecture, load balancing becomes even more important as services communicate with each other constantly.
Service-to-Service Load Balancing
Client-Side vs Server-Side Load Balancing
Server-side load balancing (traditional): A dedicated load balancer sits between services.
Service A → Load Balancer → Service B (instance 1)
→ Service B (instance 2)
→ Service B (instance 3)
Client-side load balancing: The calling service chooses which instance to call, using a service registry.
Service A → [Service Registry] → knows all Service B instances
→ directly calls Service B (instance 2)
Client-side example (conceptual):
class ClientSideLoadBalancer {
private instances: string[];
private currentIndex = 0;
constructor(private serviceRegistry: ServiceRegistry) {
this.instances = [];
}
async getNextInstance(serviceName: string): Promise<string> {
// Refresh instances from service registry
this.instances = await this.serviceRegistry.getInstances(serviceName);
if (this.instances.length === 0) {
throw new Error(`No instances available for ${serviceName}`);
}
// Round Robin
const instance = this.instances[this.currentIndex % this.instances.length];
this.currentIndex++;
return instance;
}
}
Service Discovery
For load balancing in microservices, services need to find each other. This is called service discovery.
DNS-based discovery:
# Kubernetes Service (built-in DNS-based discovery)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# Other services can call: http://user-service/api/users
# Kubernetes DNS resolves to healthy pod IPs
Registry-based discovery (Consul, Eureka, etcd):
Service B starts → registers with Consul: "user-service at 10.0.1.5:3000"
Service A needs Service B → asks Consul: "where is user-service?"
Consul responds: ["10.0.1.5:3000", "10.0.1.6:3000", "10.0.1.7:3000"]
Service A picks one → calls 10.0.1.6:3000
DNS-Based Load Balancing
DNS can perform basic load balancing by returning different IP addresses for the same domain name.
How It Works
Client → DNS query: "api.example.com" → DNS server
DNS server → Response: [203.0.113.1, 203.0.113.2, 203.0.113.3]
Client → Connects to 203.0.113.1 (typically the first IP)
Next query (from different or same client, after TTL expires):
DNS server → Response: [203.0.113.2, 203.0.113.3, 203.0.113.1] (rotated)
DNS Round Robin Configuration
; BIND DNS zone file
api.example.com. 300 IN A 203.0.113.1
api.example.com. 300 IN A 203.0.113.2
api.example.com. 300 IN A 203.0.113.3
Limitations of DNS Load Balancing
| Limitation | Impact |
|---|---|
| DNS caching | Clients and resolvers cache DNS responses. Changes take time to propagate (based on TTL). |
| No health checks | DNS doesn't know if a server is down. Clients may connect to dead servers. |
| No session awareness | No control over which server handles which client. |
| Uneven distribution | Some clients cache longer, leading to uneven load. |
When DNS load balancing is useful:
- Distributing traffic across geographic regions (GSLB)
- As a first layer before dedicated load balancers
- Simple setups where health checks aren't critical
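The record rotation described above can be sketched as follows (a toy resolver; real DNS servers and client resolvers add caching, which is exactly why the technique is imprecise):

```javascript
// Toy DNS round-robin: rotate the A-record list on every query.
// Clients typically connect to the first IP in the answer.
function makeZone(records) {
  const list = [...records];
  return function query() {
    const answer = [...list];
    list.push(list.shift()); // rotate for the next query
    return answer;
  };
}

const query = makeZone(['203.0.113.1', '203.0.113.2', '203.0.113.3']);
console.log(query()[0]); // 203.0.113.1
console.log(query()[0]); // 203.0.113.2
console.log(query()[0]); // 203.0.113.3
```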
Global Server Load Balancing (GSLB)
GSLB distributes traffic across servers in multiple geographic locations to minimize latency and provide disaster recovery.
How GSLB Works
1. User makes a DNS query for app.example.com
2. GSLB-enabled DNS considers:
   - Geographic proximity of the user
   - Health of each data center
   - Current load at each location
   - Latency to each data center
3. Returns the IP of the closest/best data center
GSLB Routing Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Geolocation | Route based on client's country/region | Compliance, content localization |
| Latency-based | Route to lowest-latency data center | Performance optimization |
| Failover | Primary/secondary data centers | Disaster recovery |
| Weighted | Distribute percentage of traffic to each region | Gradual migrations, canary deployments |
Cloud GSLB Services
- AWS Route 53: Latency-based, geolocation, failover, weighted routing
- Google Cloud DNS: Geolocation routing with health checks
- Azure Traffic Manager: Performance, weighted, priority, geographic routing
- Cloudflare Load Balancing: Anycast-based with geo-steering
Auto-Scaling with Load Balancers
Load balancers work together with auto-scaling to handle traffic spikes automatically.
How Auto-Scaling Works
Scaling triggers:
- CPU usage > 70% for 5 minutes
- Memory usage > 80%
- Request count > 1000/sec per server
- Response time > 500ms average
Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
AWS Auto Scaling Group
Load Balancer (ALB)
↓
Auto Scaling Group
├── Min: 2 instances
├── Desired: 3 instances
├── Max: 10 instances
└── Scaling Policy:
├── Scale Up: CPU > 70% for 5 min → add 2 instances
└── Scale Down: CPU < 30% for 15 min → remove 1 instance
Best practices for auto-scaling:
- Set conservative scale-up thresholds (react quickly to traffic spikes)
- Set aggressive scale-down thresholds (scale down slowly to avoid flapping)
- Always have a minimum number of healthy instances
- Use health checks to ensure new instances are ready before receiving traffic
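The asymmetric-threshold advice can be sketched as a simple decision function over recent CPU readings (one sample per minute; the 70%/5-min and 30%/15-min numbers mirror the triggers listed earlier and are only example values):

```javascript
// Scale up fast, scale down slowly, to avoid flapping.
function scalingDecision(cpuSamples) {
  const last = n => cpuSamples.slice(-n);
  if (last(5).length === 5 && last(5).every(c => c > 70)) return 'scale-up';
  if (last(15).length === 15 && last(15).every(c => c < 30)) return 'scale-down';
  return 'hold';
}

console.log(scalingDecision([40, 75, 80, 85, 90, 95]));  // scale-up
console.log(scalingDecision(Array(15).fill(20)));        // scale-down
console.log(scalingDecision([20, 20, 90]));              // hold — spike too short
```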
Load Balancing Patterns
Active-Active
All servers are actively handling traffic simultaneously.
Load Balancer
├── Server A (Active) ← handles requests
├── Server B (Active) ← handles requests
└── Server C (Active) ← handles requestsPros: Maximum resource utilization, highest throughput.
Cons: All servers need identical configuration, more complex to maintain.
Active-Passive (Failover)
Only the active server handles traffic. The passive server is a standby that takes over when the active server fails.
Normal:
Load Balancer → Server A (Active) ← handles all requests
Server B (Passive/Standby) ← idle, monitoring
Failover:
Load Balancer → Server A (Down) ✗
Server B (Now Active) ← takes over
Pros: Simple, guaranteed failover capacity.
Cons: Passive server sits idle (wasted resources).
Blue-Green Deployment with Load Balancer
Use load balancing to enable zero-downtime deployments:
Step 1: Blue (v1) is live
Load Balancer → Blue Servers (v1) ✅
Step 2: Deploy Green (v2), test it
Load Balancer → Blue Servers (v1) ✅
Green Servers (v2) [testing]
Step 3: Switch traffic to Green
Load Balancer → Green Servers (v2) ✅
Blue Servers (v1) [standby for rollback]
Canary Deployment
Route a small percentage of traffic to the new version:
upstream backend {
server 10.0.1.101:8080 weight=9; # v1 (90% traffic)
server 10.0.1.102:8080 weight=1; # v2 (10% traffic — canary)
}
Gradually increase the canary weight as confidence grows:
Phase 1: v1 = 90%, v2 = 10% → monitor metrics
Phase 2: v1 = 70%, v2 = 30% → monitor metrics
Phase 3: v1 = 50%, v2 = 50% → monitor metrics
Phase 4: v1 = 0%, v2 = 100% → rollout complete
Monitoring and Metrics
A load balancer produces valuable metrics for understanding your system's health and performance.
Key Metrics to Monitor
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| Request rate | Traffic volume (req/s) | Sudden spikes or drops |
| Error rate | Percentage of 4xx/5xx responses | > 1% of total traffic |
| Latency (p50, p95, p99) | Response time distribution | p99 > 1 second |
| Active connections | Current concurrent connections | Approaching server limits |
| Backend health | Number of healthy vs unhealthy servers | Any server down |
| Bandwidth | Data transfer in/out | Approaching network limits |
| Connection queue | Requests waiting for a server | Queue > 0 sustained |
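A note on the latency row: p95/p99 come from the sorted latency distribution, not the average — a handful of slow requests can leave the mean looking fine while p99 explodes. A nearest-rank sketch (conventions vary slightly between monitoring systems):

```javascript
// Nearest-rank percentile over raw request latencies (in ms).
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// 97 fast requests plus 3 slow outliers:
const latencies = [...Array(97)].map((_, i) => 10 + (i % 40));
latencies.push(800, 900, 1200);

console.log(percentile(latencies, 50)); // typical request — tens of ms
console.log(percentile(latencies, 99)); // dominated by the outliers
```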
Nginx Status Monitoring
# Enable Nginx stub status
server {
listen 8080;
location /nginx_status {
stub_status on;
allow 10.0.0.0/8; # Allow internal network only
deny all;
}
}
# Output
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
HAProxy Stats Dashboard
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
stats admin if TRUE
stats auth admin:password
HAProxy provides a built-in web dashboard showing real-time metrics for every backend server: connections, request rates, response times, health status, and error rates.
Best Practices
Configuration Best Practices
1. Always configure health checks
# Don't just check if port is open — check application health
location /health {
proxy_pass http://backend/health;
proxy_connect_timeout 2s;
proxy_read_timeout 2s;
}
2. Set appropriate timeouts
proxy_connect_timeout 5s; # Time to establish connection to backend
proxy_read_timeout 30s; # Time to wait for response from backend
proxy_send_timeout 10s; # Time to send request to backend
3. Pass client information to backends
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
4. Configure connection limits
upstream backend {
server 192.168.1.101 max_conns=100;
server 192.168.1.102 max_conns=100;
queue 50 timeout=10s; # Queue requests when all servers are at max (Nginx Plus only)
}
Architecture Best Practices
1. Eliminate the load balancer as a single point of failure
Use redundant load balancers with failover:
             ┌── Load Balancer 1 (Active)
Virtual IP ──┤
             └── Load Balancer 2 (Standby)
                      │
              ┌───────┼───────┐
           Server  Server  Server
2. Use the right algorithm for your workload
- Stateless API: Round Robin or Least Connections
- WebSocket connections: IP Hash or Least Connections
- Mixed server sizes: Weighted Round Robin
- Performance-critical: Least Response Time
3. Externalize session state
Don't rely on sticky sessions. Use Redis, Memcached, or JWT tokens for session management.
4. Enable connection keep-alive
upstream backend {
server 192.168.1.101;
keepalive 32; # Keep 32 idle connections to each backend
}
location / {
proxy_pass http://backend;
proxy_http_version 1.1;
proxy_set_header Connection ""; # Enable keepalive to backend
}
5. Plan for graceful shutdown
When removing a server for maintenance:
1. Mark server as "draining" (stop new connections)
2. Wait for existing connections to complete
3. Remove server from pool
4. Perform maintenance
5. Re-add server to pool
6. Monitor health checks to confirm it's healthy
Common Pitfalls
| Pitfall | Problem | Solution |
|---|---|---|
| No health checks | Traffic sent to dead servers | Always configure health endpoints |
| Single load balancer | LB becomes single point of failure | Deploy redundant LBs with failover |
| Sticky sessions everywhere | Limits scalability, complex failover | Use external session stores (Redis) |
| Ignoring timeouts | Slow backends block all connections | Set connect, read, and send timeouts |
| No monitoring | Problems discovered by users, not engineers | Monitor request rate, errors, latency |
| Misconfigured SSL | Mixed content, security warnings | Terminate SSL at LB, redirect HTTP to HTTPS |
| No connection limits | One backend overwhelmed by burst | Set max_conns per server |
| No graceful shutdown | Active requests dropped during deployment | Drain connections before removing servers |
Summary and Key Takeaways
✅ Load balancing distributes traffic across multiple servers for scalability, availability, and performance
✅ Algorithms like Round Robin, Least Connections, and IP Hash determine how traffic is routed
✅ Layer 4 load balancing is faster but content-unaware; Layer 7 enables routing based on URLs, headers, and cookies
✅ Health checks are essential — always configure them to detect and remove unhealthy servers
✅ Externalize session state (Redis, JWT) instead of relying on sticky sessions
✅ SSL termination at the load balancer simplifies certificate management and reduces backend load
✅ Nginx and HAProxy are the most popular software load balancers; cloud providers offer managed alternatives
✅ Auto-scaling works with load balancers to handle traffic spikes automatically
✅ Monitor request rates, error rates, latency, and backend health continuously
✅ Deploy redundant load balancers to avoid a single point of failure
What's Next?
Now that you understand load balancing, explore these related topics:
- Reverse Proxy Explained — Learn how reverse proxies work and how they relate to load balancers
- HTTP Protocol Complete Guide — Deep dive into the protocol that load balancers route
- Client-Server Architecture Explained — Understand the foundation that load balancing builds upon
Load balancing is one of the most impactful infrastructure decisions you'll make. Start with a simple Round Robin setup, add health checks, externalize your sessions, and scale from there. You don't need to implement everything on day one — but you do need to plan for it.