
Load Balancing Explained

Tags: load-balancing, infrastructure, devops, backend, networking, performance

Introduction

When your application starts receiving more traffic than a single server can handle, you have two choices: get a bigger server (vertical scaling) or add more servers (horizontal scaling). Load balancing is what makes horizontal scaling possible.

A load balancer distributes incoming network traffic across multiple servers, ensuring no single server bears too much load. It's one of the most critical pieces of infrastructure for any production application — from small web apps to systems serving millions of users.

This guide covers everything you need to know about load balancing: how it works, the algorithms behind it, the different types, and how to implement it with real-world tools.

What You'll Learn

✅ Understand what load balancing is and why it's essential
✅ Master load balancing algorithms (Round Robin, Least Connections, IP Hash, and more)
✅ Distinguish between Layer 4 and Layer 7 load balancing
✅ Implement health checks and failover for high availability
✅ Configure session persistence and SSL termination
✅ Set up load balancing with Nginx, HAProxy, and cloud providers
✅ Learn patterns like Active-Active, Active-Passive, and Global Server Load Balancing


What is Load Balancing?

Load balancing is the process of distributing incoming network traffic across multiple backend servers (also called a server pool or server farm) to ensure:

  • No single server is overwhelmed with too many requests
  • Application availability remains high even if a server fails
  • Response times stay consistently fast for all users

Without a load balancer, all traffic goes to a single server:

1000 requests/sec → [Single Server] → ❌ Overloaded, slow, crashes

With a load balancer:

1000 requests/sec → [Load Balancer] → [Server 1] ~333 req/s ✅
                                    → [Server 2] ~333 req/s ✅
                                    → [Server 3] ~333 req/s ✅

Why Use Load Balancing?

Load balancing solves several critical problems:

| Benefit | Without Load Balancer | With Load Balancer |
| --- | --- | --- |
| Availability | Single point of failure | Survives server failures |
| Scalability | Limited to one server | Add servers as traffic grows |
| Performance | Degrades under load | Consistent response times |
| Maintenance | Downtime for updates | Rolling updates with zero downtime |
| Efficiency | One server may be underused or overloaded | Even distribution of work |

Load Balancing Algorithms

The algorithm determines how the load balancer decides which server should handle each incoming request. Different algorithms suit different scenarios.

1. Round Robin

The simplest algorithm. Requests are distributed sequentially across servers in order.

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1  (cycle repeats)
Request 5 → Server 2
Request 6 → Server 3

Nginx configuration:

upstream backend {
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}
 
server {
    location / {
        proxy_pass http://backend;
    }
}

Best for: Servers with identical specifications and stateless applications.

Drawback: Doesn't account for server load — a slow-processing request on Server 1 doesn't prevent more requests from being sent to it.
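The rotation above can be sketched in a few lines of TypeScript (a minimal, illustrative selector; real load balancers also track server health):

```typescript
// Minimal Round Robin selector: cycles through the pool in order.
class RoundRobin {
  private index = 0;
  constructor(private servers: string[]) {}

  next(): string {
    const server = this.servers[this.index % this.servers.length];
    this.index++;
    return server;
  }
}

const lb = new RoundRobin(["server1", "server2", "server3"]);
// Requests 1-4 go to: server1, server2, server3, then server1 again
console.log([lb.next(), lb.next(), lb.next(), lb.next()].join(" → "));
```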

2. Weighted Round Robin

Like Round Robin but assigns more requests to more powerful servers.

Server 1 (weight=5):  receives 5 out of every 8 requests
Server 2 (weight=2):  receives 2 out of every 8 requests
Server 3 (weight=1):  receives 1 out of every 8 requests

Nginx configuration:

upstream backend {
    server 192.168.1.101 weight=5;  # 8-core, 32GB RAM
    server 192.168.1.102 weight=2;  # 4-core, 16GB RAM
    server 192.168.1.103 weight=1;  # 2-core, 8GB RAM
}

Best for: Environments with servers of different capacities.
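The 5/2/1 split above can be sketched by giving each server weight-many slots in the schedule (an illustrative approach; Nginx actually interleaves picks more smoothly):

```typescript
// Weighted Round Robin: a server with weight w gets w slots per cycle.
class WeightedRoundRobin {
  private schedule: string[] = [];
  private index = 0;

  constructor(servers: Array<{ host: string; weight: number }>) {
    for (const { host, weight } of servers) {
      for (let i = 0; i < weight; i++) this.schedule.push(host);
    }
  }

  next(): string {
    return this.schedule[this.index++ % this.schedule.length];
  }
}

const wrr = new WeightedRoundRobin([
  { host: "server1", weight: 5 },
  { host: "server2", weight: 2 },
  { host: "server3", weight: 1 },
]);
// Over any 8 consecutive picks: server1 ×5, server2 ×2, server3 ×1
```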

3. Least Connections

Sends requests to the server with the fewest active connections.

Server 1: 12 active connections
Server 2:  3 active connections ✅ (selected)
Server 3:  7 active connections

The load balancer picks Server 2 because it has the fewest active connections.

Nginx configuration:

upstream backend {
    least_conn;
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}

Best for: Applications where requests have varying processing times (e.g., some requests take 50ms, others take 5 seconds).

4. Weighted Least Connections

Combines Least Connections with server weights. The server with the lowest ratio of (active connections / weight) gets the next request.

Server 1: 10 connections, weight=5 → ratio = 2.0
Server 2:  3 connections, weight=2 → ratio = 1.5 ✅ (selected)
Server 3:  4 connections, weight=1 → ratio = 4.0

Best for: Mixed server capacities with variable request durations.
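The ratio calculation above is simple enough to show directly (a sketch using illustrative connection counts):

```typescript
// Weighted Least Connections: pick the server with the lowest
// activeConnections / weight ratio.
interface Backend { host: string; active: number; weight: number }

function pickWeightedLeastConn(pool: Backend[]): Backend {
  return pool.reduce((best, s) =>
    s.active / s.weight < best.active / best.weight ? s : best
  );
}

const pool: Backend[] = [
  { host: "server1", active: 10, weight: 5 }, // ratio 2.0
  { host: "server2", active: 3,  weight: 2 }, // ratio 1.5 → selected
  { host: "server3", active: 4,  weight: 1 }, // ratio 4.0
];
console.log(pickWeightedLeastConn(pool).host); // server2
```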

5. IP Hash

Uses the client's IP address to determine which server receives the request. The same client IP always goes to the same server.

Client 203.0.113.10 → hash → Server 2 (always)
Client 198.51.100.5 → hash → Server 1 (always)
Client 192.0.2.100  → hash → Server 3 (always)

Nginx configuration:

upstream backend {
    ip_hash;
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}

Best for: Applications that need basic session persistence without cookies or application-level session management.

Drawback: If a server goes down, all clients hashed to that server must be reassigned, breaking their sessions.
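A toy version of the idea (the hash function here is purely illustrative; production balancers use stronger hashes, and often consistent hashing to soften the reassignment problem just described):

```typescript
// IP Hash: hash the client IP, then map it onto the pool with modulo.
function ipHashPick(clientIp: string, pool: string[]): string {
  let hash = 0;
  for (const ch of clientIp) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep unsigned 32-bit
  }
  return pool[hash % pool.length];
}

const servers = ["server1", "server2", "server3"];
// The same client IP always lands on the same server:
console.log(ipHashPick("203.0.113.10", servers) === ipHashPick("203.0.113.10", servers)); // true
```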

6. Least Response Time

Routes to the server with the fastest response time and fewest active connections. Requires the load balancer to actively measure response times.

Best for: Performance-critical applications where you want users routed to the fastest available server.

7. Random

Picks a random server for each request. Statistically distributes evenly over time.

Best for: Large server pools where simplicity is preferred and statistical distribution is sufficient.

Algorithm Comparison

| Algorithm | Considers Server Load | Session Persistence | Complexity | Best Use Case |
| --- | --- | --- | --- | --- |
| Round Robin | ❌ | ❌ | Low | Equal servers, stateless apps |
| Weighted Round Robin | ❌ (manual weights) | ❌ | Low | Mixed server capacities |
| Least Connections | ✅ | ❌ | Medium | Variable request durations |
| IP Hash | ❌ | ✅ | Low | Basic session affinity |
| Least Response Time | ✅ | ❌ | High | Performance-critical apps |
| Random | ❌ | ❌ | Low | Large pools, simplicity |

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model. The two most common are Layer 4 (Transport) and Layer 7 (Application).

Layer 4 Load Balancing (Transport Layer)

Operates at the TCP/UDP level. Makes routing decisions based on:

  • Source IP and port
  • Destination IP and port
  • Protocol (TCP or UDP)

The load balancer does not inspect the content of the request (no HTTP headers, URLs, or cookies).

How it works:

  1. Client establishes a TCP connection to the load balancer
  2. Load balancer selects a backend server (using chosen algorithm)
  3. Load balancer forwards raw TCP/UDP packets to the selected server
  4. All packets from the same connection go to the same server

Advantages:

  • Very fast (no packet inspection)
  • Low overhead and latency
  • Protocol-agnostic (works with HTTP, SMTP, FTP, databases, etc.)

Disadvantages:

  • Cannot route based on URL, headers, or content
  • Cannot modify requests or responses
  • Limited health check capabilities

Layer 7 Load Balancing (Application Layer)

Operates at the HTTP/HTTPS level. Makes routing decisions based on:

  • URL path (e.g., /api/* vs /static/*)
  • HTTP headers (e.g., Host, User-Agent, Cookie)
  • HTTP method (GET, POST, etc.)
  • Query parameters
  • Request body content

Content-based routing example with Nginx:

upstream api_servers {
    server 192.168.1.101;
    server 192.168.1.102;
}
 
upstream static_servers {
    server 192.168.1.201;
    server 192.168.1.202;
}
 
server {
    listen 80;
 
    location /api/ {
        proxy_pass http://api_servers;
    }
 
    location /static/ {
        proxy_pass http://static_servers;
    }
 
    location /images/ {
        proxy_pass http://static_servers;
    }
}

Advantages:

  • Content-aware routing decisions
  • Can modify requests/responses (add headers, rewrite URLs)
  • Advanced health checks (HTTP status codes, response body)
  • SSL termination
  • Caching, compression, and rate limiting

Disadvantages:

  • Higher latency (must parse HTTP)
  • More resource-intensive
  • HTTP/HTTPS only (not suitable for arbitrary TCP protocols)

Comparison Table

| Feature | Layer 4 | Layer 7 |
| --- | --- | --- |
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Based On | IP, port, protocol | URL, headers, cookies, content |
| Performance | Faster (no inspection) | Slower (parses HTTP) |
| Content Awareness | ❌ | ✅ |
| SSL Termination | Pass-through only | ✅ Full termination |
| URL Routing | ❌ | ✅ |
| Header Manipulation | ❌ | ✅ |
| Caching | ❌ | ✅ |
| Protocols | Any TCP/UDP | HTTP/HTTPS |
| Use Case | High-throughput, database LB | Web apps, APIs, microservices |

When to Use Each

Choose Layer 4 when:

  • You need maximum performance with minimal latency
  • You're load balancing non-HTTP protocols (databases, SMTP, gaming servers)
  • You don't need content-based routing

Choose Layer 7 when:

  • You need to route based on URL paths, headers, or cookies
  • You want SSL termination at the load balancer
  • You need advanced features like caching, compression, or rate limiting
  • You're building microservices with different backends for different paths

Health Checks and Failover

A load balancer must know which servers are healthy and able to handle requests. Health checks are periodic probes sent to backend servers to verify they're alive and functioning correctly.

Types of Health Checks

1. TCP Health Check (Layer 4)

Simply checks if the server accepts TCP connections on the specified port.

Load Balancer → TCP SYN → Server:8080
Server:8080  → TCP SYN-ACK → ✅ Healthy

2. HTTP Health Check (Layer 7)

Sends an HTTP request to a health endpoint and checks the response status code.

Load Balancer → GET /health → Server:8080
Server:8080  → 200 OK → ✅ Healthy
Server:8080  → 503 Service Unavailable → ❌ Unhealthy

Common health endpoint implementation:

// Express.js health check endpoint
app.get('/health', async (req, res) => {
  try {
    // Check database connection
    await db.query('SELECT 1');
 
    // Check Redis connection
    await redis.ping();
 
    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: new Date().toISOString(),
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
    });
  }
});

The same idea in Spring Boot, where Actuator auto-configures the endpoint:

// Spring Boot Actuator (auto-configured health endpoint)
// GET /actuator/health → { "status": "UP" }
 
@Component
public class DatabaseHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        if (isDatabaseReachable()) {
            return Health.up()
                .withDetail("database", "PostgreSQL")
                .build();
        }
        return Health.down()
            .withDetail("error", "Cannot reach database")
            .build();
    }
}

3. Script-based Health Check

Runs a custom script that performs complex checks (disk space, memory, application logic).

Nginx Health Check Configuration

upstream backend {
    # Passive health checks (Nginx OSS): mark a server as down after
    # 3 failed requests, and retry it after 30 seconds
    server 192.168.1.101 max_fails=3 fail_timeout=30s;
    server 192.168.1.102 max_fails=3 fail_timeout=30s;
    server 192.168.1.103 max_fails=3 fail_timeout=30s;
}
 
server {
    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;
    }
}

HAProxy Health Check Configuration

backend web_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
 
    server srv1 192.168.1.101:8080 check inter 5s fall 3 rise 2
    server srv2 192.168.1.102:8080 check inter 5s fall 3 rise 2
    server srv3 192.168.1.103:8080 check inter 5s fall 3 rise 2

Parameters explained:

  • inter 5s: Check every 5 seconds
  • fall 3: Mark unhealthy after 3 consecutive failures
  • rise 2: Mark healthy again after 2 consecutive successes
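The fall/rise logic is a small state machine, sketched here in TypeScript (mirroring the `fall 3 rise 2` semantics above; class and method names are illustrative):

```typescript
// Health state machine with fall/rise thresholds, like HAProxy's
// `fall 3 rise 2`: consecutive failures mark a server down,
// consecutive successes bring it back.
class HealthTracker {
  healthy = true;
  private failures = 0;
  private successes = 0;

  constructor(private fall: number, private rise: number) {}

  record(checkPassed: boolean): boolean {
    if (checkPassed) {
      this.failures = 0;
      this.successes++;
      if (!this.healthy && this.successes >= this.rise) this.healthy = true;
    } else {
      this.successes = 0;
      this.failures++;
      if (this.healthy && this.failures >= this.fall) this.healthy = false;
    }
    return this.healthy;
  }
}

const tracker = new HealthTracker(3, 2);
tracker.record(false); // 1 failure: still healthy
tracker.record(false); // 2 failures: still healthy
console.log(tracker.record(false)); // 3rd failure → false (marked down)
tracker.record(true);
console.log(tracker.record(true)); // 2 successes → true (back up)
```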

Failover Strategies

When a server fails a health check, the load balancer must handle it:

Graceful degradation: When servers fail, remaining servers absorb the extra load:

Normal:    Server A (33%) | Server B (33%) | Server C (33%)
Server C ❌: Server A (50%) | Server B (50%) | Server C (down)
Server B ❌: Server A (100%) | Server B (down) | Server C (down)

Session Persistence (Sticky Sessions)

By default, a load balancer may route consecutive requests from the same user to different servers. This is a problem when the application stores session data in server memory.

The Problem

Request 1: User logs in → Server A (session stored in Server A memory)
Request 2: User loads dashboard → Server B (no session found → ❌ redirected to login)

Solutions

1. Sticky Sessions (Cookie-based)

The load balancer inserts a cookie to pin the user to a specific server.

upstream backend {
    server 192.168.1.101;
    server 192.168.1.102;
 
    # Nginx Plus (commercial) sticky cookie
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}

HAProxy sticky sessions:

backend web_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
 
    server srv1 192.168.1.101:8080 cookie s1 check
    server srv2 192.168.1.102:8080 cookie s2 check

2. IP Hash (as discussed above)

Route by client IP — same IP always goes to same server.

3. Externalized Sessions (Recommended)

Instead of relying on sticky sessions, store sessions externally so any server can handle any request:

// Express.js with Redis session store
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';
 
const redisClient = createClient({ url: 'redis://redis-host:6379' });
await redisClient.connect();
 
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
  cookie: { secure: true, maxAge: 86400000 }, // 24 hours
}));

Why externalized sessions are better:

  • Any server can handle any request (true statelessness)
  • Server failures don't lose user sessions
  • Easy to scale horizontally
  • No sticky session complexity

Session Persistence Comparison

| Approach | Server Failure Impact | Scalability | Complexity |
| --- | --- | --- | --- |
| Sticky Sessions (Cookie) | Sessions lost when server dies | Limited | Low |
| IP Hash | Sessions lost when server dies | Limited | Low |
| External Store (Redis) | No impact — sessions survive | Excellent | Medium |
| JWT Tokens (Stateless) | No impact — no server state | Excellent | Medium |

SSL/TLS Termination

SSL termination means the load balancer handles the encryption/decryption of HTTPS traffic, so backend servers receive plain HTTP.

Why Terminate SSL at the Load Balancer?

Benefits:

  • Reduced server load: Encryption/decryption is CPU-intensive. Offloading it to the load balancer frees backend servers for application logic.
  • Centralized certificate management: Manage SSL certificates in one place instead of every server.
  • Simplified backend: Servers don't need SSL configuration.

Nginx SSL termination:

server {
    listen 443 ssl;
    server_name example.com;
 
    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;
 
    location / {
        proxy_pass http://backend;  # Plain HTTP to backend
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}

SSL Pass-through vs Termination vs Re-encryption

| Mode | Description | Backend Sees | Use Case |
| --- | --- | --- | --- |
| Termination | LB decrypts, sends HTTP to backend | Plain HTTP | Most web applications |
| Pass-through | LB forwards encrypted traffic as-is | HTTPS (must handle SSL) | End-to-end encryption required |
| Re-encryption | LB decrypts, then re-encrypts to backend | HTTPS (re-encrypted) | Compliance requirements |

Software Load Balancers

1. Nginx

The most popular web server and reverse proxy, also widely used as a load balancer.

# Complete Nginx load balancer configuration
upstream api_backend {
    least_conn;
    server 10.0.1.101:8080 weight=3;
    server 10.0.1.102:8080 weight=2;
    server 10.0.1.103:8080 weight=1;
    server 10.0.1.104:8080 backup;  # Only used when others are down
}
 
server {
    listen 443 ssl http2;
    server_name api.example.com;
 
    ssl_certificate     /etc/ssl/certs/api.example.com.crt;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
 
    location / {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
 
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
 
        # Retry on failure
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 3;
    }
}

Strengths: Lightweight, high performance, rich ecosystem, excellent documentation.

2. HAProxy

Purpose-built, high-performance TCP/HTTP load balancer.

# HAProxy configuration
global
    log     stdout format raw local0
    maxconn 4096
 
defaults
    mode    http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    option  httplog
 
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/example.pem
    redirect scheme https if !{ ssl_fc }
 
    # Route based on URL path
    acl is_api path_beg /api
    acl is_static path_beg /static /images /css /js
 
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend web_servers
 
backend web_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server web1 10.0.1.101:8080 check inter 5s fall 3 rise 2
    server web2 10.0.1.102:8080 check inter 5s fall 3 rise 2
 
backend api_servers
    balance leastconn
    option httpchk GET /api/health
    server api1 10.0.2.101:3000 check inter 5s fall 3 rise 2
    server api2 10.0.2.102:3000 check inter 5s fall 3 rise 2
 
backend static_servers
    balance roundrobin
    server static1 10.0.3.101:80 check
    server static2 10.0.3.102:80 check

Strengths: Built for load balancing, excellent stats dashboard, very low latency, advanced health checks, used by GitHub, Stack Overflow, and Reddit.

3. Traefik

Modern, cloud-native reverse proxy and load balancer with automatic service discovery.

# docker-compose.yml with Traefik
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - letsencrypt:/letsencrypt
 
  api:
    image: my-api:latest
    deploy:
      replicas: 3
    labels:
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.tls.certresolver=letsencrypt"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
 
volumes:
  letsencrypt:

Strengths: Automatic service discovery with Docker/Kubernetes, built-in Let's Encrypt, dynamic configuration without restarts.

Software Load Balancer Comparison

| Feature | Nginx | HAProxy | Traefik |
| --- | --- | --- | --- |
| Primary Use | Web server + LB | Dedicated LB | Cloud-native LB |
| Layer 4 | ✅ (stream module) | ✅ | ✅ (TCP routers) |
| Layer 7 | ✅ | ✅ | ✅ |
| Configuration | File-based | File-based | File, Docker labels, API |
| Auto-discovery | ❌ (manual) | ❌ (manual) | ✅ (Docker, K8s) |
| Let's Encrypt | Via certbot | Via ACME | ✅ Built-in |
| Dashboard | Nginx Plus only | ✅ Stats page | ✅ Web UI |
| Hot Reload | nginx -s reload | haproxy -sf | ✅ Automatic |
| Used By | ~34% of all websites | GitHub, Reddit | Docker-heavy orgs |

Cloud Load Balancers

Cloud providers offer managed load balancers that require zero server management:

AWS:

  • Application Load Balancer (ALB): Layer 7, content-based routing, WebSocket support
  • Network Load Balancer (NLB): Layer 4, ultra-low latency, millions of requests per second
  • Classic Load Balancer (CLB): Legacy, supports both L4 and L7

Google Cloud:

  • Cloud Load Balancing: Global, anycast-based, auto-scaling, supports HTTP(S), TCP, UDP

Azure:

  • Azure Load Balancer: Layer 4
  • Application Gateway: Layer 7 with WAF (Web Application Firewall)

When to use cloud vs software load balancers:

| Scenario | Recommendation |
| --- | --- |
| Running on a single cloud provider | Use cloud LB (managed, auto-scaling) |
| Multi-cloud or hybrid deployment | Software LB (Nginx, HAProxy) |
| Kubernetes environment | Cloud LB + Ingress Controller (Nginx, Traefik) |
| Budget-conscious, self-hosted | Nginx or HAProxy on a VPS |
| Need fine-grained control | Software LB with custom configuration |

Load Balancing in Microservices

In a microservices architecture, load balancing becomes even more important as services communicate with each other constantly.

Service-to-Service Load Balancing

Client-Side vs Server-Side Load Balancing

Server-side load balancing (traditional): A dedicated load balancer sits between services.

Service A → Load Balancer → Service B (instance 1)
                          → Service B (instance 2)
                          → Service B (instance 3)

Client-side load balancing: The calling service chooses which instance to call, using a service registry.

Service A → [Service Registry] → knows all Service B instances
          → directly calls Service B (instance 2)

Client-side example (conceptual):

// Minimal registry interface (implementation omitted for brevity)
interface ServiceRegistry {
  getInstances(serviceName: string): Promise<string[]>;
}
 
class ClientSideLoadBalancer {
  private instances: string[] = [];
  private currentIndex = 0;
 
  constructor(private serviceRegistry: ServiceRegistry) {}
 
  async getNextInstance(serviceName: string): Promise<string> {
    // Refresh instances from the service registry
    this.instances = await this.serviceRegistry.getInstances(serviceName);
 
    if (this.instances.length === 0) {
      throw new Error(`No instances available for ${serviceName}`);
    }
 
    // Round Robin over the current instance list
    const instance = this.instances[this.currentIndex % this.instances.length];
    this.currentIndex++;
    return instance;
  }
}

Service Discovery

For load balancing in microservices, services need to find each other. This is called service discovery.

DNS-based discovery:

# Kubernetes Service (built-in DNS-based discovery)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# Other services can call: http://user-service/api/users
# Kubernetes DNS resolves to healthy pod IPs

Registry-based discovery (Consul, Eureka, etcd):

Service B starts → registers with Consul: "user-service at 10.0.1.5:3000"
Service A needs Service B → asks Consul: "where is user-service?"
Consul responds: ["10.0.1.5:3000", "10.0.1.6:3000", "10.0.1.7:3000"]
Service A picks one → calls 10.0.1.6:3000

DNS-Based Load Balancing

DNS can perform basic load balancing by returning different IP addresses for the same domain name.

How It Works

Client → DNS query: "api.example.com" → DNS server
DNS server → Response: [203.0.113.1, 203.0.113.2, 203.0.113.3]
Client → Connects to 203.0.113.1 (typically the first IP)

Next query (from different or same client, after TTL expires):

DNS server → Response: [203.0.113.2, 203.0.113.3, 203.0.113.1]  (rotated)
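The rotation is just the record list shifted by one between responses (a toy sketch of resolver-side round robin):

```typescript
// DNS round robin: rotate the A-record list so successive responses
// present a different first IP to clients.
function rotate<T>(records: T[]): T[] {
  return [...records.slice(1), records[0]];
}

let answer = ["203.0.113.1", "203.0.113.2", "203.0.113.3"];
answer = rotate(answer);
console.log(answer.join(", ")); // 203.0.113.2, 203.0.113.3, 203.0.113.1
```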

DNS Round Robin Configuration

; BIND DNS zone file
api.example.com.    300    IN    A    203.0.113.1
api.example.com.    300    IN    A    203.0.113.2
api.example.com.    300    IN    A    203.0.113.3

Limitations of DNS Load Balancing

| Limitation | Impact |
| --- | --- |
| DNS caching | Clients and resolvers cache DNS responses. Changes take time to propagate (based on TTL). |
| No health checks | DNS doesn't know if a server is down. Clients may connect to dead servers. |
| No session awareness | No control over which server handles which client. |
| Uneven distribution | Some clients cache longer, leading to uneven load. |

When DNS load balancing is useful:

  • Distributing traffic across geographic regions (GSLB)
  • As a first layer before dedicated load balancers
  • Simple setups where health checks aren't critical

Global Server Load Balancing (GSLB)

GSLB distributes traffic across servers in multiple geographic locations to minimize latency and provide disaster recovery.

How GSLB Works

  1. User makes a DNS query for app.example.com
  2. GSLB-enabled DNS considers:
    • Geographic proximity of the user
    • Health of each data center
    • Current load at each location
    • Latency to each data center
  3. Returns the IP of the closest/best data center
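A simplified version of that decision, considering only health and latency (field names are illustrative; real GSLB also weighs load and geography):

```typescript
// GSLB decision sketch: among healthy data centers, return the one
// with the lowest measured latency for this client.
interface DataCenter { name: string; healthy: boolean; latencyMs: number }

function pickDataCenter(dcs: DataCenter[]): DataCenter | undefined {
  return dcs
    .filter((dc) => dc.healthy)
    .sort((a, b) => a.latencyMs - b.latencyMs)[0];
}

const result = pickDataCenter([
  { name: "us-east", healthy: true, latencyMs: 120 },
  { name: "eu-west", healthy: true, latencyMs: 35 },
  { name: "ap-south", healthy: false, latencyMs: 10 }, // down → skipped
]);
console.log(result?.name); // eu-west
```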

GSLB Routing Strategies

| Strategy | Description | Use Case |
| --- | --- | --- |
| Geolocation | Route based on client's country/region | Compliance, content localization |
| Latency-based | Route to lowest-latency data center | Performance optimization |
| Failover | Primary/secondary data centers | Disaster recovery |
| Weighted | Distribute percentage of traffic to each region | Gradual migrations, canary deployments |

Cloud GSLB Services

  • AWS Route 53: Latency-based, geolocation, failover, weighted routing
  • Google Cloud DNS: Geolocation routing with health checks
  • Azure Traffic Manager: Performance, weighted, priority, geographic routing
  • Cloudflare Load Balancing: Anycast-based with geo-steering

Auto-Scaling with Load Balancers

Load balancers work together with auto-scaling to handle traffic spikes automatically.

How Auto-Scaling Works

Scaling triggers:

  • CPU usage > 70% for 5 minutes
  • Memory usage > 80%
  • Request count > 1000/sec per server
  • Response time > 500ms average
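Triggers like these feed a replica calculation. Kubernetes' HPA, for example, uses desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds. A sketch:

```typescript
// Scaling decision following the Kubernetes HPA formula:
//   desired = ceil(current * currentMetric / targetMetric)
// clamped to [minReplicas, maxReplicas].
function desiredReplicas(
  current: number,
  currentCpuPercent: number,
  targetCpuPercent: number,
  min: number,
  max: number
): number {
  const desired = Math.ceil(current * (currentCpuPercent / targetCpuPercent));
  return Math.min(max, Math.max(min, desired));
}

// 3 replicas at 90% CPU with a 70% target → scale up to 4
console.log(desiredReplicas(3, 90, 70, 2, 10)); // 4
// 3 replicas at 20% CPU → scale down, clamped to the minimum of 2
console.log(desiredReplicas(3, 20, 70, 2, 10)); // 2
```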

Kubernetes Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

AWS Auto Scaling Group

Load Balancer (ALB)

Auto Scaling Group
  ├── Min: 2 instances
  ├── Desired: 3 instances
  ├── Max: 10 instances
  └── Scaling Policy:
      ├── Scale Up: CPU > 70% for 5 min → add 2 instances
      └── Scale Down: CPU < 30% for 15 min → remove 1 instance

Best practices for auto-scaling:

  • Set conservative scale-up thresholds (react quickly to traffic spikes)
  • Set aggressive scale-down thresholds (scale down slowly to avoid flapping)
  • Always have a minimum number of healthy instances
  • Use health checks to ensure new instances are ready before receiving traffic

Load Balancing Patterns

Active-Active

All servers are actively handling traffic simultaneously.

Load Balancer
├── Server A (Active) ← handles requests
├── Server B (Active) ← handles requests
└── Server C (Active) ← handles requests

Pros: Maximum resource utilization, highest throughput.

Cons: All servers need identical configuration, more complex to maintain.

Active-Passive (Failover)

Only the active server handles traffic. The passive server is a standby that takes over when the active server fails.

Normal:
Load Balancer → Server A (Active) ← handles all requests
                Server B (Passive/Standby) ← idle, monitoring
 
Failover:
Load Balancer → Server A (Down) ✗
                Server B (Now Active) ← takes over

Pros: Simple, guaranteed failover capacity.

Cons: Passive server sits idle (wasted resources).

Blue-Green Deployment with Load Balancer

Use load balancing to enable zero-downtime deployments:

Step 1: Blue (v1) is live
Load Balancer → Blue Servers (v1) ✅
 
Step 2: Deploy Green (v2), test it
Load Balancer → Blue Servers (v1) ✅
                Green Servers (v2) [testing]
 
Step 3: Switch traffic to Green
Load Balancer → Green Servers (v2) ✅
                Blue Servers (v1) [standby for rollback]

Canary Deployment

Route a small percentage of traffic to the new version:

upstream backend {
    server 10.0.1.101:8080 weight=9;   # v1 (90% traffic)
    server 10.0.1.102:8080 weight=1;   # v2 (10% traffic — canary)
}

Gradually increase the canary weight as confidence grows:

Phase 1:  v1 = 90%, v2 = 10%  → monitor metrics
Phase 2:  v1 = 70%, v2 = 30%  → monitor metrics
Phase 3:  v1 = 50%, v2 = 50%  → monitor metrics
Phase 4:  v1 = 0%,  v2 = 100% → rollout complete

Monitoring and Metrics

A load balancer produces valuable metrics for understanding your system's health and performance.

Key Metrics to Monitor

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| Request rate | Traffic volume (req/s) | Sudden spikes or drops |
| Error rate | Percentage of 4xx/5xx responses | > 1% of total traffic |
| Latency (p50, p95, p99) | Response time distribution | p99 > 1 second |
| Active connections | Current concurrent connections | Approaching server limits |
| Backend health | Number of healthy vs unhealthy servers | Any server down |
| Bandwidth | Data transfer in/out | Approaching network limits |
| Connection queue | Requests waiting for a server | Queue > 0 sustained |
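The p50/p95/p99 figures above are percentiles of the response-time distribution. A nearest-rank computation (monitoring systems may interpolate instead):

```typescript
// Latency percentiles (p50/p95/p99) from a sample of response times,
// using the nearest-rank method.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(latenciesMs, 50)); // 50
console.log(percentile(latenciesMs, 95)); // 95
console.log(percentile(latenciesMs, 99)); // 99
```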

Nginx Status Monitoring

# Enable Nginx stub status
server {
    listen 8080;
    location /nginx_status {
        stub_status on;
        allow 10.0.0.0/8;    # Allow internal network only
        deny all;
    }
}
Example output:
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106

HAProxy Stats Dashboard

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if TRUE
    stats auth admin:password

HAProxy provides a built-in web dashboard showing real-time metrics for every backend server: connections, request rates, response times, health status, and error rates.


Best Practices

Configuration Best Practices

1. Always configure health checks

# Don't just check if port is open — check application health
location /health {
    proxy_pass http://backend/health;
    proxy_connect_timeout 2s;
    proxy_read_timeout 2s;
}

2. Set appropriate timeouts

proxy_connect_timeout 5s;    # Time to establish connection to backend
proxy_read_timeout 30s;      # Time to wait for response from backend
proxy_send_timeout 10s;      # Time to send request to backend

3. Pass client information to backends

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

4. Configure connection limits

upstream backend {
    server 192.168.1.101 max_conns=100;
    server 192.168.1.102 max_conns=100;
    queue 50 timeout=10s;  # Queue requests when all servers are at max (Nginx Plus only)
}

Architecture Best Practices

1. Eliminate the load balancer as a single point of failure

Use redundant load balancers with failover:

Virtual IP
├── Load Balancer 1 (Active)  → Server, Server, Server
└── Load Balancer 2 (Standby) → takes over the Virtual IP if LB 1 fails

2. Use the right algorithm for your workload

  • Stateless API: Round Robin or Least Connections
  • WebSocket connections: IP Hash or Least Connections
  • Mixed server sizes: Weighted Round Robin
  • Performance-critical: Least Response Time

3. Externalize session state

Don't rely on sticky sessions. Use Redis, Memcached, or JWT tokens for session management.

4. Enable connection keep-alive

upstream backend {
    server 192.168.1.101;
    keepalive 32;  # Keep 32 idle connections to each backend
}
 
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";  # Enable keepalive to backend
}

5. Plan for graceful shutdown

When removing a server for maintenance:

1. Mark server as "draining" (stop new connections)
2. Wait for existing connections to complete
3. Remove server from pool
4. Perform maintenance
5. Re-add server to pool
6. Monitor health checks to confirm it's healthy
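Steps 1-2 (draining) can be sketched with a simple in-flight counter (illustrative only; a real server hooks this into its connection handling):

```typescript
// Connection-draining sketch: stop accepting new requests, then wait
// for in-flight ones to finish before removal from the pool.
class DrainableServer {
  private inFlight = 0;
  private draining = false;

  accept(): boolean {
    if (this.draining) return false; // refuse new work while draining
    this.inFlight++;
    return true;
  }

  finish(): void {
    this.inFlight--;
  }

  startDrain(): void {
    this.draining = true;
  }

  get drained(): boolean {
    return this.draining && this.inFlight === 0;
  }
}

const srv = new DrainableServer();
srv.accept();
srv.startDrain();
console.log(srv.accept()); // false: no new connections while draining
console.log(srv.drained); // false: one request still in flight
srv.finish();
console.log(srv.drained); // true: safe to remove from the pool
```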

Common Pitfalls

| Pitfall | Problem | Solution |
| --- | --- | --- |
| No health checks | Traffic sent to dead servers | Always configure health endpoints |
| Single load balancer | LB becomes single point of failure | Deploy redundant LBs with failover |
| Sticky sessions everywhere | Limits scalability, complex failover | Use external session stores (Redis) |
| Ignoring timeouts | Slow backends block all connections | Set connect, read, and send timeouts |
| No monitoring | Problems discovered by users, not engineers | Monitor request rate, errors, latency |
| Misconfigured SSL | Mixed content, security warnings | Terminate SSL at LB, redirect HTTP to HTTPS |
| No connection limits | One backend overwhelmed by burst | Set max_conns per server |
| No graceful shutdown | Active requests dropped during deployment | Drain connections before removing servers |

Summary and Key Takeaways

✅ Load balancing distributes traffic across multiple servers for scalability, availability, and performance
✅ Algorithms like Round Robin, Least Connections, and IP Hash determine how traffic is routed
✅ Layer 4 load balancing is faster but content-unaware; Layer 7 enables routing based on URLs, headers, and cookies
✅ Health checks are essential — always configure them to detect and remove unhealthy servers
✅ Externalize session state (Redis, JWT) instead of relying on sticky sessions
✅ SSL termination at the load balancer simplifies certificate management and reduces backend load
✅ Nginx and HAProxy are the most popular software load balancers; cloud providers offer managed alternatives
✅ Auto-scaling works with load balancers to handle traffic spikes automatically
✅ Monitor request rates, error rates, latency, and backend health continuously
✅ Deploy redundant load balancers to avoid a single point of failure


What's Next?

Load balancing is one of the most impactful infrastructure decisions you'll make. Start with a simple Round Robin setup, add health checks, externalize your sessions, and scale from there. You don't need to implement everything on day one — but you do need to plan for it.
