
Load Balancing Explained

Tags: load-balancing, infrastructure, devops, backend, networking, performance

Introduction

When your application starts receiving more traffic than a single server can handle, you have two choices: get a bigger server (vertical scaling) or add more servers (horizontal scaling). Load balancing is what makes horizontal scaling possible.

A load balancer distributes incoming network traffic across multiple servers, ensuring no single server bears too much load. It's one of the most critical pieces of infrastructure for any production application — from small web apps to systems serving millions of users.

This guide covers everything you need to know about load balancing: how it works, the algorithms behind it, the different types, and how to implement it with real-world tools.

What You'll Learn

✅ Understand what load balancing is and why it's essential
✅ Master load balancing algorithms (Round Robin, Least Connections, IP Hash, and more)
✅ Distinguish between Layer 4 and Layer 7 load balancing
✅ Implement health checks and failover for high availability
✅ Configure session persistence and SSL termination
✅ Set up load balancing with Nginx, HAProxy, and cloud providers
✅ Learn patterns like Active-Active, Active-Passive, and Global Server Load Balancing


What is Load Balancing?

Load balancing is the process of distributing incoming network traffic across multiple backend servers (also called a server pool or server farm) to ensure:

  • No single server is overwhelmed with too many requests
  • Application availability remains high even if a server fails
  • Response times stay consistently fast for all users

Without a load balancer, all traffic goes to a single server:

1000 requests/sec → [Single Server] → ❌ Overloaded, slow, crashes

With a load balancer:

1000 requests/sec → [Load Balancer] → [Server 1] ~333 req/s ✅
                                    → [Server 2] ~333 req/s ✅
                                    → [Server 3] ~333 req/s ✅

Why Use Load Balancing?

Load balancing solves several critical problems:

| Benefit | Without Load Balancer | With Load Balancer |
| --- | --- | --- |
| Availability | Single point of failure | Survives server failures |
| Scalability | Limited to one server | Add servers as traffic grows |
| Performance | Degrades under load | Consistent response times |
| Maintenance | Downtime for updates | Rolling updates with zero downtime |
| Efficiency | One server may be underused or overloaded | Even distribution of work |

Load Balancing Algorithms

The algorithm determines how the load balancer decides which server should handle each incoming request. Different algorithms suit different scenarios.

1. Round Robin

The simplest algorithm. Requests are distributed sequentially across servers in order.

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1  (cycle repeats)
Request 5 → Server 2
Request 6 → Server 3

Nginx configuration:

upstream backend {
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}
 
server {
    location / {
        proxy_pass http://backend;
    }
}

Best for: Servers with identical specifications and stateless applications.

Drawback: Doesn't account for server load — a slow-processing request on Server 1 doesn't prevent more requests from being sent to it.
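The rotation above can be sketched in a few lines of TypeScript (a minimal, illustrative selector; real load balancers also track server health):

```typescript
// Minimal Round Robin selector: cycles through the pool in order.
class RoundRobin {
  private index = 0;
  constructor(private servers: string[]) {}

  next(): string {
    const server = this.servers[this.index % this.servers.length];
    this.index++;
    return server;
  }
}

const lb = new RoundRobin(["server1", "server2", "server3"]);
// Requests 1-4 go to: server1, server2, server3, then server1 again
console.log([lb.next(), lb.next(), lb.next(), lb.next()].join(" → "));
```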

2. Weighted Round Robin

Like Round Robin but assigns more requests to more powerful servers.

Server 1 (weight=5):  receives 5 out of every 8 requests
Server 2 (weight=2):  receives 2 out of every 8 requests
Server 3 (weight=1):  receives 1 out of every 8 requests

Nginx configuration:

upstream backend {
    server 192.168.1.101 weight=5;  # 8-core, 32GB RAM
    server 192.168.1.102 weight=2;  # 4-core, 16GB RAM
    server 192.168.1.103 weight=1;  # 2-core, 8GB RAM
}

Best for: Environments with servers of different capacities.
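The 5/2/1 split above can be sketched by giving each server weight-many slots in the schedule (an illustrative approach; Nginx actually interleaves picks more smoothly):

```typescript
// Weighted Round Robin: a server with weight w gets w slots per cycle.
class WeightedRoundRobin {
  private schedule: string[] = [];
  private index = 0;

  constructor(servers: Array<{ host: string; weight: number }>) {
    for (const { host, weight } of servers) {
      for (let i = 0; i < weight; i++) this.schedule.push(host);
    }
  }

  next(): string {
    return this.schedule[this.index++ % this.schedule.length];
  }
}

const wrr = new WeightedRoundRobin([
  { host: "server1", weight: 5 },
  { host: "server2", weight: 2 },
  { host: "server3", weight: 1 },
]);
// Over any 8 consecutive picks: server1 ×5, server2 ×2, server3 ×1
```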

3. Least Connections

Sends requests to the server with the fewest active connections.

Server 1: 12 active connections
Server 2:  3 active connections ✅ (selected)
Server 3:  7 active connections

The load balancer picks Server 2 because it has the fewest active connections.

Nginx configuration:

upstream backend {
    least_conn;
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}

Best for: Applications where requests have varying processing times (e.g., some requests take 50ms, others take 5 seconds).

4. Weighted Least Connections

Combines Least Connections with server weights. The server with the lowest ratio of (active connections / weight) gets the next request.

Server 1: 10 connections, weight=5 → ratio = 2.0
Server 2:  3 connections, weight=2 → ratio = 1.5 ✅ (selected)
Server 3:  4 connections, weight=1 → ratio = 4.0

Best for: Mixed server capacities with variable request durations.
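The ratio calculation above is simple enough to show directly (a sketch using illustrative connection counts):

```typescript
// Weighted Least Connections: pick the server with the lowest
// activeConnections / weight ratio.
interface Backend { host: string; active: number; weight: number }

function pickWeightedLeastConn(pool: Backend[]): Backend {
  return pool.reduce((best, s) =>
    s.active / s.weight < best.active / best.weight ? s : best
  );
}

const pool: Backend[] = [
  { host: "server1", active: 10, weight: 5 }, // ratio 2.0
  { host: "server2", active: 3,  weight: 2 }, // ratio 1.5 → selected
  { host: "server3", active: 4,  weight: 1 }, // ratio 4.0
];
console.log(pickWeightedLeastConn(pool).host); // server2
```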

5. IP Hash

Uses the client's IP address to determine which server receives the request. The same client IP always goes to the same server.

Client 203.0.113.10 → hash → Server 2 (always)
Client 198.51.100.5 → hash → Server 1 (always)
Client 192.0.2.100  → hash → Server 3 (always)

Nginx configuration:

upstream backend {
    ip_hash;
    server 192.168.1.101;
    server 192.168.1.102;
    server 192.168.1.103;
}

Best for: Applications that need basic session persistence without cookies or application-level session management.

Drawback: If a server goes down, all clients hashed to that server must be reassigned, breaking their sessions.
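A toy version of the idea (the hash function here is purely illustrative; production balancers use stronger hashes, and often consistent hashing to soften the reassignment problem just described):

```typescript
// IP Hash: hash the client IP, then map it onto the pool with modulo.
function ipHashPick(clientIp: string, pool: string[]): string {
  let hash = 0;
  for (const ch of clientIp) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep unsigned 32-bit
  }
  return pool[hash % pool.length];
}

const servers = ["server1", "server2", "server3"];
// The same client IP always lands on the same server:
console.log(ipHashPick("203.0.113.10", servers) === ipHashPick("203.0.113.10", servers)); // true
```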

6. Least Response Time

Routes to the server with the fastest response time and fewest active connections. Requires the load balancer to actively measure response times.

Best for: Performance-critical applications where you want users routed to the fastest available server.

7. Random

Picks a random server for each request. Statistically distributes evenly over time.

Best for: Large server pools where simplicity is preferred and statistical distribution is sufficient.

Algorithm Comparison

| Algorithm | Considers Server Load | Session Persistence | Complexity | Best Use Case |
| --- | --- | --- | --- | --- |
| Round Robin | ❌ | ❌ | Low | Equal servers, stateless apps |
| Weighted Round Robin | ❌ (manual weights) | ❌ | Low | Mixed server capacities |
| Least Connections | ✅ | ❌ | Medium | Variable request durations |
| IP Hash | ❌ | ✅ | Low | Basic session affinity |
| Least Response Time | ✅ | ❌ | High | Performance-critical apps |
| Random | ❌ | ❌ | Low | Large pools, simplicity |

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model. The two most common are Layer 4 (Transport) and Layer 7 (Application).

Layer 4 Load Balancing (Transport Layer)

Operates at the TCP/UDP level. Makes routing decisions based on:

  • Source IP and port
  • Destination IP and port
  • Protocol (TCP or UDP)

The load balancer does not inspect the content of the request (no HTTP headers, URLs, or cookies).

How it works:

  1. Client establishes a TCP connection to the load balancer
  2. Load balancer selects a backend server (using chosen algorithm)
  3. Load balancer forwards raw TCP/UDP packets to the selected server
  4. All packets from the same connection go to the same server

Advantages:

  • Very fast (no packet inspection)
  • Low overhead and latency
  • Protocol-agnostic (works with HTTP, SMTP, FTP, databases, etc.)

Disadvantages:

  • Cannot route based on URL, headers, or content
  • Cannot modify requests or responses
  • Limited health check capabilities

Layer 7 Load Balancing (Application Layer)

Operates at the HTTP/HTTPS level. Makes routing decisions based on:

  • URL path (e.g., /api/* vs /static/*)
  • HTTP headers (e.g., Host, User-Agent, Cookie)
  • HTTP method (GET, POST, etc.)
  • Query parameters
  • Request body content

Content-based routing example with Nginx:

upstream api_servers {
    server 192.168.1.101;
    server 192.168.1.102;
}
 
upstream static_servers {
    server 192.168.1.201;
    server 192.168.1.202;
}
 
server {
    listen 80;
 
    location /api/ {
        proxy_pass http://api_servers;
    }
 
    location /static/ {
        proxy_pass http://static_servers;
    }
 
    location /images/ {
        proxy_pass http://static_servers;
    }
}

Advantages:

  • Content-aware routing decisions
  • Can modify requests/responses (add headers, rewrite URLs)
  • Advanced health checks (HTTP status codes, response body)
  • SSL termination
  • Caching, compression, and rate limiting

Disadvantages:

  • Higher latency (must parse HTTP)
  • More resource-intensive
  • HTTP/HTTPS only (not suitable for arbitrary TCP protocols)

Comparison Table

| Feature | Layer 4 | Layer 7 |
| --- | --- | --- |
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Routing Based On | IP, port, protocol | URL, headers, cookies, content |
| Performance | Faster (no inspection) | Slower (parses HTTP) |
| Content Awareness | ❌ | ✅ |
| SSL Termination | Pass-through only | ✅ Full termination |
| URL Routing | ❌ | ✅ |
| Header Manipulation | ❌ | ✅ |
| Caching | ❌ | ✅ |
| Protocols | Any TCP/UDP | HTTP/HTTPS |
| Use Case | High-throughput, database LB | Web apps, APIs, microservices |

When to Use Each

Choose Layer 4 when:

  • You need maximum performance with minimal latency
  • You're load balancing non-HTTP protocols (databases, SMTP, gaming servers)
  • You don't need content-based routing

Choose Layer 7 when:

  • You need to route based on URL paths, headers, or cookies
  • You want SSL termination at the load balancer
  • You need advanced features like caching, compression, or rate limiting
  • You're building microservices with different backends for different paths

Health Checks and Failover

A load balancer must know which servers are healthy and able to handle requests. Health checks are periodic probes sent to backend servers to verify they're alive and functioning correctly.

Types of Health Checks

1. TCP Health Check (Layer 4)

Simply checks if the server accepts TCP connections on the specified port.

Load Balancer → TCP SYN → Server:8080
Server:8080  → TCP SYN-ACK → ✅ Healthy

2. HTTP Health Check (Layer 7)

Sends an HTTP request to a health endpoint and checks the response status code.

Load Balancer → GET /health → Server:8080
Server:8080  → 200 OK → ✅ Healthy
Server:8080  → 503 Service Unavailable → ❌ Unhealthy

Common health endpoint implementation:

// Express.js health check endpoint
app.get('/health', async (req, res) => {
  try {
    // Check database connection
    await db.query('SELECT 1');
 
    // Check Redis connection
    await redis.ping();
 
    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: new Date().toISOString(),
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
    });
  }
});

The same idea in Spring Boot, where Actuator auto-configures the endpoint:

// Spring Boot Actuator (auto-configured health endpoint)
// GET /actuator/health → { "status": "UP" }
 
@Component
public class DatabaseHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        if (isDatabaseReachable()) {
            return Health.up()
                .withDetail("database", "PostgreSQL")
                .build();
        }
        return Health.down()
            .withDetail("error", "Cannot reach database")
            .build();
    }
}

3. Script-based Health Check

Runs a custom script that performs complex checks (disk space, memory, application logic).

Nginx Health Check Configuration

upstream backend {
    # Passive health checks (Nginx OSS): mark a server as down after
    # 3 failed requests, and retry it after 30 seconds
    server 192.168.1.101 max_fails=3 fail_timeout=30s;
    server 192.168.1.102 max_fails=3 fail_timeout=30s;
    server 192.168.1.103 max_fails=3 fail_timeout=30s;
}
 
server {
    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;
    }
}

HAProxy Health Check Configuration

backend web_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
 
    server srv1 192.168.1.101:8080 check inter 5s fall 3 rise 2
    server srv2 192.168.1.102:8080 check inter 5s fall 3 rise 2
    server srv3 192.168.1.103:8080 check inter 5s fall 3 rise 2

Parameters explained:

  • inter 5s: Check every 5 seconds
  • fall 3: Mark unhealthy after 3 consecutive failures
  • rise 2: Mark healthy again after 2 consecutive successes
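The fall/rise logic is a small state machine, sketched here in TypeScript (mirroring the `fall 3 rise 2` semantics above; class and method names are illustrative):

```typescript
// Health state machine with fall/rise thresholds, like HAProxy's
// `fall 3 rise 2`: consecutive failures mark a server down,
// consecutive successes bring it back.
class HealthTracker {
  healthy = true;
  private failures = 0;
  private successes = 0;

  constructor(private fall: number, private rise: number) {}

  record(checkPassed: boolean): boolean {
    if (checkPassed) {
      this.failures = 0;
      this.successes++;
      if (!this.healthy && this.successes >= this.rise) this.healthy = true;
    } else {
      this.successes = 0;
      this.failures++;
      if (this.healthy && this.failures >= this.fall) this.healthy = false;
    }
    return this.healthy;
  }
}

const tracker = new HealthTracker(3, 2);
tracker.record(false); // 1 failure: still healthy
tracker.record(false); // 2 failures: still healthy
console.log(tracker.record(false)); // 3rd failure → false (marked down)
tracker.record(true);
console.log(tracker.record(true)); // 2 successes → true (back up)
```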

Failover Strategies

When a server fails a health check, the load balancer must handle it:

Graceful degradation: When servers fail, remaining servers absorb the extra load:

Normal:    Server A (33%) | Server B (33%) | Server C (33%)
Server C ❌: Server A (50%) | Server B (50%) | Server C (down)
Server B ❌: Server A (100%) | Server B (down) | Server C (down)

Session Persistence (Sticky Sessions)

By default, a load balancer may route consecutive requests from the same user to different servers. This is a problem when the application stores session data in server memory.

The Problem

Request 1: User logs in → Server A (session stored in Server A memory)
Request 2: User loads dashboard → Server B (no session found → ❌ redirected to login)

Solutions

1. Sticky Sessions (Cookie-based)

The load balancer inserts a cookie to pin the user to a specific server.

upstream backend {
    server 192.168.1.101;
    server 192.168.1.102;
 
    # Nginx Plus (commercial) sticky cookie
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}

HAProxy sticky sessions:

backend web_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
 
    server srv1 192.168.1.101:8080 cookie s1 check
    server srv2 192.168.1.102:8080 cookie s2 check

2. IP Hash (as discussed above)

Route by client IP — same IP always goes to same server.

3. Externalized Sessions (Recommended)

Instead of relying on sticky sessions, store sessions externally so any server can handle any request:

// Express.js with Redis session store
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';
 
const redisClient = createClient({ url: 'redis://redis-host:6379' });
await redisClient.connect();
 
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
  cookie: { secure: true, maxAge: 86400000 }, // 24 hours
}));

Why externalized sessions are better:

  • Any server can handle any request (true statelessness)
  • Server failures don't lose user sessions
  • Easy to scale horizontally
  • No sticky session complexity

Session Persistence Comparison

| Approach | Server Failure Impact | Scalability | Complexity |
| --- | --- | --- | --- |
| Sticky Sessions (Cookie) | Sessions lost when server dies | Limited | Low |
| IP Hash | Sessions lost when server dies | Limited | Low |
| External Store (Redis) | No impact — sessions survive | Excellent | Medium |
| JWT Tokens (Stateless) | No impact — no server state | Excellent | Medium |

SSL/TLS Termination

SSL termination means the load balancer handles the encryption/decryption of HTTPS traffic, so backend servers receive plain HTTP.

Why Terminate SSL at the Load Balancer?

Benefits:

  • Reduced server load: Encryption/decryption is CPU-intensive. Offloading it to the load balancer frees backend servers for application logic.
  • Centralized certificate management: Manage SSL certificates in one place instead of every server.
  • Simplified backend: Servers don't need SSL configuration.

Nginx SSL termination:

server {
    listen 443 ssl;
    server_name example.com;
 
    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;
 
    location / {
        proxy_pass http://backend;  # Plain HTTP to backend
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}

SSL Pass-through vs Termination vs Re-encryption

| Mode | Description | Backend Sees | Use Case |
| --- | --- | --- | --- |
| Termination | LB decrypts, sends HTTP to backend | Plain HTTP | Most web applications |
| Pass-through | LB forwards encrypted traffic as-is | HTTPS (must handle SSL) | End-to-end encryption required |
| Re-encryption | LB decrypts, then re-encrypts to backend | HTTPS (re-encrypted) | Compliance requirements |

Software Load Balancers

1. Nginx

The most popular web server and reverse proxy, also widely used as a load balancer.

# Complete Nginx load balancer configuration
upstream api_backend {
    least_conn;
    server 10.0.1.101:8080 weight=3;
    server 10.0.1.102:8080 weight=2;
    server 10.0.1.103:8080 weight=1;
    server 10.0.1.104:8080 backup;  # Only used when others are down
}
 
server {
    listen 443 ssl http2;
    server_name api.example.com;
 
    ssl_certificate     /etc/ssl/certs/api.example.com.crt;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
 
    location / {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
 
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
 
        # Retry on failure
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 3;
    }
}

Strengths: Lightweight, high performance, rich ecosystem, excellent documentation.

2. HAProxy

Purpose-built, high-performance TCP/HTTP load balancer.

# HAProxy configuration
global
    log     stdout format raw local0
    maxconn 4096
 
defaults
    mode    http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    option  httplog
 
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/example.pem
    redirect scheme https if !{ ssl_fc }
 
    # Route based on URL path
    acl is_api path_beg /api
    acl is_static path_beg /static /images /css /js
 
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend web_servers
 
backend web_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server web1 10.0.1.101:8080 check inter 5s fall 3 rise 2
    server web2 10.0.1.102:8080 check inter 5s fall 3 rise 2
 
backend api_servers
    balance leastconn
    option httpchk GET /api/health
    server api1 10.0.2.101:3000 check inter 5s fall 3 rise 2
    server api2 10.0.2.102:3000 check inter 5s fall 3 rise 2
 
backend static_servers
    balance roundrobin
    server static1 10.0.3.101:80 check
    server static2 10.0.3.102:80 check

Strengths: Built for load balancing, excellent stats dashboard, very low latency, advanced health checks, used by GitHub, Stack Overflow, and Reddit.

3. Traefik

Modern, cloud-native reverse proxy and load balancer with automatic service discovery.

# docker-compose.yml with Traefik
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - letsencrypt:/letsencrypt
 
  api:
    image: my-api:latest
    deploy:
      replicas: 3
    labels:
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.routers.api.tls.certresolver=letsencrypt"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
 
volumes:
  letsencrypt:

Strengths: Automatic service discovery with Docker/Kubernetes, built-in Let's Encrypt, dynamic configuration without restarts.

Software Load Balancer Comparison

| Feature | Nginx | HAProxy | Traefik |
| --- | --- | --- | --- |
| Primary Use | Web server + LB | Dedicated LB | Cloud-native LB |
| Layer 4 | ✅ (stream module) | ✅ | ✅ (TCP routers) |
| Layer 7 | ✅ | ✅ | ✅ |
| Configuration | File-based | File-based | File, Docker labels, API |
| Auto-discovery | ❌ (manual) | ❌ (manual) | ✅ (Docker, K8s) |
| Let's Encrypt | Via certbot | Via ACME | ✅ Built-in |
| Dashboard | Nginx Plus only | ✅ Stats page | ✅ Web UI |
| Hot Reload | nginx -s reload | haproxy -sf | ✅ Automatic |
| Used By | ~34% of all websites | GitHub, Reddit | Docker-heavy orgs |

Cloud Load Balancers

Cloud providers offer managed load balancers that require zero server management:

AWS:

  • Application Load Balancer (ALB): Layer 7, content-based routing, WebSocket support
  • Network Load Balancer (NLB): Layer 4, ultra-low latency, millions of requests per second
  • Classic Load Balancer (CLB): Legacy, supports both L4 and L7

Google Cloud:

  • Cloud Load Balancing: Global, anycast-based, auto-scaling, supports HTTP(S), TCP, UDP

Azure:

  • Azure Load Balancer: Layer 4
  • Application Gateway: Layer 7 with WAF (Web Application Firewall)

When to use cloud vs software load balancers:

| Scenario | Recommendation |
| --- | --- |
| Running on a single cloud provider | Use cloud LB (managed, auto-scaling) |
| Multi-cloud or hybrid deployment | Software LB (Nginx, HAProxy) |
| Kubernetes environment | Cloud LB + Ingress Controller (Nginx, Traefik) |
| Budget-conscious, self-hosted | Nginx or HAProxy on a VPS |
| Need fine-grained control | Software LB with custom configuration |

Load Balancing in Microservices

In a microservices architecture, load balancing becomes even more important as services communicate with each other constantly.

Service-to-Service Load Balancing

Client-Side vs Server-Side Load Balancing

Server-side load balancing (traditional): A dedicated load balancer sits between services.

Service A → Load Balancer → Service B (instance 1)
                          → Service B (instance 2)
                          → Service B (instance 3)

Client-side load balancing: The calling service chooses which instance to call, using a service registry.

Service A → [Service Registry] → knows all Service B instances
          → directly calls Service B (instance 2)

Client-side example (conceptual):

// Minimal registry interface (implementation omitted for brevity)
interface ServiceRegistry {
  getInstances(serviceName: string): Promise<string[]>;
}
 
class ClientSideLoadBalancer {
  private instances: string[] = [];
  private currentIndex = 0;
 
  constructor(private serviceRegistry: ServiceRegistry) {}
 
  async getNextInstance(serviceName: string): Promise<string> {
    // Refresh instances from the service registry
    this.instances = await this.serviceRegistry.getInstances(serviceName);
 
    if (this.instances.length === 0) {
      throw new Error(`No instances available for ${serviceName}`);
    }
 
    // Round Robin over the current instance list
    const instance = this.instances[this.currentIndex % this.instances.length];
    this.currentIndex++;
    return instance;
  }
}

Service Discovery

For load balancing in microservices, services need to find each other. This is called service discovery.

DNS-based discovery:

# Kubernetes Service (built-in DNS-based discovery)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# Other services can call: http://user-service/api/users
# Kubernetes DNS resolves to healthy pod IPs

Registry-based discovery (Consul, Eureka, etcd):

Service B starts → registers with Consul: "user-service at 10.0.1.5:3000"
Service A needs Service B → asks Consul: "where is user-service?"
Consul responds: ["10.0.1.5:3000", "10.0.1.6:3000", "10.0.1.7:3000"]
Service A picks one → calls 10.0.1.6:3000

DNS-Based Load Balancing

DNS can perform basic load balancing by returning different IP addresses for the same domain name.

How It Works

Client → DNS query: "api.example.com" → DNS server
DNS server → Response: [203.0.113.1, 203.0.113.2, 203.0.113.3]
Client → Connects to 203.0.113.1 (typically the first IP)

Next query (from different or same client, after TTL expires):

DNS server → Response: [203.0.113.2, 203.0.113.3, 203.0.113.1]  (rotated)
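The rotation is just the record list shifted by one between responses (a toy sketch of resolver-side round robin):

```typescript
// DNS round robin: rotate the A-record list so successive responses
// present a different first IP to clients.
function rotate<T>(records: T[]): T[] {
  return [...records.slice(1), records[0]];
}

let answer = ["203.0.113.1", "203.0.113.2", "203.0.113.3"];
answer = rotate(answer);
console.log(answer.join(", ")); // 203.0.113.2, 203.0.113.3, 203.0.113.1
```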

DNS Round Robin Configuration

; BIND DNS zone file
api.example.com.    300    IN    A    203.0.113.1
api.example.com.    300    IN    A    203.0.113.2
api.example.com.    300    IN    A    203.0.113.3

Limitations of DNS Load Balancing

| Limitation | Impact |
| --- | --- |
| DNS caching | Clients and resolvers cache DNS responses. Changes take time to propagate (based on TTL). |
| No health checks | DNS doesn't know if a server is down. Clients may connect to dead servers. |
| No session awareness | No control over which server handles which client. |
| Uneven distribution | Some clients cache longer, leading to uneven load. |

When DNS load balancing is useful:

  • Distributing traffic across geographic regions (GSLB)
  • As a first layer before dedicated load balancers
  • Simple setups where health checks aren't critical

Global Server Load Balancing (GSLB)

GSLB distributes traffic across servers in multiple geographic locations to minimize latency and provide disaster recovery.

How GSLB Works

  1. User makes a DNS query for app.example.com
  2. GSLB-enabled DNS considers:
    • Geographic proximity of the user
    • Health of each data center
    • Current load at each location
    • Latency to each data center
  3. Returns the IP of the closest/best data center
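A simplified version of that decision, considering only health and latency (field names are illustrative; real GSLB also weighs load and geography):

```typescript
// GSLB decision sketch: among healthy data centers, return the one
// with the lowest measured latency for this client.
interface DataCenter { name: string; healthy: boolean; latencyMs: number }

function pickDataCenter(dcs: DataCenter[]): DataCenter | undefined {
  return dcs
    .filter((dc) => dc.healthy)
    .sort((a, b) => a.latencyMs - b.latencyMs)[0];
}

const result = pickDataCenter([
  { name: "us-east", healthy: true, latencyMs: 120 },
  { name: "eu-west", healthy: true, latencyMs: 35 },
  { name: "ap-south", healthy: false, latencyMs: 10 }, // down → skipped
]);
console.log(result?.name); // eu-west
```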

GSLB Routing Strategies

| Strategy | Description | Use Case |
| --- | --- | --- |
| Geolocation | Route based on client's country/region | Compliance, content localization |
| Latency-based | Route to lowest-latency data center | Performance optimization |
| Failover | Primary/secondary data centers | Disaster recovery |
| Weighted | Distribute percentage of traffic to each region | Gradual migrations, canary deployments |

Cloud GSLB Services

  • AWS Route 53: Latency-based, geolocation, failover, weighted routing
  • Google Cloud DNS: Geolocation routing with health checks
  • Azure Traffic Manager: Performance, weighted, priority, geographic routing
  • Cloudflare Load Balancing: Anycast-based with geo-steering

Auto-Scaling with Load Balancers

Load balancers work together with auto-scaling to handle traffic spikes automatically.

How Auto-Scaling Works

Scaling triggers:

  • CPU usage > 70% for 5 minutes
  • Memory usage > 80%
  • Request count > 1000/sec per server
  • Response time > 500ms average
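Triggers like these feed a replica calculation. Kubernetes' HPA, for example, uses desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds. A sketch:

```typescript
// Scaling decision following the Kubernetes HPA formula:
//   desired = ceil(current * currentMetric / targetMetric)
// clamped to [minReplicas, maxReplicas].
function desiredReplicas(
  current: number,
  currentCpuPercent: number,
  targetCpuPercent: number,
  min: number,
  max: number
): number {
  const desired = Math.ceil(current * (currentCpuPercent / targetCpuPercent));
  return Math.min(max, Math.max(min, desired));
}

// 3 replicas at 90% CPU with a 70% target → scale up to 4
console.log(desiredReplicas(3, 90, 70, 2, 10)); // 4
// 3 replicas at 20% CPU → scale down, clamped to the minimum of 2
console.log(desiredReplicas(3, 20, 70, 2, 10)); // 2
```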

Kubernetes Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

AWS Auto Scaling Group

Load Balancer (ALB)

Auto Scaling Group
  ├── Min: 2 instances
  ├── Desired: 3 instances
  ├── Max: 10 instances
  └── Scaling Policy:
      ├── Scale Up: CPU > 70% for 5 min → add 2 instances
      └── Scale Down: CPU < 30% for 15 min → remove 1 instance

Best practices for auto-scaling:

  • Set conservative scale-up thresholds (react quickly to traffic spikes)
  • Set aggressive scale-down thresholds (scale down slowly to avoid flapping)
  • Always have a minimum number of healthy instances
  • Use health checks to ensure new instances are ready before receiving traffic

Load Balancing Patterns

Active-Active

All servers are actively handling traffic simultaneously.

Load Balancer
├── Server A (Active) ← handles requests
├── Server B (Active) ← handles requests
└── Server C (Active) ← handles requests

Pros: Maximum resource utilization, highest throughput.

Cons: All servers need identical configuration, more complex to maintain.

Active-Passive (Failover)

Only the active server handles traffic. The passive server is a standby that takes over when the active server fails.

Normal:
Load Balancer → Server A (Active) ← handles all requests
                Server B (Passive/Standby) ← idle, monitoring
 
Failover:
Load Balancer → Server A (Down) ✗
                Server B (Now Active) ← takes over

Pros: Simple, guaranteed failover capacity.

Cons: Passive server sits idle (wasted resources).

Blue-Green Deployment with Load Balancer

Use load balancing to enable zero-downtime deployments:

Step 1: Blue (v1) is live
Load Balancer → Blue Servers (v1) ✅
 
Step 2: Deploy Green (v2), test it
Load Balancer → Blue Servers (v1) ✅
                Green Servers (v2) [testing]
 
Step 3: Switch traffic to Green
Load Balancer → Green Servers (v2) ✅
                Blue Servers (v1) [standby for rollback]

Canary Deployment

Route a small percentage of traffic to the new version:

upstream backend {
    server 10.0.1.101:8080 weight=9;   # v1 (90% traffic)
    server 10.0.1.102:8080 weight=1;   # v2 (10% traffic — canary)
}

Gradually increase the canary weight as confidence grows:

Phase 1:  v1 = 90%, v2 = 10%  → monitor metrics
Phase 2:  v1 = 70%, v2 = 30%  → monitor metrics
Phase 3:  v1 = 50%, v2 = 50%  → monitor metrics
Phase 4:  v1 = 0%,  v2 = 100% → rollout complete

Monitoring and Metrics

A load balancer produces valuable metrics for understanding your system's health and performance.

Key Metrics to Monitor

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| Request rate | Traffic volume (req/s) | Sudden spikes or drops |
| Error rate | Percentage of 4xx/5xx responses | > 1% of total traffic |
| Latency (p50, p95, p99) | Response time distribution | p99 > 1 second |
| Active connections | Current concurrent connections | Approaching server limits |
| Backend health | Number of healthy vs unhealthy servers | Any server down |
| Bandwidth | Data transfer in/out | Approaching network limits |
| Connection queue | Requests waiting for a server | Queue > 0 sustained |
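The p50/p95/p99 figures above are percentiles of the response-time distribution. A nearest-rank computation (monitoring systems may interpolate instead):

```typescript
// Latency percentiles (p50/p95/p99) from a sample of response times,
// using the nearest-rank method.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(latenciesMs, 50)); // 50
console.log(percentile(latenciesMs, 95)); // 95
console.log(percentile(latenciesMs, 99)); // 99
```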

Nginx Status Monitoring

# Enable Nginx stub status
server {
    listen 8080;
    location /nginx_status {
        stub_status on;
        allow 10.0.0.0/8;    # Allow internal network only
        deny all;
    }
}
Example output:
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106

HAProxy Stats Dashboard

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if TRUE
    stats auth admin:password

HAProxy provides a built-in web dashboard showing real-time metrics for every backend server: connections, request rates, response times, health status, and error rates.


Best Practices

Configuration Best Practices

1. Always configure health checks

# Don't just check if port is open — check application health
location /health {
    proxy_pass http://backend/health;
    proxy_connect_timeout 2s;
    proxy_read_timeout 2s;
}

2. Set appropriate timeouts

proxy_connect_timeout 5s;    # Time to establish connection to backend
proxy_read_timeout 30s;      # Time to wait for response from backend
proxy_send_timeout 10s;      # Time to send request to backend

3. Pass client information to backends

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

4. Configure connection limits

upstream backend {
    server 192.168.1.101 max_conns=100;
    server 192.168.1.102 max_conns=100;
    queue 50 timeout=10s;  # Queue requests when all servers are at max (Nginx Plus only)
}

Architecture Best Practices

1. Eliminate the load balancer as a single point of failure

Use redundant load balancers with failover:

Virtual IP
├── Load Balancer 1 (Active)  → Server, Server, Server
└── Load Balancer 2 (Standby) → takes over the Virtual IP if LB 1 fails

2. Use the right algorithm for your workload

  • Stateless API: Round Robin or Least Connections
  • WebSocket connections: IP Hash or Least Connections
  • Mixed server sizes: Weighted Round Robin
  • Performance-critical: Least Response Time

3. Externalize session state

Don't rely on sticky sessions. Use Redis, Memcached, or JWT tokens for session management.

4. Enable connection keep-alive

upstream backend {
    server 192.168.1.101;
    keepalive 32;  # Keep 32 idle connections to each backend
}
 
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";  # Enable keepalive to backend
}

5. Plan for graceful shutdown

When removing a server for maintenance:

1. Mark server as "draining" (stop new connections)
2. Wait for existing connections to complete
3. Remove server from pool
4. Perform maintenance
5. Re-add server to pool
6. Monitor health checks to confirm it's healthy
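Steps 1-2 (draining) can be sketched with a simple in-flight counter (illustrative only; a real server hooks this into its connection handling):

```typescript
// Connection-draining sketch: stop accepting new requests, then wait
// for in-flight ones to finish before removal from the pool.
class DrainableServer {
  private inFlight = 0;
  private draining = false;

  accept(): boolean {
    if (this.draining) return false; // refuse new work while draining
    this.inFlight++;
    return true;
  }

  finish(): void {
    this.inFlight--;
  }

  startDrain(): void {
    this.draining = true;
  }

  get drained(): boolean {
    return this.draining && this.inFlight === 0;
  }
}

const srv = new DrainableServer();
srv.accept();
srv.startDrain();
console.log(srv.accept()); // false: no new connections while draining
console.log(srv.drained); // false: one request still in flight
srv.finish();
console.log(srv.drained); // true: safe to remove from the pool
```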

Common Pitfalls

| Pitfall | Problem | Solution |
| --- | --- | --- |
| No health checks | Traffic sent to dead servers | Always configure health endpoints |
| Single load balancer | LB becomes single point of failure | Deploy redundant LBs with failover |
| Sticky sessions everywhere | Limits scalability, complex failover | Use external session stores (Redis) |
| Ignoring timeouts | Slow backends block all connections | Set connect, read, and send timeouts |
| No monitoring | Problems discovered by users, not engineers | Monitor request rate, errors, latency |
| Misconfigured SSL | Mixed content, security warnings | Terminate SSL at LB, redirect HTTP to HTTPS |
| No connection limits | One backend overwhelmed by burst | Set max_conns per server |
| No graceful shutdown | Active requests dropped during deployment | Drain connections before removing servers |

Summary and Key Takeaways

✅ Load balancing distributes traffic across multiple servers for scalability, availability, and performance
✅ Algorithms like Round Robin, Least Connections, and IP Hash determine how traffic is routed
✅ Layer 4 load balancing is faster but content-unaware; Layer 7 enables routing based on URLs, headers, and cookies
✅ Health checks are essential — always configure them to detect and remove unhealthy servers
✅ Externalize session state (Redis, JWT) instead of relying on sticky sessions
✅ SSL termination at the load balancer simplifies certificate management and reduces backend load
✅ Nginx and HAProxy are the most popular software load balancers; cloud providers offer managed alternatives
✅ Auto-scaling works with load balancers to handle traffic spikes automatically
✅ Monitor request rates, error rates, latency, and backend health continuously
✅ Deploy redundant load balancers to avoid a single point of failure


What's Next?

Load balancing is one of the most impactful infrastructure decisions you'll make. Start with a simple Round Robin setup, add health checks, externalize your sessions, and scale from there. You don't need to implement everything on day one — but you do need to plan for it.
