
MySQL Replication Fundamentals: Async, Semi-Sync & GTID Explained


Before you can appreciate what Galera Cluster does, you need to understand what it's replacing — and why that replacement was necessary.

MySQL replication has been around since MySQL 3.23 (2001). It works well for many use cases, scales read traffic elegantly, and powers countless production systems today. It also has failure modes that become catastrophic at the wrong moment. Knowing those failure modes is what makes the trade-offs in Galera's design legible.

This post covers MySQL replication from the ground up: how the binary log works, the difference between statement-based and row-based replication, what GTID is and why it matters, async vs semi-sync trade-offs, and the practical failure scenarios every operator eventually encounters.


How MySQL Replication Works

MySQL replication is built on one foundational mechanism: the binary log (binlog).

Every change to data on the primary — INSERT, UPDATE, DELETE, DDL — is written to the binlog before or during the transaction commit. The replica connects to the primary, streams the binlog, and replays those changes in order.

Three threads do all the work:

  • IO Thread (on replica): connects to primary, reads binlog events, writes them to the local relay log
  • SQL Thread (on replica): reads the relay log, applies events to the replica's data files
  • Binlog Dump Thread (on primary): serves binlog events to the IO thread

This architecture has one crucial property — the primary writes to its binlog and replies to the client before the IO thread has even fetched the event. That's async replication by definition.
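
A quick way to see all three threads on a running pair (a minimal check; the exact State strings vary by version):

-- On the primary: one Binlog Dump thread appears per connected replica
SHOW PROCESSLIST;
 
-- On the replica: the IO and SQL threads report through the replication status
SHOW REPLICA STATUS\G
-- Replica_IO_Running:  Yes   (IO thread streaming binlog into the relay log)
-- Replica_SQL_Running: Yes   (SQL thread applying relay-log events)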


Binlog Formats: Statement, Row, and Mixed

MySQL supports three binlog formats, controlled by binlog_format:

Statement-Based Replication (SBR)

The binlog records the original SQL statement:

-- What gets written to the binlog:
UPDATE orders SET status = 'shipped' WHERE created_at < NOW() - INTERVAL 1 DAY;

Problem: NOW() on the replica is evaluated when the SQL thread applies the statement — which is later than on the primary. Any non-deterministic function (NOW(), UUID(), RAND(), USER()) can produce different results on the replica.

Use it when: Your SQL is fully deterministic and you want compact binlogs.

Row-Based Replication (RBR)

The binlog records the actual before/after values for every affected row:

-- What gets written to the binlog (conceptually):
-- Row update: orders.id=42: status 'pending' -> 'shipped'
-- Row update: orders.id=87: status 'pending' -> 'shipped'
-- Row update: orders.id=103: status 'pending' -> 'shipped'
-- ... (one entry per affected row)

Advantage: Perfectly deterministic. The replica applies exactly the same data values regardless of when it runs.

Disadvantage: Large batches produce huge binlogs. An UPDATE affecting 1 million rows generates 1 million binlog entries.

This is what Galera uses. PXC requires binlog_format=ROW — write-sets contain row images, not SQL statements.
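
If you want to see the difference on a live server, list the events in a binlog file (the file name below is whatever SHOW BINARY LOGS reports on your server):

SHOW BINARY LOGS;                                   -- pick the current file
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 20;
-- With ROW format, DML shows up as Table_map + Write_rows/Update_rows/Delete_rows
-- events (plus an Xid on commit) instead of a Query event containing SQL text.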

Mixed Mode

In MIXED mode, MySQL logs statements by default and switches to row format automatically when it detects a non-deterministic or otherwise unsafe statement. Note that MIXED is not the default in MySQL 8.0: ROW has been the default binlog_format since MySQL 5.7.7.

Recommended setting for Galera/PXC: Always use ROW.
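
A quick way to confirm what a server is actually logging. For Galera/PXC this must report ROW; the dynamic SET only affects new sessions, so in practice you set it in my.cnf as in the hands-on section below:

SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';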


Async Replication: The Default Behavior

In standard async replication, the primary commits the transaction and returns to the client without waiting for any replica to acknowledge receipt.

What This Means in Practice

The replica is always behind the primary by some amount of time — the replication lag. This lag is usually milliseconds to seconds under normal load. Under heavy write load, it can grow to minutes or hours.

You can measure replication lag with:

-- On the replica:
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master
 
-- More accurate (accounts for network/clock skew):
-- Use the heartbeat table method or pt-heartbeat

The Failover Problem

Suppose the primary crashes. You want to promote Replica 1 to primary. But:

Primary:  executed GTID set = source_uuid:1-100
Replica1: executed GTID set = source_uuid:1-97   ← 3 transactions behind
Replica2: executed GTID set = source_uuid:1-95   ← 5 transactions behind

If you promote Replica 1 immediately, 3 transactions are permanently lost. Those rows, those order updates, those payments — gone. If the application had already told the user "your payment was processed" (because the primary said COMMIT OK), you now have a lie in your system.

Operators deal with this in two ways:

  1. Accept the data loss — for non-critical data
  2. Wait until replicas catch up — impossible if the primary is gone (you can't fetch missing events from a dead server)

This is the core problem that semi-synchronous replication and Galera each address differently.


Semi-Synchronous Replication

Semi-sync changes the commit protocol so that the primary waits for at least one replica to acknowledge receipt of the binlog events before returning to the client.

Key distinction: the replica ACKs once the event has been written to its relay log, not once it has been applied to the data files. The data hasn't been applied when the client gets its OK, but the bytes are on disk on at least one replica.
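
Where exactly the primary blocks is controlled by a wait-point setting; with the newer plugin naming it's rpl_semi_sync_source_wait_point (older installs use rpl_semi_sync_master_wait_point), and AFTER_SYNC is the lossless behavior described here:

SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_wait_point';
-- AFTER_SYNC:   wait for the ACK after flushing to the binlog, before the engine
--               commit, so no other session can see data that might be lost
-- AFTER_COMMIT: wait after the engine commit, so another session could read data
--               that disappears if the primary dies before an ACK arrives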

The Guarantee

If the primary crashes after a semi-sync commit, at least one replica has the transaction in its relay log and can apply it. No committed transaction is lost — as long as the promoted replica is the one that sent the ACK.

The Problem: Fallback to Async

MySQL semi-sync has a timeout (rpl_semi_sync_source_timeout with the current plugin, rpl_semi_sync_master_timeout with the older one; default 10 seconds). If no replica ACKs within the timeout, MySQL silently falls back to async.

-- Check if semi-sync is actually active (use the Rpl_semi_sync_master_status
-- name if the older semisync_master plugin is installed):
SHOW STATUS LIKE 'Rpl_semi_sync_source_status';
-- Value: ON = semi-sync active, OFF = fell back to async

Under load, network partition, or replica lag spikes, you might be running async without knowing it — until the primary crashes and you find data loss.
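
Beyond the status flag above, two counters make the silent fallback observable over time (names assume the *_source plugin used in the next section; older installs use *_master):

SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_yes_tx';   -- commits acknowledged by a replica
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_no_tx';    -- commits that got no ACK in time
-- A growing no_tx count means transactions are committing without semi-sync protection.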

Configuring Semi-Sync

-- On primary (MySQL 8.0):
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 1000;     -- 1 second timeout
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 1;  -- wait for 1 ACK
 
-- On replica:
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
 
-- Verify:
SHOW STATUS LIKE 'Rpl_semi_sync%';

Semi-Sync vs Galera

Semi-sync gives you durability (no committed data loss on failover) but does not give you:

  • Automatic failover
  • Multi-master writes
  • Cluster membership management
  • Replica lag elimination (replicas still apply async after ACKing)

You still need Orchestrator or MHA for failover automation. And your replicas still lag behind for reads.


GTID: Global Transaction Identifiers

GTID is the most important concept for understanding both MySQL replication operations and Galera internals.

What Is a GTID?

Every transaction executed on a MySQL server gets a Global Transaction Identifier:

GTID = source_uuid:transaction_id
 
Example: 3E11FA47-71CA-11E1-9E33-C80AA9429562:23

The source_uuid is the server's UUID (auto-generated at startup, stored in auto.cnf). The transaction_id is a monotonically increasing integer per server.
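
You can read a server's own UUID (the source half of its GTIDs) directly:

SELECT @@server_uuid;
-- e.g. 3E11FA47-71CA-11E1-9E33-C80AA9429562 (also stored in auto.cnf in the data directory)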

A server tracks the set of all GTIDs it has executed in its gtid_executed variable:

SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
-- Output: 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-100
-- Means: this server has executed transactions 1 through 100 from that source

Why GTID Matters for Replication

Before GTID, replication position was tracked as (binlog_file, binlog_position):

CHANGE MASTER TO
  MASTER_HOST='primary',
  MASTER_LOG_FILE='mysql-bin.000003',
  MASTER_LOG_POS=4;

This was tied to a specific binary log file on a specific server. If you promoted a replica to primary, every other replica had to find the equivalent position in the new primary's binlog manually — an error-prone process.

With GTID, position is universal:

CHANGE MASTER TO
  MASTER_HOST='new-primary',
  MASTER_AUTO_POSITION=1;   -- "give me everything I'm missing"

MySQL figures out the gaps automatically by comparing gtid_executed sets.

GTID in Failover

Primary UUID:  A
Replica1 UUID: B
Replica2 UUID: C
 
Before crash:
  Primary (A): A:1-100 (all 100 transactions)
  Replica1 (B): A:1-97  (missing A:98-100)
  Replica2 (C): A:1-95  (missing A:96-100)

Replica1 is the "most advanced" replica. After promotion:

-- Replica2 reconnects to Replica1 (now primary):
CHANGE MASTER TO MASTER_HOST='replica1', MASTER_AUTO_POSITION=1;
-- MySQL calculates: Replica1 has A:1-100, Replica2 has A:1-95
-- Automatically fetches and applies A:96-100

No manual binlog position arithmetic. This is why GTID-based replication is the standard for any serious HA setup.
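
During a manual failover you can compute exactly which transactions a lagging replica is missing with the built-in GTID set functions (the UUID is reused from the example above):

SELECT GTID_SUBTRACT(
  '3E11FA47-71CA-11E1-9E33-C80AA9429562:1-100',   -- candidate primary's gtid_executed
  '3E11FA47-71CA-11E1-9E33-C80AA9429562:1-95'     -- lagging replica's gtid_executed
) AS missing;
-- → 3E11FA47-71CA-11E1-9E33-C80AA9429562:96-100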

Enabling GTID

# my.cnf (both primary and replica)
[mysqld]
gtid_mode                = ON
enforce_gtid_consistency = ON
log_bin                  = mysql-bin
binlog_format            = ROW
server_id                = 1   # unique per server

-- Verify on primary:
SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
-- gtid_mode | ON
 
SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
-- The set of all committed GTIDs on this server

GTID Restrictions

With enforce_gtid_consistency = ON, certain operations are prohibited because they can't be reliably tracked as atomic transactions:

-- These will fail with GTIDs enabled:
CREATE TABLE ... SELECT ...;   -- combines DDL + DML in one statement
CREATE TEMPORARY TABLE ...;    -- inside a transaction
DROP TEMPORARY TABLE ...;      -- inside a transaction
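
Attempting one of these with GTID consistency enforced fails immediately. The table name below is illustrative and the error text is approximate, but the error class (GTID consistency violation) is what you'll see:

CREATE TABLE orders_archive AS SELECT * FROM orders;
-- ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.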

This matters for Galera — PXC enforces the same restrictions and adds more via PXC Strict Mode.


Replication Lag: Causes and Measurement

Replication lag is the time between when the primary commits a transaction and when the replica has applied it.

Root Causes

1. Single-threaded SQL thread (MySQL < 5.6)

Before MySQL 5.6, the SQL thread applied events serially. If your primary handles 1,000 writes/second, the SQL thread processes them one by one and can't keep up.

2. Limited applier parallelism (even with multi-threaded replication, MySQL 5.6+)

MySQL 5.6 introduced parallel replication: multiple SQL applier threads, one per database schema. MySQL 5.7 improved this with logical-clock parallelism, where transactions that committed together on the primary (and therefore could not have conflicted) can be applied in parallel. Lag still builds when the workload doesn't parallelize, for example when most writes hit one schema or form long dependency chains.

# Enable multi-threaded replication (MySQL 8.0.26+ renames these to
# replica_parallel_workers, replica_parallel_type, replica_preserve_commit_order):
slave_parallel_workers      = 4
slave_parallel_type         = LOGICAL_CLOCK
slave_preserve_commit_order = ON

3. Large transactions

A single UPDATE orders SET ... affecting 5 million rows creates one enormous binlog event. The SQL thread applies it as one operation — no parallelism possible.
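
A common mitigation is to chunk the statement so each transaction produces a bounded number of row events. A sketch using the orders table from earlier and an illustrative 10,000-row batch size:

-- Repeat until the statement affects 0 rows:
UPDATE orders
SET    status = 'archived'
WHERE  created_at < '2023-01-01'
  AND  status <> 'archived'
LIMIT  10000;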

4. Network bandwidth

High-write workloads generate large binlogs. If the replica's IO thread can't fetch events fast enough, it falls behind at the network layer.

5. Backup operations

Running mysqldump --single-transaction on a replica holds the replica's SQL thread in a consistent read view, which can cause it to lag while the backup runs.

Measuring Lag Accurately

Seconds_Behind_Master in SHOW SLAVE STATUS (Seconds_Behind_Source in SHOW REPLICA STATUS) measures the timestamp difference between the event the SQL thread is currently applying and when the primary wrote it. It's unreliable because:

  • It drops to 0 as soon as the SQL thread has drained the relay log, even if the IO thread is still lagging behind the primary
  • Clock skew between servers distorts it
  • It shows NULL, not a large number, when replication is stopped or broken, which naive monitoring can misread as healthy

The production-grade tool is pt-heartbeat from Percona Toolkit:

# On primary: insert a heartbeat row every second
pt-heartbeat --update --daemonize \
  --database heartbeat --host primary \
  --user repl --password secret
 
# On replica: measure actual lag
pt-heartbeat --monitor --host replica \
  --database heartbeat --user repl --password secret
# Output: 0.00s [  0.00s,  0.00s,  0.00s ]
# Actual seconds  <1m avg  <5m avg  <15m avg

Automated Failover Tools

Async replication requires external tooling to automate failover. Two tools dominate this space:

Orchestrator

The most widely-used MySQL topology manager:

  • Discovers your replication topology automatically
  • Detects primary failure via health checks
  • Promotes the most advanced replica
  • Re-points other replicas to the new primary
  • Provides REST API and web UI

Orchestrator + semi-sync replication is a solid HA setup — but you're still managing an external service, handling VIP switches, and accepting that failover takes ~30 seconds under best conditions.

MySQL Router (InnoDB Cluster)

MySQL's official router for InnoDB Cluster setups (Group Replication). Covered in Post 10 of this series.


The Replication Mental Model: What Galera Changes

Here's a comparison of what each replication type guarantees:

Property                              | Async                  | Semi-Sync                           | Galera
--------------------------------------|------------------------|-------------------------------------|------------------------------------
Write committed = durable on replica  | No                     | On 1+ replica (relay log only)      | Yes, on quorum
Replication lag                       | Yes (ms to hours)      | Yes (same lag, just durable events) | None (synchronous)
Multi-master writes                   | No                     | No                                  | Yes
Auto failover                         | External tool required | External tool required              | Built-in
Replica readable without lag          | No                     | No                                  | Yes
Write overhead per commit             | None                   | Network round-trip                  | Network round-trip + certification
WAN-friendly                          | Yes                    | Somewhat                            | Latency-sensitive

The trade-off Galera makes: every commit costs a certification round-trip to all nodes. On a LAN, this is typically 1–5ms. Galera is not suitable for geographically distributed nodes where latency exceeds ~30ms.


Hands-On: Setting Up Classic Async Replication

Let's set up a working async replication pair with Docker Compose, the foundation that Galera replaces.

# docker-compose.yml
services:
  mysql-primary:
    image: percona/percona-server:8.0
    container_name: mysql-primary
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
    command: >
      --server-id=1
      --gtid-mode=ON
      --enforce-gtid-consistency=ON
      --log-bin=mysql-bin
      --binlog-format=ROW
      --log-replica-updates=ON
    ports:
      - "3306:3306"
    volumes:
      - primary-data:/var/lib/mysql
 
  mysql-replica:
    image: percona/percona-server:8.0
    container_name: mysql-replica
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
    command: >
      --server-id=2
      --gtid-mode=ON
      --enforce-gtid-consistency=ON
      --log-bin=mysql-bin
      --binlog-format=ROW
      --log-replica-updates=ON
      --skip-replica-start
    ports:
      - "3307:3306"
    volumes:
      - replica-data:/var/lib/mysql
    depends_on:
      - mysql-primary
 
volumes:
  primary-data:
  replica-data:

Start and configure replication:

docker compose up -d
 
# Wait for both servers to be ready
sleep 20
 
# Create replication user on primary
docker exec mysql-primary mysql -uroot -prootpass -e "
  CREATE USER IF NOT EXISTS 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'replpass';
  GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
  FLUSH PRIVILEGES;
"
 
# Configure replica to connect to primary
docker exec mysql-replica mysql -uroot -prootpass -e "
  CHANGE REPLICATION SOURCE TO
    SOURCE_HOST='mysql-primary',
    SOURCE_USER='replicator',
    SOURCE_PASSWORD='replpass',
    SOURCE_AUTO_POSITION=1;
  START REPLICA;
"
 
# Verify replication is running
docker exec mysql-replica mysql -uroot -prootpass -e "SHOW REPLICA STATUS\G" | grep -E "Replica_IO_Running:|Replica_SQL_Running:|Seconds_Behind_Source"

Expected output:

Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Seconds_Behind_Source: 0

Test replication:

# Create data on primary
docker exec mysql-primary mysql -uroot -prootpass -e "
  CREATE DATABASE testdb;
  USE testdb;
  CREATE TABLE messages (id INT AUTO_INCREMENT PRIMARY KEY, content VARCHAR(255), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
  INSERT INTO messages (content) VALUES ('Hello from primary'), ('Second message');
"
 
# Verify it appears on replica
docker exec mysql-replica mysql -uroot -prootpass -e "
  USE testdb; SELECT * FROM messages;
"

Simulating Replication Lag

A useful exercise: simulate lag and watch Seconds_Behind_Source climb and recover.

# On primary: generate a large write
docker exec mysql-primary mysql -uroot -prootpass -e "
  USE testdb;
  CREATE TABLE big_table (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
  INSERT INTO big_table (data)
  SELECT REPEAT('x', 255) FROM information_schema.COLUMNS a
  CROSS JOIN information_schema.COLUMNS b
  LIMIT 100000;
"
 
# While insert is running, check replica lag:
docker exec mysql-replica mysql -uroot -prootpass -e "
  SHOW REPLICA STATUS\G" | grep Seconds_Behind

You'll see Seconds_Behind_Source climbing. Once the SQL thread catches up, it drops to 0. This is the "eventual consistency" window where reads from the replica serve stale data.
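
You can also observe the staleness directly by running the same count on the primary and on the replica while the big insert is still replicating; the replica's number trails the primary's until lag returns to 0:

SELECT COUNT(*) FROM testdb.big_table;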


Key Status Variables for Monitoring

-- On replica: replication health
SHOW REPLICA STATUS\G
 
-- Key fields:
--   Replica_IO_Running     → YES means IO thread is connected and running
--   Replica_SQL_Running    → YES means SQL thread is applying events
--   Seconds_Behind_Source  → lag in seconds (0 = caught up)
--   Last_Error             → last replication error (empty = healthy)
--   Executed_Gtid_Set      → GTIDs applied on this replica
 
-- On primary: connected replicas
SHOW PROCESSLIST;
-- Look for Binlog Dump threads; their State reads roughly
-- "Source has sent all binlog to replica; waiting for more updates"
-- ("Master ... to slave ..." on older versions)
 
-- Binlog status:
SHOW BINARY LOGS;
SHOW MASTER STATUS;

Common Replication Errors

Error 1062: Duplicate entry

Last_Error: Could not execute Write_rows event on table testdb.messages;
Duplicate entry '42' for key 'PRIMARY'

Cause: A row was written directly to the replica (bypassing the primary) and now conflicts with incoming replication.

Fix:

-- Skip the failing event (temporary measure, investigate root cause).
-- With binlog-position replication:
STOP REPLICA;
SET GLOBAL SQL_REPLICA_SKIP_COUNTER = 1;
START REPLICA;
 
-- With GTID-based replication the skip counter is ignored; instead, commit an
-- empty transaction under the failing GTID:
-- SET gtid_next = '<source_uuid>:<failing_id>'; BEGIN; COMMIT; SET gtid_next = 'AUTOMATIC';
 
-- Permanent fix: make replicas read-only
SET GLOBAL read_only = 1;
SET GLOBAL super_read_only = 1;   -- prevents even SUPER users from writing

Error 1032: Row not found

Last_Error: Could not execute Update_rows event; Can't find record in 'orders'

Cause: The replica is missing a row that the primary expects to exist. Data divergence.

Fix: Use pt-table-checksum and pt-table-sync from Percona Toolkit to identify and fix diverged data.

IO Thread Stopped

Replica_IO_Running: No
Last_IO_Error: error connecting to master

Cause: Network partition, primary restart, or firewall change.

Fix: Usually self-recovers. If not:

STOP REPLICA;
START REPLICA;

Preparing Your Mental Model for Galera

After working through this post, you understand:

✅ Binlog events are the atom of MySQL replication
✅ Row-based format is required for Galera
✅ Async replication has a durability gap — committed data can be lost on failover
✅ Semi-sync closes the durability gap but not the lag gap, and can silently fall back
✅ GTID gives universal transaction position across topology changes
✅ Replication lag is real, measurable, and impacts read consistency
✅ Failover automation with async replication requires external tools and ~30s downtime windows

In the next post, we look at how Galera solves these problems by abandoning the binlog streaming model entirely in favor of write-set certification — a fundamentally different replication protocol.

Continue to Post 3 → Galera Cluster Architecture


Series navigation:
Post 1: MySQL HA Roadmap Overview
Post 3: Galera Cluster Architecture
