Redis Persistence Internals: How RDB and AOF Work

Introduction
Every time you restart a Redis server, you expect your data to still be there. But Redis is an in-memory database — when the process exits, RAM is wiped. So how does Redis survive a restart?
The answer is persistence: Redis writes data to disk in the background, so it can reload on startup. It offers two complementary mechanisms:
- RDB (Redis Database) — periodic snapshots of the entire dataset
- AOF (Append-Only File) — a write-ahead log of every mutating command
Both are implemented in the Redis source code with careful engineering to avoid blocking the event loop or introducing latency spikes. This post reads through rdb.c, aof.c, and supporting files to understand exactly how they work.
What You'll Learn:
✅ How Redis forks its process to take a snapshot without pausing
✅ What copy-on-write means and why it makes BGSAVE safe
✅ How AOF records commands and the three fsync strategies
✅ The dual-buffer trick that prevents AOF rewrite from losing data
✅ How Redis combines RDB and AOF for the best of both worlds
Prerequisites:
- Basic Redis knowledge (Learning Redis Fundamentals recommended)
- Familiarity with Redis internals is helpful (Redis Source Code Explained)
- Curiosity about operating systems (fork, fsync) helps but is not required
The Two Persistence Files
git clone https://github.com/redis/redis.git
cd redis/src
| File | What It Does |
|---|---|
| rdb.c | RDB snapshot creation, encoding, and loading |
| rdb.h | RDB format constants and function declarations |
| aof.c | AOF write, fsync, and background rewrite |
| server.c | serverCron, the timer that triggers both |
| server.h | redisServer fields for persistence state |
The persistence state in redisServer (from server.h) is worth noting upfront:
struct redisServer {
    // RDB state
    pid_t rdb_child_pid;           // PID of background save process (-1 if none)
    int rdb_bgsave_scheduled;      // BGSAVE requested but blocked by AOF rewrite
    time_t lastsave;               // Unix timestamp of last successful save
    int lastbgsave_status;         // C_OK or C_ERR
    long long dirty;               // Changes since last RDB save
    long long dirty_before_bgsave; // dirty count when BGSAVE started
    // AOF state
    int aof_state;                 // AOF_OFF, AOF_ON, or AOF_WAIT_REWRITE
    int aof_fd;                    // File descriptor of the AOF file
    pid_t aof_child_pid;           // PID of background rewrite process
    sds aof_buf;                   // In-memory buffer: commands waiting to be flushed
    list *aof_rewrite_buf_blocks;  // Secondary buffer: list of blocks filled during rewrite
    off_t aof_current_size;        // Current AOF file size in bytes
};
Two child PIDs, two buffers, a dirty counter: already you can see the shape of how the system works. Let's explore each side.
Part 1: RDB — Snapshots with fork()
What an RDB File Is
An RDB file is a compact binary snapshot of the entire Redis dataset at a point in time. It contains every key, every value, every TTL, encoded in Redis's own binary format. On startup, Redis loads this file and reconstructs the in-memory state.
You can trigger a save manually:
SAVE # Synchronous: blocks Redis until done (avoid in production)
BGSAVE # Asynchronous: forks a child process, returns immediately
You can also configure automatic saves in redis.conf:
# Save if at least 1 key changed in the last 900 seconds
save 900 1
# Save if at least 10 keys changed in the last 300 seconds
save 300 10
# Save if at least 10000 keys changed in the last 60 seconds
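save 60 10000
These thresholds are checked by serverCron, which runs hz times per second (10 by default, so every 100 ms). A save point fires when both of its conditions hold: at least that many changes have accumulated, and at least that many seconds have passed since the last successful save.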
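The save-point check can be sketched in a few lines. This is a minimal illustration, not the code from server.c: the struct save_point type and the should_bgsave() function are hypothetical names, and the real check also considers things like whether a child is already running and whether the previous attempt failed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical mirror of one "save <seconds> <changes>" config line. */
struct save_point {
    time_t seconds;     /* minimum age of the last successful save */
    long long changes;  /* minimum number of changes since that save */
};

/* Returns 1 if at least one save point is satisfied, 0 otherwise.
 * `dirty` plays the role of server.dirty, `lastsave` of the last save time. */
int should_bgsave(const struct save_point *sp, size_t n,
                  long long dirty, time_t now, time_t lastsave) {
    assert(sp != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (dirty >= sp[i].changes && now - lastsave > sp[i].seconds)
            return 1;
    }
    return 0;
}
```

With the three save lines above, 5 changes trigger a save only once more than 900 seconds have passed, while 10 changes are enough after 300 seconds.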
The BGSAVE Implementation
The core of non-blocking snapshot creation is rdbSaveBackground() in rdb.c:
int rdbSaveBackground(int req, char *filename,
                      rdbSaveInfo *rsi, int rdbflags) {
    pid_t childpid;
    // Only one background child (RDB save or AOF rewrite) may run at a time
    if (hasActiveChildProcess()) return C_ERR;
    server.dirty_before_bgsave = server.dirty;
    server.lastbgsave_try = time(NULL);
    // *** THE KEY OPERATION ***
    if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
        // === Child process ===
        int retval;
        redisSetProcTitle("redis-rdb-bgsave");
        redisSetCpuAffinity(server.bgsave_cpulist);
        retval = rdbSave(req, filename, rsi, rdbflags);
        if (retval == C_OK) {
            sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE,
                             "RDB");
        }
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        // === Parent process (the main Redis server) ===
        if (childpid == -1) {
            // fork() failed
            server.lastbgsave_status = C_ERR;
            return C_ERR;
        }
        serverLog(LL_NOTICE,
                  "Background saving started by pid %ld",
                  (long) childpid);
        server.rdb_save_time_start = time(NULL);
        server.rdb_child_pid = childpid;
        return C_OK;
    }
    return C_OK; // unreachable: the child exits and the parent returns above
}
The key insight is the fork() call. fork() creates a copy of the Redis process. The child writes the snapshot; the parent continues serving commands. Initially they share the same physical memory pages, so no dataset bytes are copied at fork time; only the page tables are duplicated.
Copy-on-Write: Why Fork Is (Almost) Free
After fork(), the parent and the child point to the same physical memory pages, and the OS marks those pages copy-on-write (COW).
When the parent modifies a key (because clients are still writing), the OS transparently duplicates just the affected page before the write lands. The child still sees the old, unmodified version. This means:
- The child gets a consistent point-in-time view of the data
- The parent is paused only for the fork() call itself, then keeps serving requests normally
- Memory overhead is proportional to how much data changes during the save, not the total dataset size
This is why Redis logs a line such as "RDB: X MB of memory used by copy-on-write" after a save: it is telling you how much memory was duplicated.
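The point-in-time behaviour is easy to observe outside Redis. The following is a small sketch using plain POSIX calls, with a hypothetical function name; it forks, lets the parent overwrite a variable, and then asks the child what value it still sees. A pipe is used only to make sure the child looks after the parent has written.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child observed after the parent changed it,
 * or -1 on error. With fork() semantics the child keeps seeing 1. */
int value_seen_by_child_after_parent_write(void) {
    volatile int value = 1;   /* stands in for "the dataset" */
    int go[2];
    if (pipe(go) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: block until the parent reports that it has written. */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) != 1) _exit(255);
        _exit(value);         /* report our (snapshot) view */
    }

    /* Parent: modify the data, then release the child. */
    close(go[0]);
    value = 2;
    int released = (write(go[1], "x", 1) == 1);
    close(go[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (!released || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

The parent ends up with value == 2 while the child reports 1, which is the same isolation the RDB child relies on while it serializes the dataset.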
The Child: Serializing the Dataset
Inside the child process, rdbSave() iterates over all databases and serializes every key-value pair:
int rdbSave(int req, char *filename,
            rdbSaveInfo *rsi, int rdbflags) {
    char tmpfile[256];
    FILE *fp = NULL;
    rio rdb;
    int error = 0;
    // Write to a temp file first, then atomic rename
    snprintf(tmpfile, 256, "temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile, "w");
    if (!fp) return C_ERR;
    rioInitWithFile(&rdb, fp);
    if (rdbSaveRio(req, &rdb, &error, rdbflags, rsi) == C_ERR) {
        errno = error;
        goto werr;
    }
    // Flush stdio buffers to the kernel, then the page cache to disk
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }
    fp = NULL;
    // Atomic rename: old dump.rdb is replaced in one syscall
    if (rename(tmpfile, filename) == -1) goto werr;
    return C_OK;
werr:
    // Simplified cleanup: never leave a partial temp file behind
    if (fp) fclose(fp);
    unlink(tmpfile);
    return C_ERR;
}
Two reliability details stand out:
- Write to a temp file first. If the process crashes mid-write, the existing dump.rdb is untouched. An incomplete file never becomes the live snapshot.
- rename() is atomic. On POSIX systems, renaming a file within one filesystem is a single syscall. There's no window where a reader could see a half-written file.
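The same temp-file-then-rename pattern works in any program. Here is a minimal standalone sketch, assuming a POSIX system; atomic_write_file() is a hypothetical helper, not a Redis function, and it places the temp file next to the target so that both live on the same filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replaces `path` with `len` bytes of `data`, or leaves it untouched.
 * Returns 0 on success, -1 on any failure. */
int atomic_write_file(const char *path, const void *data, size_t len) {
    assert(path != NULL);
    char tmp[512];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp-%d", path, (int)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;

    FILE *fp = fopen(tmp, "wb");
    if (!fp) return -1;

    if (fwrite(data, 1, len, fp) != len) goto werr;
    if (fflush(fp) == EOF) goto werr;           /* stdio buffer -> kernel */
    if (fsync(fileno(fp)) == -1) goto werr;     /* kernel -> disk */
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }
    fp = NULL;
    if (rename(tmp, path) == -1) goto werr;     /* publish atomically */
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmp);                                /* drop the partial file */
    return -1;
}
```

A reader that opens the target at any moment sees either the complete old contents or the complete new contents, never a mix.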
The RDB Binary Format
The rdbSaveRio() function writes the data in a custom binary format. A simplified view:
[REDIS][version][aux fields]
For each database:
    [SELECTDB][db number]
    [RESIZE_DB][key count][expire count]
    For each key:
        [optional: EXPIRETIME ms][unix timestamp]
        [type byte]
        [encoded key]
        [encoded value]
[EOF]
[8-byte CRC64 checksum]
The encoding is type-specific and compact. Integers are stored as variable-length integers (not ASCII), strings use length-prefixed bytes, and complex types like sorted sets use their own binary representations. The final CRC64 detects corruption.
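To make "variable-length" concrete, here is a sketch of the length prefix as I understand it from rdb.c: the top two bits of the first byte select a 6-bit, 14-bit, 32-bit, or 64-bit form. The function names are hypothetical, and the special "11" prefix that RDB uses for integer-encoded and compressed strings is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes `len` into `out` (at least 9 bytes). Returns bytes written. */
size_t rdb_len_encode(uint64_t len, unsigned char *out) {
    assert(out != NULL);
    if (len < (1u << 6)) {                 /* 00xxxxxx */
        out[0] = (unsigned char)len;
        return 1;
    }
    if (len < (1u << 14)) {                /* 01xxxxxx xxxxxxxx */
        out[0] = (unsigned char)(0x40 | (len >> 8));
        out[1] = (unsigned char)(len & 0xFF);
        return 2;
    }
    /* 0x80 + 4 big-endian bytes, or 0x81 + 8 big-endian bytes */
    int nbytes = (len <= UINT32_MAX) ? 4 : 8;
    out[0] = (nbytes == 4) ? 0x80 : 0x81;
    for (int i = 0; i < nbytes; i++)
        out[1 + i] = (unsigned char)(len >> (8 * (nbytes - 1 - i)));
    return (size_t)nbytes + 1;
}

/* Decodes a length from `in`. Returns bytes consumed, or 0 if the prefix
 * is one this sketch does not handle. */
size_t rdb_len_decode(const unsigned char *in, uint64_t *len) {
    switch (in[0] >> 6) {
    case 0:
        *len = in[0] & 0x3F;
        return 1;
    case 1:
        *len = ((uint64_t)(in[0] & 0x3F) << 8) | in[1];
        return 2;
    case 2: {
        int nbytes = (in[0] == 0x80) ? 4 : (in[0] == 0x81) ? 8 : 0;
        if (nbytes == 0) return 0;
        uint64_t v = 0;
        for (int i = 0; i < nbytes; i++) v = (v << 8) | in[1 + i];
        *len = v;
        return (size_t)nbytes + 1;
    }
    default:
        return 0;                          /* 11xxxxxx: special encodings */
    }
}
```

A length of 10 takes one byte, 300 takes two, and only lengths of 16384 or more pay for the 5-byte form, which is where much of the format's compactness comes from.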
When the Child Finishes
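The parent detects child completion in serverCron via wait3() called with WNOHANG, which makes it a non-blocking status check. When the reaped PID matches the RDB child, the exit status is handed to the completion handler:
// In server.c, serverCron():
if (pid == server.rdb_child_pid) {
    backgroundSaveDoneHandler(exitcode, bysignal);
}
On success, backgroundSaveDoneHandler updates server.lastsave, subtracts server.dirty_before_bgsave from server.dirty (so changes made while the child was running still count toward the next save point), and logs the result. On failure, Redis logs the error and records C_ERR in server.lastbgsave_status; serverCron tries again later if the save points are still satisfied.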
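The non-blocking reap can be sketched with waitpid() and WNOHANG, which is the portable POSIX counterpart of the wait3() call mentioned above. Both function names here are hypothetical; the exitcode and bysignal outputs follow the same convention as the handler arguments in the snippet.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with `code`.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t spawn_child_exiting_with(int code) {
    pid_t pid = fork();
    if (pid == 0) _exit(code);
    return pid;
}

/* Non-blocking status check, meant to be called from a periodic timer.
 * Returns 1 if the child has finished (outputs filled in), 0 if it is
 * still running, -1 on error. */
int poll_child(pid_t pid, int *exitcode, int *bysignal) {
    assert(exitcode != NULL && bysignal != NULL);
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return 0;      /* still running: try again next tick */
    if (r == -1) return -1;
    *exitcode = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    *bysignal = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    return 1;
}
```

Because the call returns immediately when the child is still busy, the event loop can poll on every timer tick without ever stalling on a long save.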
Part 2: AOF — Write-Ahead Logging
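RDB snapshots are efficient but have a gap: if Redis crashes 50 seconds after a snapshot, you lose 50 seconds of writes. The AOF closes this gap by recording every mutating command to a log file. One caveat about the "write-ahead" label: Redis appends a command to the log after executing it in memory, not before, although the log plays the same recovery role as a database redo log.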
How AOF Works at the Call Site
Every command that modifies data is passed to propagate() after it executes. When AOF is enabled, propagate() calls feedAppendOnlyFile():
// Simplified from aof.c
void feedAppendOnlyFile(int dictid, robj **argv, int argc) {
    sds buf = sdsempty();
    // If this command targets a different database than the last
    // logged one, emit SELECT first
    if (server.aof_selected_db != dictid) {
        char seldb[64];
        snprintf(seldb, sizeof(seldb), "%d", dictid);
        buf = sdscatprintf(buf, "*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
                           (unsigned long)strlen(seldb), seldb);
        server.aof_selected_db = dictid;
    }
    // Translate relative expiries to absolute timestamps
    // (same reason as in replication: avoid drift)
    if (argc > 3 && !strcasecmp(argv[0]->ptr, "set")) {
        // Rewrite SET key value EX 60
        // to: SET key value PXAT <absolute_ms>
    }
    // Encode the command in RESP format and append to buffer
    buf = catAppendOnlyGenericCommand(buf, argc, argv);
    server.aof_buf = sdscatlen(server.aof_buf, buf, sdslen(buf));
    sdsfree(buf);
}
The command is serialized in RESP format (the same protocol Redis uses for client communication) and appended to server.aof_buf, an in-memory buffer. Nothing is written to disk at this point.
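The RESP framing itself is simple enough to show in full. This is a minimal sketch with a hypothetical function name; unlike the sds-based code in aof.c it takes NUL-terminated C strings, so it is not binary-safe, and it writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encodes argv[0..argc-1] as a RESP array of bulk strings into `out`.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if `out` is too small. */
int resp_encode_command(int argc, const char **argv, char *out, size_t cap) {
    assert(argc > 0 && argv != NULL && out != NULL);
    size_t used = 0;

    /* Array header: *<argc>\r\n */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    /* One bulk string per argument: $<len>\r\n<bytes>\r\n */
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n", len, argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Encoding SET mykey hello this way yields the bytes *3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$5\r\nhello\r\n, which is exactly what a client would have sent over the wire.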
The Three fsync Strategies
Writing to aof_buf is fast, but durability requires the data to reach the disk. This happens in flushAppendOnlyFile(), called from the event loop's "before sleep" hook:
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    if (sdslen(server.aof_buf) == 0) {
        // Nothing to flush, but still maybe fsync on a timer
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC && ...) {
            goto try_fsync;
        }
        return;
    }
    // Write the buffer to the file descriptor
    nwritten = aofWrite(server.aof_fd,
                        server.aof_buf,
                        sdslen(server.aof_buf));
    // (short-write and error handling omitted)
    // Empty the buffer (data is now in OS page cache)
    sdsclear(server.aof_buf);
try_fsync:
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        // Flush OS page cache to physical disk NOW
        // Durability: acknowledged writes survive a crash
        // Cost: one fsync per flush, i.e. per event-loop iteration
        // that wrote something (very slow on spinning disk)
        redis_fsync(server.aof_fd);
    } else if (server.aof_fsync == AOF_FSYNC_EVERYSEC) {
        // Delegate fsync to a background thread, at most once per second
        // Max ~1 second of data lost on crash
        // Cost: ~1 fsync per second (the practical default)
        if (server.unixtime > server.aof_last_fsync) {
            aof_background_fsync(server.aof_fd);
            server.aof_last_fsync = server.unixtime;
        }
    } else {
        // AOF_FSYNC_NO: never call fsync explicitly
        // OS decides when to flush (typically every 30 seconds on Linux)
        // Max ~30 seconds of data lost on crash
        // Cost: fastest, but lowest durability guarantee
    }
}
The three strategies represent a durability vs. performance trade-off:
| Strategy | Max data loss | Performance | Use case |
|---|---|---|---|
| always | 0 acknowledged commands | Slow (fsync on every flush) | Financial, critical data |
| everysec | ~1 second | Fast (default) | Most applications |
| no | OS-dependent | Fastest | Cache-only, data is reproducible |
The write() syscall moves data from aof_buf into the OS kernel's page cache. This is fast but not durable: the data survives a crash of the Redis process, but not a kernel panic or power loss. fsync() forces the OS to flush the page cache to the physical disk. The gap between these two calls is the window where a machine failure can lose data.
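The write-then-maybe-fsync decision can be sketched as a small standalone logger. Everything here is hypothetical naming (struct aof_log, aof_log_open, aof_log_append), and there is one deliberate simplification: the "everysec" fsync runs inline, whereas Redis hands it to a background thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

enum fsync_policy { FSYNC_NO, FSYNC_EVERYSEC, FSYNC_ALWAYS };

struct aof_log {
    int fd;
    time_t last_fsync;          /* second in which we last called fsync */
    enum fsync_policy policy;
};

/* Opens (or creates) an append-only log. Returns 0 on success, -1 on error. */
int aof_log_open(struct aof_log *log, const char *path, enum fsync_policy policy) {
    assert(log != NULL);
    log->fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    log->last_fsync = 0;
    log->policy = policy;
    return log->fd == -1 ? -1 : 0;
}

/* Appends `len` bytes, then applies the fsync policy.
 * Returns 1 if fsync was issued, 0 if it was skipped, -1 on error. */
int aof_log_append(struct aof_log *log, const void *buf, size_t len, time_t now) {
    assert(log != NULL);
    const char *p = buf;
    while (len > 0) {                       /* write(): user space -> page cache */
        ssize_t n = write(log->fd, p, len);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    int due = log->policy == FSYNC_ALWAYS ||
              (log->policy == FSYNC_EVERYSEC && now > log->last_fsync);
    if (!due) return 0;
    if (fsync(log->fd) == -1) return -1;    /* fsync(): page cache -> disk */
    log->last_fsync = now;
    return 1;
}
```

Passing the current time in as a parameter keeps the policy easy to exercise: two appends within the same second under FSYNC_EVERYSEC produce at most one fsync.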
AOF Rewrite: Compacting the Log
AOF files grow forever — every INCR adds a line, even if the key was incremented a million times. An AOF for a counter that went from 0 to 1,000,000 would have a million lines, even though the current state is just SET counter 1000000.
The solution is BGREWRITEAOF, which creates a new minimal AOF representing the current state:
BGREWRITEAOF # Trigger manually
# Redis also does this automatically based on the auto-aof-rewrite-min-size
# and auto-aof-rewrite-percentage config values
The implementation in aofRewriteBackground() again uses fork():
int aofRewriteBackground(void) {
    pid_t childpid;
    if (hasActiveChildProcess()) return C_ERR;
    if ((childpid = redisFork(CHILD_TYPE_AOF)) == 0) {
        // === Child process ===
        char tmpfile[256];
        snprintf(tmpfile, 256, "temp-rewriteaof-bg-%d.aof",
                 (int) getpid());
        if (rewriteAppendOnlyFile(tmpfile) == C_OK) {
            sendChildCowInfo(CHILD_INFO_TYPE_AOF_COW_SIZE, "AOF rewrite");
            exitFromChild(0);
        } else {
            exitFromChild(1);
        }
    } else {
        // === Parent process ===
        if (childpid == -1) return C_ERR; // fork() failed
        server.aof_child_pid = childpid;
        // Start accumulating commands into an empty secondary buffer
        // (aofRewriteBufferReset() releases the old list and creates a new one)
        aofRewriteBufferReset();
        // Force the next logged command to emit SELECT, so the
        // accumulated commands can be appended to the new file safely
        server.aof_selected_db = -1;
        return C_OK;
    }
    return C_OK; // unreachable
}
The Dual-Buffer Problem
Here is the most subtle engineering challenge in AOF: what happens to writes that arrive while the rewrite is in progress?
The child has a point-in-time snapshot (via COW), but the parent keeps serving client writes. If those new writes go only to aof_buf (the main AOF file), the new compact AOF won't include them. When the rewrite finishes and we swap to the new file, those writes would be lost.
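Redis solves this with a secondary buffer: aof_rewrite_buf_blocks. While a rewrite is running, every new mutating command is appended to both aof_buf (destined for the current AOF file) and aof_rewrite_buf_blocks (the accumulation buffer). This is the mechanism used before Redis 7.0; with the multi-part AOF introduced in 7.0, the parent instead starts a new incremental file when the rewrite begins. The pre-7.0 logic looks like this: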
void feedAppendOnlyFile(int dictid, robj **argv, int argc) {
    // ... encode command into buf ...
    // Always append to main AOF buffer
    server.aof_buf = sdscatlen(server.aof_buf, buf, sdslen(buf));
    // ALSO append to rewrite buffer if rewrite is in progress
    if (server.aof_child_pid != -1)
        aofRewriteBufferAppend((unsigned char*)buf, sdslen(buf));
}
When the child finishes and signals the parent:
void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        int newfd, oldfd;
        char tmpfile[256];
        // The child wrote to a temp file named after its PID
        snprintf(tmpfile, 256, "temp-rewriteaof-bg-%d.aof",
                 (int) server.aof_child_pid);
        // 1. Open the new compact AOF file
        newfd = open(tmpfile, O_WRONLY|O_APPEND);
        // 2. Append the accumulated secondary buffer to it
        // (all writes that arrived during rewrite)
        if (aofRewriteBufferWrite(newfd) == -1) { ... }
        // 3. Atomic rename: new file replaces old
        rename(tmpfile, server.aof_filename);
        // 4. Switch file descriptor to the new file
        oldfd = server.aof_fd;
        server.aof_fd = newfd;
        close(oldfd); // This can be slow, but happens after rename
    }
}
This guarantees no writes are lost. The secondary buffer is the bridge between the child's frozen snapshot and the parent's live write stream.
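The whole hand-off can be simulated in memory. The following toy model is only a sketch: every name in it is hypothetical, the "dataset" is a single counter, the logs are newline-separated text instead of RESP, and the child's work is reduced to one snprintf() call. What it preserves is the ordering of events: snapshot, concurrent writes into both buffers, then merge and swap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAP 1024

struct toy_aof {
    long long counter;           /* the live in-memory "dataset" */
    char log[LOG_CAP];           /* contents of the current AOF */
    char rewrite_buf[LOG_CAP];   /* commands that arrived during a rewrite */
    char new_log[LOG_CAP];       /* compact file produced by the "child" */
    int rewriting;
};

static void append(char *dst, const char *s) {
    size_t used = strlen(dst);
    assert(used < LOG_CAP);
    snprintf(dst + used, LOG_CAP - used, "%s", s);  /* truncates when full */
}

void toy_init(struct toy_aof *a) { memset(a, 0, sizeof(*a)); }

/* A client write: always logged, and mirrored while a rewrite runs. */
void toy_incr(struct toy_aof *a) {
    a->counter++;
    append(a->log, "INCR counter\n");
    if (a->rewriting) append(a->rewrite_buf, "INCR counter\n");
}

/* "fork": the child sees the counter as it is right now. */
void toy_rewrite_start(struct toy_aof *a) {
    snprintf(a->new_log, LOG_CAP, "SET counter %lld\n", a->counter);
    a->rewrite_buf[0] = '\0';
    a->rewriting = 1;
}

/* Child done: append the mirrored commands, then swap files. */
void toy_rewrite_finish(struct toy_aof *a) {
    append(a->new_log, a->rewrite_buf);
    memcpy(a->log, a->new_log, LOG_CAP);
    a->rewriting = 0;
}

/* Replays a log and returns the counter it reconstructs. */
long long toy_replay(const char *log) {
    long long v = 0, n;
    for (const char *p = log; *p; ) {
        if (sscanf(p, "SET counter %lld", &n) == 1) v = n;
        else if (strncmp(p, "INCR counter", 12) == 0) v++;
        const char *nl = strchr(p, '\n');
        if (!nl) break;
        p = nl + 1;
    }
    return v;
}
```

Three increments, a rewrite start, two more increments, and a rewrite finish leave a three-line log (one SET plus two INCRs) that still replays to 5. Dropping the rewrite_buf append in toy_incr() makes the replay come out as 3, which is precisely the data loss the secondary buffer exists to prevent.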
Part 3: Loading on Startup
When Redis starts, loadDataFromDisk() in server.c decides what to load:
void loadDataFromDisk(void) {
    long long start = ustime();
    if (server.aof_state == AOF_ON) {
        // AOF takes priority: it's more complete
        if (loadAppendOnlyFiles(server.aof_manifest) == AOF_FAILED) {
            exit(1);
        }
    } else {
        // No AOF, try RDB
        rdbSaveInfo rsi = RDB_SAVE_INFO_INIT;
        errno = 0;
        int rdb_flags = RDBFLAGS_MAIN_FILE;
        if (rdbLoad(server.rdb_filename, &rsi, rdb_flags) == C_OK) {
            serverLog(LL_NOTICE, "DB loaded from disk: %.3f seconds",
                      (float)(ustime()-start)/1000000);
        } else if (errno != ENOENT) {
            serverLog(LL_WARNING, "Fatal error loading RDB file: %s. Exiting.",
                      server.rdb_filename);
            exit(1);
        }
    }
}
AOF has priority because it is more up-to-date. In the code above, the RDB file is loaded only when AOF is disabled; a missing RDB file (ENOENT) is not an error, it just means Redis starts empty.
The AOF loader (loadSingleAppendOnlyFile()) replays commands by creating a fake client and executing each command as if it arrived from the network — using the exact same command dispatch path as normal operation. This reuse of the command table means AOF loading benefits from any future command optimizations automatically.
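The replay-through-the-command-table idea can be sketched with a tiny RESP reader and a two-entry table. All names below are hypothetical and the "database" is a single counter; the real loader works on robj arguments through a fake client, while this sketch parses a NUL-terminated buffer in place and is not binary-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#define MAX_ARGS 8

struct toy_db { long long counter; };   /* toy dataset: one counter */

typedef void command_proc(struct toy_db *db, int argc, char **argv);

static void set_command(struct toy_db *db, int argc, char **argv) {
    if (argc == 3) db->counter = atoll(argv[2]);
}
static void incr_command(struct toy_db *db, int argc, char **argv) {
    (void)argv;
    if (argc == 2) db->counter++;
}

/* The same table would serve live traffic and the loader. */
static const struct { const char *name; command_proc *proc; } command_table[] = {
    { "SET", set_command },
    { "INCR", incr_command },
};

/* Parses "<prefix><number>\r\n" at *p and advances *p past it. */
static int read_count(char **p, char prefix, long *out) {
    if (**p != prefix) return -1;
    char *end;
    *out = strtol(*p + 1, &end, 10);
    if (end == *p + 1 || *out < 0 || end[0] != '\r' || end[1] != '\n') return -1;
    *p = end + 2;
    return 0;
}

/* Replays a RESP log (modified in place). Returns the number of commands
 * executed, or -1 on malformed input or an unknown command. */
int replay_aof(struct toy_db *db, char *log) {
    assert(db != NULL && log != NULL);
    int executed = 0;
    for (char *p = log; *p; executed++) {
        long argc, len;
        char *argv[MAX_ARGS];
        if (read_count(&p, '*', &argc) == -1 || argc < 1 || argc > MAX_ARGS) return -1;
        for (long i = 0; i < argc; i++) {
            if (read_count(&p, '$', &len) == -1) return -1;
            if (strlen(p) < (size_t)len + 2) return -1;
            if (p[len] != '\r' || p[len + 1] != '\n') return -1;
            argv[i] = p;
            p[len] = '\0';               /* terminate the argument in place */
            p += len + 2;
        }
        command_proc *proc = NULL;
        for (size_t i = 0; i < sizeof(command_table) / sizeof(command_table[0]); i++)
            if (strcasecmp(argv[0], command_table[i].name) == 0)
                proc = command_table[i].proc;
        if (!proc) return -1;
        proc(db, (int)argc, argv);       /* same handler as live traffic */
    }
    return executed;
}
```

The loader contains no knowledge of what SET or INCR mean; it only frames arguments and looks up a handler, which is why adding a command to the table makes it replayable for free.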
Part 4: RDB + AOF Combined (Hybrid Persistence)
Redis 4.0 introduced a hybrid mode that combines both:
# redis.conf
aof-use-rdb-preamble yes
When BGREWRITEAOF runs in hybrid mode, the child writes an RDB-format preamble (fast binary encoding) followed by AOF-format commands for writes that happened after the snapshot:
[RDB binary data: compact, fast to load]
[AOF commands: only what changed since the RDB preamble]
This means:
- Load time is as fast as RDB (bulk binary decoding, not command replay)
- Data loss is as small as AOF (at most 1 second with everysec)
- File size is smaller than a plain AOF after many small writes
The loader detects the preamble by checking if the file starts with the RDB magic bytes (REDIS). If it does, it loads the RDB portion, then switches to AOF replay for the tail.
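The detection step amounts to a five-byte comparison. A minimal sketch, with a hypothetical function name, operating on bytes already read from the start of the file:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the RDB magic "REDIS", else 0. */
int has_rdb_preamble(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len >= 5 && memcmp(buf, "REDIS", 5) == 0;
}
```

A plain AOF always begins with a RESP array marker (an asterisk), so the two formats cannot be confused by this check.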
Putting It Together
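Here's the full picture of what happens during Redis's life. Each write lands in memory and in aof_buf. The event loop flushes aof_buf to the AOF and fsyncs according to appendfsync. serverCron triggers BGSAVE and AOF rewrites, each in a forked child. On restart, loadDataFromDisk replays the AOF if it is enabled, and otherwise loads the RDB file.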
Key Design Principles
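1. Keep heavy disk I/O off the event loop
Both BGSAVE and BGREWRITEAOF use fork() to move the heavy disk work (serializing the whole dataset) to a child process. The parent returns as soon as fork() completes and continues serving requests. The routine AOF write() still runs on the main thread, but it only copies into the page cache, and with everysec the fsync is delegated to a background thread.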
2. fork() + copy-on-write makes snapshots cheap
The OS shares memory pages between parent and child until one of them writes. Starting a snapshot of a 4 GB dataset costs a page-table copy (typically milliseconds), not gigabytes of data copying.
3. Write to temp file, rename atomically
Both RDB and AOF rewrite use the temp-file-then-rename pattern. A crash mid-write never corrupts the existing file.
4. The secondary buffer bridges two timelines
During AOF rewrite, new writes go to both the live AOF and a secondary buffer. When the rewrite finishes, the buffer is appended to the new file, ensuring no commands fall through the gap.
5. AOF replays commands through the normal dispatch path
Startup loading creates a fake client and calls the same setCommand, incrCommand, etc. that production traffic uses. No special loader logic is needed.
How to Explore the Code Yourself
# Key functions to read in order (run from the repository root):
grep -n "rdbSaveBackground" src/rdb.c # BGSAVE entry point
grep -n "rdbSave\b" src/rdb.c # The child's work
grep -n "feedAppendOnlyFile" src/aof.c # Where commands are logged
grep -n "flushAppendOnlyFile" src/aof.c # The fsync decision
grep -n "aofRewriteBackground" src/aof.c # BGREWRITEAOF entry point
grep -n "backgroundRewriteDoneHandler" src/aof.c # Secondary buffer merge
grep -n "loadDataFromDisk" src/server.c # Startup loading logic
To see what's in an RDB file:
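# Redis ships a tool that checks an RDB file and reports what it finds
src/redis-check-rdb dump.rdb
# redis-cli can pull a fresh snapshot from a running server into a local file
redis-cli --rdb /path/to/dump.rdb
To see what's in an AOF file (plain RESP is human-readable; a hybrid file starts with a binary RDB preamble):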
head -50 appendonly.aof
# You'll see:
# *3
# $3
# SET
# $5
# mykey
# $5
# hello
Summary
Redis persistence is a study in avoiding the obvious solutions:
✅ Instead of pausing to snapshot, Redis fork()s and uses copy-on-write
✅ Instead of risking corrupt files, Redis writes to temp files and renames atomically
✅ Instead of losing writes during AOF rewrite, Redis accumulates them in a secondary buffer
✅ Instead of special loading code, AOF replay reuses the normal command dispatch path
✅ Instead of choosing between RDB and AOF, hybrid mode combines both
The next time you configure save 300 10 or appendfsync everysec, you'll know exactly what code runs, why those defaults exist, and what trade-off you're making.
Additional Resources
Redis Source Code:
- redis/src/rdb.c: Snapshot implementation
- redis/src/aof.c: Append-only file
- redis/src/server.c: serverCron and loadDataFromDisk
Related Posts:
- Redis Source Code Explained — Event loop, dict, and string internals
- Learning Redis: The Complete Beginner's Guide — Redis commands and concepts
- Spring Boot Caching with Redis — Practical Redis integration