
Database Internals Roadmap: How Databases Work Under the Hood

Tags: database, database-internals, storage-engine, distributed-systems, roadmap

You've written thousands of SQL queries. You know how to create indexes, tune slow queries, and pick the right database for a project. But do you know what actually happens when you run INSERT INTO? Where does that row go? How does the database guarantee it won't be lost if the power cuts out mid-write?

Most developers treat databases as black boxes. You put data in, you get data out. That works — until it doesn't. Until you're debugging a mysterious deadlock at 2 AM, wondering why your index isn't being used, or trying to understand why your distributed system lost data despite "ACID guarantees."

This series cracks open the black box. We'll explore how databases store data on disk, manage memory, process transactions, execute queries, and coordinate across multiple nodes. By the end, you won't just use databases — you'll understand them.

Who Is This Roadmap For?

This series is for developers who:

  • Have at least 1–2 years of experience working with relational databases (PostgreSQL, MySQL, SQLite, etc.)
  • Can write SQL confidently but don't know what happens beneath the query layer
  • Want to understand why certain queries are slow, not just how to fix them
  • Are preparing for system design interviews where database internals come up frequently
  • Want to make informed decisions about database selection, schema design, and performance tuning
  • Are curious about how distributed databases (CockroachDB, TiDB, Cassandra) achieve consistency and availability

You don't need a computer science degree — but you should be comfortable with basic data structures (arrays, hash maps, trees) and have used at least one relational database in a real project.

What You'll Learn

Learning Outcomes:
✅ Understand how B-Trees and LSM-Trees store and retrieve data on disk
✅ Explain how buffer pools and page layouts manage memory efficiently
✅ Describe how Write-Ahead Logging (WAL) guarantees crash safety
✅ Build mental models for ACID transactions and isolation levels
✅ Analyze query execution plans and understand the query optimizer's decisions
✅ Compare index types (B+Tree, Hash, GIN, GiST) and choose the right one
✅ Understand lock management, deadlock detection, and MVCC
✅ Reason about distributed storage: replication, partitioning, and consensus
✅ Evaluate trade-offs between different database architectures
✅ Debug database performance issues with confidence, not guesswork

Prerequisites

  • Solid SQL skills (JOINs, subqueries, transactions, indexes)
  • Familiarity with at least one RDBMS (PostgreSQL recommended, but MySQL/SQLite work too)
  • Basic understanding of data structures: arrays, linked lists, hash maps, binary trees
  • Comfort with reading pseudocode or simple code examples (we'll use SQL, Python, and Go)
  • Basic OS concepts helpful but not required (files, memory, processes)

Why Database Internals Matter

You might wonder: "I'm a backend developer, not a database engineer. Why should I care about page layouts and WAL?"

Here's why:

1. Performance Debugging Becomes Trivial

When you understand that a B+Tree index stores keys in sorted order with leaf nodes linked together, you instantly know why ORDER BY on an indexed column is fast and why LIKE '%search%' can't use an index. No more cargo-culting EXPLAIN ANALYZE output.
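
To make the ordering property concrete, here is a toy Python sketch (this is just a sorted list, not a real B+Tree, and the helper names are invented for illustration). It shows why a prefix match can binary-search to its starting point while a substring match has no choice but to examine every key:

```python
import bisect

# A toy "index": keys kept in sorted order, as B+Tree leaf entries are.
index = sorted(["alice", "alan", "bob", "carol", "carl", "dave"])

def prefix_lookup(keys, prefix):
    """Binary-search to the first key >= prefix, then walk forward.
    This is why LIKE 'car%' can use a B+Tree index."""
    i = bisect.bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(keys[i])
        i += 1
    return out

def substring_lookup(keys, needle):
    """No ordering helps here: LIKE '%ar%' must check every key."""
    return [k for k in keys if needle in k]

print(prefix_lookup(index, "car"))   # O(log n) to find the start of the range
print(substring_lookup(index, "ar")) # O(n) scan of the whole index
```

The same logic explains why `ORDER BY` on an indexed column is cheap: the leaf entries are already in output order.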

2. Schema Design Gets Intentional

Knowing how storage engines organize data means you design tables and indexes with the engine, not against it. You'll understand why wide rows hurt in PostgreSQL but are fine in Cassandra, and why covering indexes eliminate table lookups.

3. System Design Interviews Become Easier

Database internals questions are everywhere in senior-level interviews: "How would you design a system that handles 100K writes per second?" or "Explain the trade-offs between strong and eventual consistency." This series gives you the mental models to answer confidently.

4. You Stop Making Expensive Mistakes

Developers who don't understand transaction isolation levels ship code with phantom reads. Developers who don't understand WAL disable fsync for performance and lose data. Knowledge of internals prevents these costly mistakes.

The Architecture of a Database

Before diving into individual topics, let's see the big picture. Every relational database — PostgreSQL, MySQL, SQLite — shares a similar high-level architecture:

From top to bottom:

  1. Query Parser — Turns your SQL text into an Abstract Syntax Tree (AST)
  2. Query Optimizer — Finds the most efficient way to execute the query (which indexes to use, join order, etc.)
  3. Execution Engine — Actually runs the plan, fetching and transforming data
  4. Concurrency Control — Manages multiple concurrent transactions without conflicts
  5. Buffer Pool — Keeps frequently accessed disk pages in memory for speed
  6. Storage Engine — Organizes data on disk using data structures like B-Trees or LSM-Trees
  7. Disk Storage — The actual files on disk: data files, index files, and the Write-Ahead Log

This series walks through each layer, from the bottom (disk) to the top (query processing).
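
As a rough mental model of how the top layers hand work downward, here is a deliberately tiny sketch. Every name in it is invented for illustration (no real engine exposes an API like this), and the "parser" handles exactly one query shape:

```python
# Toy pipeline: parse -> optimize -> execute, each stage feeding the next.
def parse(sql):
    # Real parsers build a full AST; this handles only "SELECT <col> FROM <table>".
    _, col, _, table = sql.split()
    return {"op": "select", "column": col, "table": table}

def optimize(ast, stats):
    # A real cost-based optimizer weighs index scans vs sequential scans
    # using table statistics; here we just check whether an index exists.
    plan = dict(ast)
    plan["scan"] = "index" if ast["column"] in stats["indexed"] else "seq"
    return plan

def execute(plan, storage):
    # The execution engine pulls rows from the storage layer and projects them.
    rows = storage[plan["table"]]
    return [row[plan["column"]] for row in rows]

storage = {"users": [{"id": 1, "name": "ada"}, {"id": 2, "name": "lin"}]}
stats = {"indexed": {"id"}}

plan = optimize(parse("SELECT name FROM users"), stats)
print(plan["scan"], execute(plan, storage))
```

The real layers below these (concurrency control, buffer pool, storage engine) are invisible in this sketch, which is exactly the point: `execute` just asks for rows and lets the lower layers worry about pages, locks, and disks.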

The Learning Path

This series is organized into 3 phases, progressing from how data lives on disk through transaction processing to distributed systems.

Phase 1: Storage Fundamentals (Weeks 1–3)

Goal: Understand how databases store, retrieve, and protect data on disk.

This is the foundation. You'll learn the two dominant storage engine designs (B-Trees and LSM-Trees), how databases organize data into pages and manage them in memory through buffer pools, and how Write-Ahead Logging guarantees that committed data is never lost — even during crashes.

Topics covered:

  • B-Tree structure: root, internal, and leaf nodes
  • LSM-Trees: memtable, SSTables, and compaction strategies
  • When to use B-Trees vs LSM-Trees (read-heavy vs write-heavy workloads)
  • Page layout: slotted pages, tuple headers, and free space management
  • Buffer pool: page replacement policies (LRU, clock), dirty page flushing
  • Write-Ahead Logging: log structure, checkpointing, crash recovery (ARIES)
  • fsync, durability guarantees, and the "torn page" problem
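
The write-ahead principle from the list above fits in a few lines of toy Python. This is a caricature (real WALs write binary records with LSNs, fsync the log file, and checkpoint), but it captures the invariant: log first, apply second, replay on restart:

```python
# Write-ahead rule: append the change to the log BEFORE touching the data pages.
log = []     # stand-in for the WAL on durable storage
table = {}   # stand-in for data pages (assume these can be lost in a crash)

def put(key, value):
    log.append(("set", key, value))  # 1. log the change (fsync would go here)
    table[key] = value               # 2. only then apply it to the data page

def recover(log_records):
    """Rebuild table state purely by replaying the log after a crash."""
    state = {}
    for op, key, value in log_records:
        if op == "set":
            state[key] = value
    return state

put("a", 1)
put("a", 2)
put("b", 3)

table.clear()         # simulate a crash that wipes the data pages
table = recover(log)  # replay the WAL
print(table)          # every logged write survives the "crash"
```

Because the log is append-only, the database turns scattered random page writes into one sequential write stream, which is also why WAL helps performance, not just durability.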

Posts in this phase:

📖 Post 2: Storage Engines — B-Trees vs LSM-Trees & How Data Lives on Disk
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: The two fundamental approaches to storing data on disk and when to choose each.

🔗 Start Post 2: Storage Engines →

📖 Post 3: Page Layout, Buffer Pool & Memory Management in Databases
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How databases organize data within pages and manage memory for optimal performance.

🔗 Start Post 3: Page Layout & Buffer Pool →

📖 Post 4: Write-Ahead Logging (WAL), Recovery & Crash Safety
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How databases guarantee durability and recover from crashes without losing data.

🔗 Start Post 4: WAL & Crash Recovery →

Phase 2: Transactions, Indexing & Query Processing (Weeks 4–7)

Goal: Master how databases ensure correctness, speed up queries, and process SQL efficiently.

This phase covers the "brain" of the database — transactions that keep data consistent, indexes that make queries fast, and the query engine that turns SQL into an execution plan. You'll understand isolation levels deeply enough to choose the right one for your workload, and you'll learn why the optimizer sometimes ignores your carefully created index.

Topics covered:

  • ACID properties in practice (not just theory)
  • Isolation levels: Read Uncommitted → Read Committed → Repeatable Read → Serializable
  • MVCC (Multi-Version Concurrency Control) — how PostgreSQL and MySQL implement snapshot isolation by keeping multiple row versions
  • B+Tree indexes: structure, range scans, and covering indexes
  • Hash indexes, GIN (for full-text search), GiST (for spatial data), and Bloom filters
  • Composite indexes, index-only scans, and partial indexes
  • Query parsing: SQL → AST → logical plan → physical plan
  • Join algorithms: nested loop, hash join, merge join
  • Cost-based optimization and statistics
  • Lock management: shared/exclusive locks, lock escalation, deadlock detection
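
As a small taste of MVCC from the list above, here is a toy version chain for a single row, loosely inspired by how PostgreSQL tags tuples with the creating transaction's ID. It is simplified to the point of caricature (no commit status, no deletions), but it shows the core trick — readers pick a version instead of waiting for a lock:

```python
# Each write appends a new version tagged with the writer's transaction id (xid).
# A reader with snapshot S sees the newest version whose xid is <= S,
# so readers never block writers and writers never block readers.

versions = []  # version chain for one row: (xid, value), oldest first

def write(xid, value):
    versions.append((xid, value))

def read(snapshot_xid):
    visible = [v for xid, v in versions if xid <= snapshot_xid]
    return visible[-1] if visible else None

write(xid=5, value="v1")
write(xid=9, value="v2")

print(read(snapshot_xid=7))   # sees "v1" — transaction 9 is in its future
print(read(snapshot_xid=10))  # sees "v2"
```

Old versions pile up under this scheme, which is why MVCC databases need garbage collection (VACUUM in PostgreSQL, purge in InnoDB) — a topic Post 5 covers.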

Posts in this phase:

📖 Post 5: Transaction Processing — ACID, Isolation Levels & MVCC
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How databases maintain correctness across concurrent transactions.

🔗 Start Post 5: Transactions & MVCC →

📖 Post 6: Indexing Deep Dive — B+Tree, Hash, GIN, GiST & Bloom Filters
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How different index types work internally and when to use each one.

🔗 Start Post 6: Indexing Deep Dive →

📖 Post 7: Query Processing — Parsing, Planning & Execution Pipelines
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: The journey of a SQL query from text to results, and how the optimizer makes decisions.

🔗 Start Post 7: Query Processing →

📖 Post 8: Lock Management, Deadlocks & Concurrency Control
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How databases handle concurrent access and resolve conflicts.

🔗 Start Post 8: Locks & Concurrency →

Phase 3: Advanced Storage & Distributed Systems (Weeks 8–11)

Goal: Explore advanced storage designs and how databases scale across multiple machines.

The final phase ventures into the cutting edge. You'll understand how distributed databases replicate data, partition it across nodes, and reach consensus — the problems that define modern data infrastructure. You'll also explore log-structured storage in depth, column-oriented designs for analytics, and finally compare how real databases (PostgreSQL, MySQL, SQLite, RocksDB) implement these concepts differently.

Topics covered:

  • Replication: leader-follower, multi-leader, leaderless
  • Partitioning: range-based, hash-based, and consistent hashing
  • Consensus protocols: Raft, Paxos (simplified)
  • CAP theorem and its practical implications
  • SSTables, compaction strategies (size-tiered, leveled), and write amplification
  • Column-oriented storage: column families, compression, and vectorized execution
  • Analytical databases: how OLAP differs from OLTP at the storage level
  • Comparing architectures: PostgreSQL (heap + B-Tree), MySQL/InnoDB (clustered index), SQLite (single-file), RocksDB (LSM-Tree)
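
Consistent hashing, mentioned in the list above, can be sketched in a few lines. This is a minimal version — real implementations place many virtual nodes per server to even out the distribution, and the node names here are purely illustrative:

```python
import bisect
import hashlib

def h(key):
    """Map any string to a deterministic point on a 2**32 ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        """The first node clockwise from the key's position on the ring."""
        i = bisect.bisect(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]

ring = Ring(["node-a", "node-b", "node-c"])
keys = ["user:1", "user:2", "user:3", "user:4"]
before = {k: ring.owner(k) for k in keys}

# Adding a node only moves the keys that fall in its arc of the ring --
# unlike `hash(key) % N`, which reshuffles almost every key when N changes.
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

That bounded reshuffling is the whole point: rebalancing after adding or removing a node touches only a fraction of the data.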

Posts in this phase:

📖 Post 9: Distributed Storage — Replication, Partitioning & Consensus
Time commitment: 7–10 days, 1–2 hours daily
What you'll learn: How databases scale across machines while maintaining consistency.

🔗 Start Post 9: Distributed Storage →

📖 Post 10: Log-Structured Storage — SSTables, Compaction & Leveled Merge
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: The design behind write-optimized storage engines used by Cassandra, RocksDB, and LevelDB.

🔗 Start Post 10: Log-Structured Storage →

📖 Post 11: Column-Oriented Storage, Compression & Analytical Databases
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How analytical databases store and process data differently from transactional ones.

🔗 Start Post 11: Column-Oriented Storage →

📖 Post 12: Comparing Database Architectures — PostgreSQL, MySQL, SQLite & RocksDB
Time commitment: 5–7 days, 1–2 hours daily
What you'll learn: How four popular databases implement the concepts from this series differently.

🔗 Start Post 12: Comparing Architectures →

Series Overview

Key Concepts at a Glance

Here are the most important concepts you'll encounter throughout this series:

Storage Layer

  • B-Tree: Self-balancing tree that keeps data sorted — the default index in most RDBMS
  • LSM-Tree: Write-optimized structure that buffers writes in memory and flushes sorted runs to disk
  • Page: The smallest unit of I/O — typically 4KB or 8KB — that the database reads/writes at once
  • Buffer Pool: In-memory cache of disk pages — the most critical performance component
  • WAL: Write-Ahead Log — all changes are logged before being applied, ensuring crash recovery
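
The buffer pool concept is worth a miniature sketch: an LRU page cache in a dozen lines. Real buffer pools use clock or LRU-k variants, pin counts, and dirty-page tracking, and the names below are invented for illustration:

```python
from collections import OrderedDict

class BufferPool:
    """Caches pages by id; evicts the least recently used page when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # insertion order doubles as recency order
        self.hits = self.misses = 0

    def get_page(self, page_id, read_from_disk):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)       # mark as recently used
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)    # evict the coldest page
            self.pages[page_id] = read_from_disk(page_id)
        return self.pages[page_id]

disk = lambda pid: f"contents-of-page-{pid}"
pool = BufferPool(capacity=2)
for pid in [1, 2, 1, 3, 1]:   # page 2 is coldest when 3 arrives, so 2 is evicted
    pool.get_page(pid, disk)
print(f"hits={pool.hits} misses={pool.misses}")
```

The hit ratio of exactly this kind of cache is one of the first numbers to check when diagnosing a slow database (`pg_stat_database.blks_hit` vs `blks_read` in PostgreSQL).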

Transaction Layer

  • ACID: Atomicity, Consistency, Isolation, Durability — the four guarantees of a transaction
  • MVCC: Multi-Version Concurrency Control — readers don't block writers, because old row versions are kept around
  • Isolation Levels: Trade-offs between correctness and performance for concurrent transactions
  • Deadlock: Two transactions waiting for each other's locks — requires detection and resolution

Query Layer

  • Query Plan: The execution strategy chosen by the optimizer — visible via EXPLAIN
  • Cost-Based Optimizer: Chooses the plan with the lowest estimated cost based on table statistics
  • Join Algorithms: Nested loop, hash join, merge join — each optimal for different data sizes
  • Index Scan vs Seq Scan: Using an index to find rows vs reading the entire table

Distributed Layer

  • Replication: Copying data across multiple nodes for durability and read scalability
  • Partitioning: Splitting data across nodes so no single machine holds everything
  • Consensus: Agreement protocol (Raft, Paxos) ensuring all nodes agree on the order of operations
  • CAP Theorem: When the network partitions, a system must sacrifice either consistency or availability — the popular "pick 2 of 3" phrasing is shorthand for this choice

Further Reading

This series is designed to be self-contained, but if you want to go deeper, these are the best references:

Books

  • Database Internals (Alex Petrov): The primary inspiration for this series — comprehensive and modern
  • Designing Data-Intensive Applications (Martin Kleppmann): The big picture of distributed data systems
  • PostgreSQL 14 Internals (Egor Rogov): Deep dive into PostgreSQL's specific implementation
  • Database System Concepts (Silberschatz et al.): Academic textbook — thorough but dense

Online Resources

  • PostgreSQL documentation — Among the best database documentation ever written
  • Use The Index, Luke — Excellent visual guide to SQL indexing
  • CMU Database Group (YouTube) — Andy Pavlo's lecture series on database systems
  • PGAnalyze Blog — Practical PostgreSQL performance insights

How to Get the Most Out of This Series

  1. Follow along with a real database. Install PostgreSQL locally and run the example queries. Nothing beats hands-on experience.

  2. Use EXPLAIN ANALYZE constantly. After learning about storage engines and indexes, every query becomes an opportunity to practice.

  3. Read the posts in order. Each post builds on concepts from previous ones. Skipping the storage fundamentals will make transactions and query processing harder to understand.

  4. Don't memorize — build mental models. The goal isn't to remember every detail about B-Tree node splitting. It's to develop intuition: "This query is slow because it's doing a sequential scan on a large table when it should be using the index on column X."

  5. Connect to your daily work. Every time you write a query, create an index, or choose an isolation level, think about what's happening beneath the surface.

Summary

Databases are among the most impressive pieces of software ever built. They solve problems that most applications never have to think about: crash recovery, concurrent access, efficient disk I/O, and distributed consensus. Understanding how they work doesn't just make you a better database user — it makes you a better engineer.

This series takes you from the physical layer (how bytes are organized on disk) through the logical layer (transactions, indexes, queries) to the distributed layer (replication, partitioning, consensus). By the end, you'll have a complete mental model of what happens between your SELECT statement and the data arriving at your application.

The difference between a developer who uses a database and one who understands it is the difference between driving a car and knowing how the engine works. Both can get you where you need to go — but only one can diagnose the problem when something breaks.

Let's pop the hood.

Series: Database Internals Roadmap
Next: Storage Engines: B-Trees vs LSM-Trees →
