SQL & NoSQL Database Roadmap: Master Data Storage

Introduction

Every application needs to store data. The question is never whether you need a database — it's which database, how to query it efficiently, and how to design a schema that won't collapse under real-world load.

Most developers learn SQL by copying queries from Stack Overflow and call it done. They never understand why an index speeds up a query, what isolation levels actually prevent, or when MongoDB genuinely outperforms PostgreSQL. That gap shows up the moment a query slows down in production, data gets corrupted, or a design decision made early becomes impossible to undo.

This roadmap closes that gap. Twelve posts, three phases, covering SQL fundamentals through NoSQL architecture through the cross-cutting concerns that apply everywhere. By the end, you'll make confident database decisions and write queries that scale.

Why Databases Are Worth Mastering

Every backend job requires it
Whether you write Java, Python, Go, or TypeScript, you interact with databases every day. Understanding them deeply makes you faster, more confident, and more valuable.

Performance problems are database problems
In production applications, slow queries are the most common performance bottleneck. An N+1 query problem or a missing index can make a fast application grind to a halt.

Data is the most valuable asset
Code can be rewritten. Data cannot be regenerated. Bad schema decisions, missing constraints, and wrong isolation levels can corrupt data in ways that are painful or impossible to fix.

SQL is universal
SQL has been around since the 1970s and shows no signs of going away. PostgreSQL, MySQL, SQLite, BigQuery, Redshift, Snowflake — all use SQL. It's one of the highest-ROI skills you can develop.

NoSQL is situational, not universal
MongoDB, Redis, Cassandra, DynamoDB each excel in specific scenarios. Knowing when and why to use them — rather than using them by default or avoiding them entirely — is the mark of a mature engineer.

What You'll Build

By the end of this roadmap, you'll be able to:

✅ Write complex SQL queries with JOINs, subqueries, CTEs, and window functions
✅ Design schemas that enforce data integrity and perform well at scale
✅ Understand what PostgreSQL does when you run a query
✅ Model data in MongoDB for document-based access patterns
✅ Use Redis for caching, sessions, rate limiting, and pub/sub
✅ Create indexes that actually improve query performance
✅ Understand transactions, ACID, and how concurrency goes wrong
✅ Choose between SQL and NoSQL based on access patterns, not hype

Prerequisites

This roadmap assumes:

✅ Programming experience in at least one backend language (Python, Java, Go, TypeScript, etc.)
✅ Basic understanding of what databases are (tables, rows, queries — even if vague)
✅ Familiarity with running commands in a terminal
✅ Time commitment — 8-10 hours per week for 10-12 weeks

If you want additional foundation, these posts from this blog complement the roadmap:

Relational Database Fundamentals — how databases work internally
Database Schema Design Guide — normalization, ERDs, design patterns

Learning Path Overview

This roadmap spans 3 phases across 12 posts:

DB-1 (Roadmap & Overview) ← You are here
  │
  ├─ Phase 1: SQL Fundamentals (Posts 2-3)
  │   ├─ DB-2: SQL Fundamentals (SELECT, JOIN, Subqueries)
  │   └─ DB-3: Advanced SQL (Window Functions, CTEs, Recursive Queries)
  │
  ├─ Phase 2: NoSQL Fundamentals (Posts 4-6)
  │   ├─ DB-4: NoSQL Fundamentals (Document, Key-Value, Column, Graph)
  │   ├─ DB-5: Deep Dive: PostgreSQL Mastery
  │   └─ DB-6: Deep Dive: MongoDB Complete Guide
  │
  └─ Phase 3: Cross-Cutting Concerns (Posts 7-12)
      ├─ DB-7: Deep Dive: Redis Beyond Caching
      ├─ DB-8: Deep Dive: Indexing & Query Performance
      ├─ DB-9: Deep Dive: Transactions & Concurrency Control
      ├─ DB-10: Deep Dive: Schema Design Patterns
      ├─ DB-11: Deep Dive: Database Migrations & Versioning
      └─ DB-12: Deep Dive: SQL vs NoSQL — Choosing the Right Database

Phase 1: SQL Fundamentals

SQL is the language of relational databases. It's been standardized, battle-tested for 50 years, and runs on billions of systems. Learning it well pays dividends for your entire career.

DB-2: SQL Fundamentals — SELECT, JOIN, Subqueries

Goal: Write queries that retrieve exactly the data you need

What You'll Learn:

The anatomy of a SELECT statement
Filtering with WHERE, ORDER BY, GROUP BY, HAVING
INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN — with diagrams
Aggregation functions: COUNT, SUM, AVG, MIN, MAX
Subqueries in WHERE and FROM clauses
The execution order SQL actually uses (not the order you write it)
Common query mistakes that silently return wrong results

Practical Examples:

E-commerce: find top customers by revenue this month
Blog: get all posts with their author names and comment counts
Analytics: calculate week-over-week growth rates

Why This Matters: Every database interaction you write uses these foundations. JOINs and aggregations appear in 90% of real-world queries. Getting them right from the start prevents a class of bugs that are hard to spot.

🔗 Coming Soon: SQL Fundamentals — SELECT, JOIN, Subqueries →

DB-3: Advanced SQL — Window Functions, CTEs, Recursive Queries

Goal: Write sophisticated queries that replace complex application-level code

What You'll Learn:

Common Table Expressions (CTEs) — readable, reusable query building blocks
Window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG()
PARTITION BY — running calculations per group
Recursive CTEs — traversing hierarchies (categories, org charts, threaded comments)
CASE WHEN — conditional logic inside queries
Lateral joins — per-row subqueries
EXPLAIN ANALYZE — reading query execution plans

Practical Examples:

Rank products by sales within each category
Calculate 7-day rolling averages for time-series data
Find the top N items per group without subqueries
Traverse a category tree of arbitrary depth

Why This Matters: Window functions and CTEs replace a whole class of "I'll just do this in application code" decisions. Pulling less data from the database and doing less work in your app is almost always faster.

🔗 Coming Soon: Advanced SQL — Window Functions & CTEs →

Phase 2: NoSQL Fundamentals

NoSQL databases don't replace SQL — they cover access patterns that relational databases handle poorly. Understanding when and why to use each type is the core skill.

DB-4: NoSQL Fundamentals — Document, Key-Value, Column, Graph

Goal: Understand the NoSQL landscape and when each type excels

What You'll Learn:

The four main NoSQL categories and what problems each solves
Document stores (MongoDB, Firestore) — flexible schemas, nested data
Key-value stores (Redis, DynamoDB) — blazing-fast lookups by key
Wide-column stores (Cassandra, HBase) — time-series, event data at massive scale
Graph databases (Neo4j, Amazon Neptune) — relationship-heavy data
BASE vs ACID: what eventual consistency actually means in practice
CAP theorem — why you can't have everything

The Trade-off Framework:

Type	Best For	Trade-off
Document	Flexible schemas, nested data	Harder to query across documents
Key-Value	High-speed lookups, sessions	No complex queries
Wide-Column	Write-heavy, time-series	Complex data modeling
Graph	Relationship traversal	Niche use cases
Relational	Structured, consistent data	Schema rigidity, joins at scale

Why This Matters: Most developers pick MongoDB because "everyone uses it" or avoid NoSQL entirely out of unfamiliarity. Neither is a strategy. This post gives you the mental framework to make informed choices.

🔗 Coming Soon: NoSQL Fundamentals — Document, Key-Value, Column, Graph →

DB-5: Deep Dive — PostgreSQL Mastery

Goal: Go beyond basic SQL to use PostgreSQL's full power

What You'll Learn:

PostgreSQL architecture: how it handles connections, memory, disk I/O
Data types you should know: JSONB, UUID, arrays, ranges, ENUM
Full-text search without Elasticsearch
Partial indexes, expression indexes, covering indexes
Row-level security (RLS) for multi-tenant applications
Stored procedures and triggers — when they help, when they hurt
pg_stat_statements — finding slow queries in production
Connection pooling with PgBouncer
Logical replication and read replicas

Practical Examples:

Store and query semi-structured data with JSONB
Implement multi-tenant data isolation with RLS
Set up full-text search with ts_vector and ts_query
Diagnose a slow query end-to-end

Why This Matters: PostgreSQL is arguably the most capable open-source database in existence. Most teams use 20% of its features. This post unlocks the other 80%.

🔗 Coming Soon: PostgreSQL Mastery →

DB-6: Deep Dive — MongoDB Complete Guide

Goal: Model data effectively in MongoDB and query it efficiently

What You'll Learn:

MongoDB data model: documents, collections, embedding vs referencing
When to embed and when to reference — the real rules
CRUD operations in depth
Aggregation pipeline — MongoDB's equivalent of SQL GROUP BY + JOIN
Schema design patterns: bucket pattern, outlier pattern, computed pattern
Indexing in MongoDB: single, compound, multikey, text, geospatial
Transactions in MongoDB (yes, they exist)
Change streams — real-time data feeds
Atlas Search for full-text search

Embedding vs Referencing:

Embed when:
  - Data is always accessed together
  - Child data is small and bounded
  - One-to-few relationship
 
Reference when:
  - Data is accessed independently
  - Many-to-many relationships
  - Data grows unbounded

Why This Matters: MongoDB's flexibility is a double-edged sword. Good document design leads to fast, simple queries. Bad design leads to massive aggregation pipelines and poor performance.

🔗 Coming Soon: MongoDB Complete Guide →

Phase 3: Cross-Cutting Concerns

These topics apply across all database systems. They're often the difference between a database that works and one that scales.

DB-7: Deep Dive — Redis Beyond Caching

Goal: Use Redis for its full range of capabilities, not just caching

What You'll Learn:

Redis data structures: Strings, Lists, Sets, Sorted Sets, Hashes, Streams
Caching patterns: cache-aside, write-through, write-behind
Session storage and its trade-offs vs. database sessions
Rate limiting with Redis and sliding window counters
Distributed locking with RedLock
Pub/Sub and its limitations
Redis Streams for event sourcing
Redis persistence: RDB snapshots vs AOF logging
Redis Sentinel and Redis Cluster for high availability

Practical Patterns:

Rate Limiting:
  INCR requests:{user_id}:{minute}
  EXPIRE requests:{user_id}:{minute} 60
 
Leaderboard:
  ZADD leaderboard {score} {user_id}
  ZREVRANGE leaderboard 0 9 WITHSCORES

Why This Matters: Redis used purely as a cache wastes its capabilities. Understanding the full data model unlocks patterns that would require separate services if implemented elsewhere.

🔗 Coming Soon: Redis Beyond Caching →

DB-8: Deep Dive — Indexing & Query Performance

Goal: Make slow queries fast and understand why indexes work

What You'll Learn:

How B-tree indexes work internally (the actual data structure)
When indexes help and when they hurt (write amplification)
Composite indexes and the left-prefix rule
Covering indexes — queries that never touch the table
Index selectivity and cardinality
Partial indexes for filtered queries
EXPLAIN and EXPLAIN ANALYZE — reading execution plans
Common query performance problems: sequential scans, sort spills, hash joins
Index bloat and maintenance (REINDEX, VACUUM)

The Index Decision Framework:

Add an index when:
  - Column appears in WHERE, JOIN ON, or ORDER BY
  - Query returns < 5-10% of rows (high selectivity)
  - Table has more reads than writes
 
Don't add an index when:
  - Table is tiny (< a few thousand rows)
  - Column has low cardinality (boolean, status with few values)
  - Table is write-heavy (index maintenance cost exceeds benefit)

Why This Matters: Indexing is the highest-leverage optimization available. A single index on the right column can turn a 10-second query into a 10-millisecond one. Understanding the trade-offs prevents the opposite: over-indexing that slows down writes.

🔗 Coming Soon: Indexing & Query Performance →

DB-9: Deep Dive — Database Transactions & Concurrency Control

Goal: Understand what ACID really means and how concurrency goes wrong

What You'll Learn:

ACID properties in depth: atomicity, consistency, isolation, durability
What actually happens inside a transaction
Concurrency anomalies: dirty reads, non-repeatable reads, phantom reads
Isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Serializable
Optimistic vs pessimistic locking
SELECT FOR UPDATE and SELECT FOR SHARE
Deadlocks: what causes them and how to prevent them
MVCC (Multi-Version Concurrency Control) — how PostgreSQL avoids locks
Long-running transactions and their danger

Isolation Level Matrix:

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read
Read Uncommitted	Possible	Possible	Possible
Read Committed	Prevented	Possible	Possible
Repeatable Read	Prevented	Prevented	Possible
Serializable	Prevented	Prevented	Prevented

Why This Matters: Most application bugs are not logic bugs — they're concurrency bugs. Race conditions, double-spend issues, and data corruption usually come from wrong isolation assumptions. This post gives you the mental model to reason about them correctly.

🔗 Coming Soon: Database Transactions & Concurrency Control →

DB-10: Deep Dive — Schema Design Patterns

Goal: Design schemas that age well under changing requirements and growing load

What You'll Learn:

Normalization revisited: when to normalize and when to denormalize
Polymorphic associations — one table for multiple entity types
The entity-attribute-value (EAV) anti-pattern and better alternatives
Soft deletes, audit trails, and temporal data patterns
JSONB columns as an escape hatch for flexible attributes
Schema design for multi-tenancy: shared schema, row-level, schema-per-tenant
Time-series data modeling in PostgreSQL
Event sourcing schema patterns
Naming conventions and their importance

The Normalization Rule of Thumb:

Normalize until it hurts (performance),
then denormalize until it works.

Why This Matters: Schema decisions made early are extremely hard to change later. A table with 500 million rows cannot be easily restructured. Getting the design right — and knowing which trade-offs you're accepting — prevents painful migrations later.

🔗 Coming Soon: Schema Design Patterns →

DB-11: Deep Dive — Database Migrations & Versioning

Goal: Change database schemas safely without downtime or data loss

What You'll Learn:

Why migrations fail and how to prevent failures
Zero-downtime migration patterns for production systems
Adding columns, dropping columns, renaming columns safely
Backfilling large tables without locking
Migration tools: Flyway, Liquibase, Alembic, Prisma Migrate, golang-migrate
Rolling migrations for blue-green deployments
Schema version control in Git
Testing migrations before production
The "expand-contract" pattern for backward-compatible changes

The Expand-Contract Pattern:

Phase 1 - Expand:  Add new_column (nullable)
Deploy application version that writes both old_column and new_column
 
Phase 2 - Migrate: Backfill new_column for existing rows
 
Phase 3 - Contract: Remove old_column after all reads use new_column

Why This Matters: "Just ALTER TABLE in production" works fine on small tables. At scale, schema changes become high-stakes operations. Doing them wrong causes downtime, locks that cascade into outages, or data loss.

🔗 Coming Soon: Database Migrations & Versioning →

DB-12: Deep Dive — SQL vs NoSQL: Choosing the Right Database

Goal: Make confident, justified database decisions for any project

What You'll Learn:

A decision framework based on access patterns, not popularity
When SQL wins and why (consistency, complex queries, joins)
When NoSQL wins and why (scale, flexibility, specific data models)
Polyglot persistence — using multiple databases in one system
Real-world case studies: what Uber, Instagram, GitHub chose and why
Database selection for common application types (e-commerce, SaaS, analytics, IoT)
Managed cloud databases: RDS, Cloud SQL, Atlas, Upstash, PlanetScale, Supabase
The cost of switching (migration complexity, operational overhead)
Questions to ask before choosing a database

The Quick Decision Guide:

Use PostgreSQL when:
  - You need ACID guarantees
  - Data has relationships
  - Query patterns are varied and unpredictable
  - You're building a typical web app
 
Use MongoDB when:
  - Schema evolves rapidly
  - Documents are self-contained
  - You're building document-heavy apps (CMS, catalogs)
 
Use Redis when:
  - You need sub-millisecond latency
  - Data fits in memory
  - Use case: caching, sessions, rate limiting, leaderboards
 
Use Cassandra/DynamoDB when:
  - Write volume is massive (millions/second)
  - Data is time-series or append-only
  - Access patterns are known and fixed

Why This Matters: Wrong database choices are expensive to fix. This post gives you a repeatable process for making the decision instead of guessing.

🔗 Coming Soon: SQL vs NoSQL — Choosing the Right Database →

Study Plans

For Beginners (12 Weeks)

Weeks 1-3: SQL Foundation

Week 1: Install PostgreSQL locally, explore the psql CLI
Week 2: DB-2 (SQL Fundamentals) — practice with a sample dataset
Week 3: DB-3 (Advanced SQL) — window functions, CTEs

Weeks 4-6: NoSQL Landscape

Week 4: DB-4 (NoSQL Overview) — understand the mental model
Week 5: DB-6 (MongoDB) — run a local MongoDB instance, practice CRUD
Week 6: DB-7 (Redis) — run Redis locally, implement a cache and rate limiter

Weeks 7-9: PostgreSQL Deep Dive

Week 7: DB-5 (PostgreSQL Mastery) — JSONB, full-text search, RLS
Week 8: DB-8 (Indexing) — analyze slow queries, add indexes, measure improvement
Week 9: DB-9 (Transactions) — implement transaction patterns in your language

Weeks 10-12: Architecture and Design

Week 10: DB-10 (Schema Design) — redesign a schema you know
Week 11: DB-11 (Migrations) — practice migrations in a test project
Week 12: DB-12 (SQL vs NoSQL) — revisit past decisions with new knowledge

For Backend Developers (6 Weeks — Intermediate)

Week 1: DB-2 + DB-3 (advanced SQL you may have skipped) Week 2: DB-5 (PostgreSQL features you're not using) Week 3: DB-8 (indexing — the highest ROI topic) Week 4: DB-9 (transactions — fix concurrency bugs you might have) Week 5: DB-6 or DB-7 (whichever NoSQL is relevant to your stack) Week 6: DB-12 (decision framework for your next project)

For Interview Prep (3 Weeks — Focused)

Week 1: DB-2 (SQL fundamentals), DB-3 (advanced SQL), DB-8 (indexing basics) Week 2: DB-9 (ACID and transactions), DB-4 (NoSQL overview) Week 3: DB-12 (SQL vs NoSQL discussion), practice design questions

Essential Setup

You can run all examples in this roadmap locally with Docker:

# PostgreSQL
docker run -d \
  --name pg \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 \
  postgres:16
 
# MongoDB
docker run -d \
  --name mongo \
  -p 27017:27017 \
  mongo:7
 
# Redis
docker run -d \
  --name redis \
  -p 6379:6379 \
  redis:7-alpine

Connect to PostgreSQL:

psql -h localhost -U postgres -W

Connect to MongoDB:

mongosh mongodb://localhost:27017

Connect to Redis:

redis-cli -h localhost

Tools You'll Use

PostgreSQL clients:

psql (CLI, built-in)
DBeaver (GUI, free)
pgAdmin (GUI, free)
TablePlus (GUI, paid, excellent UX)

MongoDB clients:

mongosh (CLI, official)
MongoDB Compass (GUI, official, free)
Studio 3T (GUI, paid)

Query analysis:

EXPLAIN ANALYZE (built into PostgreSQL)
pgBadger (log analyzer)
pg_stat_statements (query statistics)

Migration tools (covered in DB-11):

Flyway (Java ecosystem)
Alembic (Python/SQLAlchemy)
golang-migrate (Go)
Prisma Migrate (TypeScript/Node)

How This Roadmap Connects

This series builds on existing posts and prepares you for advanced topics:

Foundation posts (read before or alongside):

Relational Database Fundamentals — database internals, ACID deep dive
Database Schema Design Guide — normalization, ER diagrams

Framework-specific posts (apply this roadmap's knowledge):

What comes next:

Database topics appear throughout the Spring Boot series (DB-level optimizations, JPA)
The Docker & Kubernetes roadmap covers running databases in containers
The Software Architecture series covers patterns built on top of databases (CQRS, event sourcing)

Summary

This roadmap covers what most developers skip: the why behind database decisions, not just the how to write queries.

SQL remains the single most transferable database skill. PostgreSQL runs on everything from a Raspberry Pi to petabyte-scale AWS deployments. MongoDB and Redis solve real problems in specific contexts. Understanding all three — and knowing when to reach for each — makes you a significantly better backend engineer.

Follow the phases in order. Every post builds on the previous one. By the end, database work will feel less like dark magic and more like engineering.

Start here: DB-2: SQL Fundamentals →

SQL & NoSQL Database Roadmap: Master Data Storage

Introduction

Why Databases Are Worth Mastering

What You'll Build

Prerequisites

Learning Path Overview

Phase 1: SQL Fundamentals

DB-2: SQL Fundamentals — SELECT, JOIN, Subqueries

DB-3: Advanced SQL — Window Functions, CTEs, Recursive Queries

Phase 2: NoSQL Fundamentals

DB-4: NoSQL Fundamentals — Document, Key-Value, Column, Graph

DB-5: Deep Dive — PostgreSQL Mastery

DB-6: Deep Dive — MongoDB Complete Guide

Phase 3: Cross-Cutting Concerns

DB-7: Deep Dive — Redis Beyond Caching

DB-8: Deep Dive — Indexing & Query Performance

DB-9: Deep Dive — Database Transactions & Concurrency Control

DB-10: Deep Dive — Schema Design Patterns

DB-11: Deep Dive — Database Migrations & Versioning

DB-12: Deep Dive — SQL vs NoSQL: Choosing the Right Database

Study Plans

For Beginners (12 Weeks)

For Backend Developers (6 Weeks — Intermediate)

For Interview Prep (3 Weeks — Focused)

Essential Setup

Tools You'll Use

How This Roadmap Connects

Summary

📬 Subscribe to Newsletter

💬 Comments