Back to blog

SQL & NoSQL Database Roadmap: Master Data Storage

databasesqlnosqlpostgresqlmongodbredisbackend
SQL & NoSQL Database Roadmap: Master Data Storage

Introduction

Every application needs to store data. The question is never whether you need a database — it's which database, how to query it efficiently, and how to design a schema that won't collapse under real-world load.

Most developers learn SQL by copying queries from Stack Overflow and call it done. They never understand why an index speeds up a query, what isolation levels actually prevent, or when MongoDB genuinely outperforms PostgreSQL. That gap shows up the moment a query slows down in production, data gets corrupted, or a design decision made early becomes impossible to undo.

This roadmap closes that gap. Twelve posts, three phases, covering SQL fundamentals through NoSQL architecture through the cross-cutting concerns that apply everywhere. By the end, you'll make confident database decisions and write queries that scale.

Why Databases Are Worth Mastering

Every backend job requires it
Whether you write Java, Python, Go, or TypeScript, you interact with databases every day. Understanding them deeply makes you faster, more confident, and more valuable.

Performance problems are database problems
In production applications, slow queries are the most common performance bottleneck. An N+1 query problem or a missing index can make a fast application grind to a halt.

Data is the most valuable asset
Code can be rewritten. Data cannot be regenerated. Bad schema decisions, missing constraints, and wrong isolation levels can corrupt data in ways that are painful or impossible to fix.

SQL is universal
SQL has been around since the 1970s and shows no signs of going away. PostgreSQL, MySQL, SQLite, BigQuery, Redshift, Snowflake — all use SQL. It's one of the highest-ROI skills you can develop.

NoSQL is situational, not universal
MongoDB, Redis, Cassandra, DynamoDB each excel in specific scenarios. Knowing when and why to use them — rather than using them by default or avoiding them entirely — is the mark of a mature engineer.

What You'll Build

By the end of this roadmap, you'll be able to:

✅ Write complex SQL queries with JOINs, subqueries, CTEs, and window functions
✅ Design schemas that enforce data integrity and perform well at scale
✅ Understand what PostgreSQL does when you run a query
✅ Model data in MongoDB for document-based access patterns
✅ Use Redis for caching, sessions, rate limiting, and pub/sub
✅ Create indexes that actually improve query performance
✅ Understand transactions, ACID, and how concurrency goes wrong
✅ Choose between SQL and NoSQL based on access patterns, not hype

Prerequisites

This roadmap assumes:

Programming experience in at least one backend language (Python, Java, Go, TypeScript, etc.)
Basic understanding of what databases are (tables, rows, queries — even if vague)
Familiarity with running commands in a terminal
Time commitment — 8-10 hours per week for 10-12 weeks

If you want additional foundation, these posts from this blog complement the roadmap:


Learning Path Overview

This roadmap spans 3 phases across 12 posts:

DB-1 (Roadmap & Overview) ← You are here

  ├─ Phase 1: SQL Fundamentals (Posts 2-3)
  │   ├─ DB-2: SQL Fundamentals (SELECT, JOIN, Subqueries)
  │   └─ DB-3: Advanced SQL (Window Functions, CTEs, Recursive Queries)

  ├─ Phase 2: NoSQL Fundamentals (Posts 4-6)
  │   ├─ DB-4: NoSQL Fundamentals (Document, Key-Value, Column, Graph)
  │   ├─ DB-5: Deep Dive: PostgreSQL Mastery
  │   └─ DB-6: Deep Dive: MongoDB Complete Guide

  └─ Phase 3: Cross-Cutting Concerns (Posts 7-12)
      ├─ DB-7: Deep Dive: Redis Beyond Caching
      ├─ DB-8: Deep Dive: Indexing & Query Performance
      ├─ DB-9: Deep Dive: Transactions & Concurrency Control
      ├─ DB-10: Deep Dive: Schema Design Patterns
      ├─ DB-11: Deep Dive: Database Migrations & Versioning
      └─ DB-12: Deep Dive: SQL vs NoSQL — Choosing the Right Database

Phase 1: SQL Fundamentals

SQL is the language of relational databases. It's been standardized, battle-tested for 50 years, and runs on billions of systems. Learning it well pays dividends for your entire career.

DB-2: SQL Fundamentals — SELECT, JOIN, Subqueries

Goal: Write queries that retrieve exactly the data you need

What You'll Learn:

  • The anatomy of a SELECT statement
  • Filtering with WHERE, ORDER BY, GROUP BY, HAVING
  • INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN — with diagrams
  • Aggregation functions: COUNT, SUM, AVG, MIN, MAX
  • Subqueries in WHERE and FROM clauses
  • The execution order SQL actually uses (not the order you write it)
  • Common query mistakes that silently return wrong results

Practical Examples:

  • E-commerce: find top customers by revenue this month
  • Blog: get all posts with their author names and comment counts
  • Analytics: calculate week-over-week growth rates

Why This Matters: Every database interaction you write uses these foundations. JOINs and aggregations appear in 90% of real-world queries. Getting them right from the start prevents a class of bugs that are hard to spot.

🔗 Coming Soon: SQL Fundamentals — SELECT, JOIN, Subqueries →


DB-3: Advanced SQL — Window Functions, CTEs, Recursive Queries

Goal: Write sophisticated queries that replace complex application-level code

What You'll Learn:

  • Common Table Expressions (CTEs) — readable, reusable query building blocks
  • Window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG()
  • PARTITION BY — running calculations per group
  • Recursive CTEs — traversing hierarchies (categories, org charts, threaded comments)
  • CASE WHEN — conditional logic inside queries
  • Lateral joins — per-row subqueries
  • EXPLAIN ANALYZE — reading query execution plans

Practical Examples:

  • Rank products by sales within each category
  • Calculate 7-day rolling averages for time-series data
  • Find the top N items per group without subqueries
  • Traverse a category tree of arbitrary depth

Why This Matters: Window functions and CTEs replace a whole class of "I'll just do this in application code" decisions. Pulling less data from the database and doing less work in your app is almost always faster.

🔗 Coming Soon: Advanced SQL — Window Functions & CTEs →


Phase 2: NoSQL Fundamentals

NoSQL databases don't replace SQL — they cover access patterns that relational databases handle poorly. Understanding when and why to use each type is the core skill.

DB-4: NoSQL Fundamentals — Document, Key-Value, Column, Graph

Goal: Understand the NoSQL landscape and when each type excels

What You'll Learn:

  • The four main NoSQL categories and what problems each solves
  • Document stores (MongoDB, Firestore) — flexible schemas, nested data
  • Key-value stores (Redis, DynamoDB) — blazing-fast lookups by key
  • Wide-column stores (Cassandra, HBase) — time-series, event data at massive scale
  • Graph databases (Neo4j, Amazon Neptune) — relationship-heavy data
  • BASE vs ACID: what eventual consistency actually means in practice
  • CAP theorem — why you can't have everything

The Trade-off Framework:

TypeBest ForTrade-off
DocumentFlexible schemas, nested dataHarder to query across documents
Key-ValueHigh-speed lookups, sessionsNo complex queries
Wide-ColumnWrite-heavy, time-seriesComplex data modeling
GraphRelationship traversalNiche use cases
RelationalStructured, consistent dataSchema rigidity, joins at scale

Why This Matters: Most developers pick MongoDB because "everyone uses it" or avoid NoSQL entirely out of unfamiliarity. Neither is a strategy. This post gives you the mental framework to make informed choices.

🔗 Coming Soon: NoSQL Fundamentals — Document, Key-Value, Column, Graph →


DB-5: Deep Dive — PostgreSQL Mastery

Goal: Go beyond basic SQL to use PostgreSQL's full power

What You'll Learn:

  • PostgreSQL architecture: how it handles connections, memory, disk I/O
  • Data types you should know: JSONB, UUID, arrays, ranges, ENUM
  • Full-text search without Elasticsearch
  • Partial indexes, expression indexes, covering indexes
  • Row-level security (RLS) for multi-tenant applications
  • Stored procedures and triggers — when they help, when they hurt
  • pg_stat_statements — finding slow queries in production
  • Connection pooling with PgBouncer
  • Logical replication and read replicas

Practical Examples:

  • Store and query semi-structured data with JSONB
  • Implement multi-tenant data isolation with RLS
  • Set up full-text search with ts_vector and ts_query
  • Diagnose a slow query end-to-end

Why This Matters: PostgreSQL is arguably the most capable open-source database in existence. Most teams use 20% of its features. This post unlocks the other 80%.

🔗 Coming Soon: PostgreSQL Mastery →


DB-6: Deep Dive — MongoDB Complete Guide

Goal: Model data effectively in MongoDB and query it efficiently

What You'll Learn:

  • MongoDB data model: documents, collections, embedding vs referencing
  • When to embed and when to reference — the real rules
  • CRUD operations in depth
  • Aggregation pipeline — MongoDB's equivalent of SQL GROUP BY + JOIN
  • Schema design patterns: bucket pattern, outlier pattern, computed pattern
  • Indexing in MongoDB: single, compound, multikey, text, geospatial
  • Transactions in MongoDB (yes, they exist)
  • Change streams — real-time data feeds
  • Atlas Search for full-text search

Embedding vs Referencing:

Embed when:
  - Data is always accessed together
  - Child data is small and bounded
  - One-to-few relationship
 
Reference when:
  - Data is accessed independently
  - Many-to-many relationships
  - Data grows unbounded

Why This Matters: MongoDB's flexibility is a double-edged sword. Good document design leads to fast, simple queries. Bad design leads to massive aggregation pipelines and poor performance.

🔗 Coming Soon: MongoDB Complete Guide →


Phase 3: Cross-Cutting Concerns

These topics apply across all database systems. They're often the difference between a database that works and one that scales.

DB-7: Deep Dive — Redis Beyond Caching

Goal: Use Redis for its full range of capabilities, not just caching

What You'll Learn:

  • Redis data structures: Strings, Lists, Sets, Sorted Sets, Hashes, Streams
  • Caching patterns: cache-aside, write-through, write-behind
  • Session storage and its trade-offs vs. database sessions
  • Rate limiting with Redis and sliding window counters
  • Distributed locking with RedLock
  • Pub/Sub and its limitations
  • Redis Streams for event sourcing
  • Redis persistence: RDB snapshots vs AOF logging
  • Redis Sentinel and Redis Cluster for high availability

Practical Patterns:

Rate Limiting:
  INCR requests:{user_id}:{minute}
  EXPIRE requests:{user_id}:{minute} 60
 
Leaderboard:
  ZADD leaderboard {score} {user_id}
  ZREVRANGE leaderboard 0 9 WITHSCORES

Why This Matters: Redis used purely as a cache wastes its capabilities. Understanding the full data model unlocks patterns that would require separate services if implemented elsewhere.

🔗 Coming Soon: Redis Beyond Caching →


DB-8: Deep Dive — Indexing & Query Performance

Goal: Make slow queries fast and understand why indexes work

What You'll Learn:

  • How B-tree indexes work internally (the actual data structure)
  • When indexes help and when they hurt (write amplification)
  • Composite indexes and the left-prefix rule
  • Covering indexes — queries that never touch the table
  • Index selectivity and cardinality
  • Partial indexes for filtered queries
  • EXPLAIN and EXPLAIN ANALYZE — reading execution plans
  • Common query performance problems: sequential scans, sort spills, hash joins
  • Index bloat and maintenance (REINDEX, VACUUM)

The Index Decision Framework:

Add an index when:
  - Column appears in WHERE, JOIN ON, or ORDER BY
  - Query returns < 5-10% of rows (high selectivity)
  - Table has more reads than writes
 
Don't add an index when:
  - Table is tiny (< a few thousand rows)
  - Column has low cardinality (boolean, status with few values)
  - Table is write-heavy (index maintenance cost exceeds benefit)

Why This Matters: Indexing is the highest-leverage optimization available. A single index on the right column can turn a 10-second query into a 10-millisecond one. Understanding the trade-offs prevents the opposite: over-indexing that slows down writes.

🔗 Coming Soon: Indexing & Query Performance →


DB-9: Deep Dive — Database Transactions & Concurrency Control

Goal: Understand what ACID really means and how concurrency goes wrong

What You'll Learn:

  • ACID properties in depth: atomicity, consistency, isolation, durability
  • What actually happens inside a transaction
  • Concurrency anomalies: dirty reads, non-repeatable reads, phantom reads
  • Isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Serializable
  • Optimistic vs pessimistic locking
  • SELECT FOR UPDATE and SELECT FOR SHARE
  • Deadlocks: what causes them and how to prevent them
  • MVCC (Multi-Version Concurrency Control) — how PostgreSQL avoids locks
  • Long-running transactions and their danger

Isolation Level Matrix:

Isolation LevelDirty ReadNon-Repeatable ReadPhantom Read
Read UncommittedPossiblePossiblePossible
Read CommittedPreventedPossiblePossible
Repeatable ReadPreventedPreventedPossible
SerializablePreventedPreventedPrevented

Why This Matters: Most application bugs are not logic bugs — they're concurrency bugs. Race conditions, double-spend issues, and data corruption usually come from wrong isolation assumptions. This post gives you the mental model to reason about them correctly.

🔗 Coming Soon: Database Transactions & Concurrency Control →


DB-10: Deep Dive — Schema Design Patterns

Goal: Design schemas that age well under changing requirements and growing load

What You'll Learn:

  • Normalization revisited: when to normalize and when to denormalize
  • Polymorphic associations — one table for multiple entity types
  • The entity-attribute-value (EAV) anti-pattern and better alternatives
  • Soft deletes, audit trails, and temporal data patterns
  • JSONB columns as an escape hatch for flexible attributes
  • Schema design for multi-tenancy: shared schema, row-level, schema-per-tenant
  • Time-series data modeling in PostgreSQL
  • Event sourcing schema patterns
  • Naming conventions and their importance

The Normalization Rule of Thumb:

Normalize until it hurts (performance),
then denormalize until it works.

Why This Matters: Schema decisions made early are extremely hard to change later. A table with 500 million rows cannot be easily restructured. Getting the design right — and knowing which trade-offs you're accepting — prevents painful migrations later.

🔗 Coming Soon: Schema Design Patterns →


DB-11: Deep Dive — Database Migrations & Versioning

Goal: Change database schemas safely without downtime or data loss

What You'll Learn:

  • Why migrations fail and how to prevent failures
  • Zero-downtime migration patterns for production systems
  • Adding columns, dropping columns, renaming columns safely
  • Backfilling large tables without locking
  • Migration tools: Flyway, Liquibase, Alembic, Prisma Migrate, golang-migrate
  • Rolling migrations for blue-green deployments
  • Schema version control in Git
  • Testing migrations before production
  • The "expand-contract" pattern for backward-compatible changes

The Expand-Contract Pattern:

Phase 1 - Expand:  Add new_column (nullable)
Deploy application version that writes both old_column and new_column
 
Phase 2 - Migrate: Backfill new_column for existing rows
 
Phase 3 - Contract: Remove old_column after all reads use new_column

Why This Matters: "Just ALTER TABLE in production" works fine on small tables. At scale, schema changes become high-stakes operations. Doing them wrong causes downtime, locks that cascade into outages, or data loss.

🔗 Coming Soon: Database Migrations & Versioning →


DB-12: Deep Dive — SQL vs NoSQL: Choosing the Right Database

Goal: Make confident, justified database decisions for any project

What You'll Learn:

  • A decision framework based on access patterns, not popularity
  • When SQL wins and why (consistency, complex queries, joins)
  • When NoSQL wins and why (scale, flexibility, specific data models)
  • Polyglot persistence — using multiple databases in one system
  • Real-world case studies: what Uber, Instagram, GitHub chose and why
  • Database selection for common application types (e-commerce, SaaS, analytics, IoT)
  • Managed cloud databases: RDS, Cloud SQL, Atlas, Upstash, PlanetScale, Supabase
  • The cost of switching (migration complexity, operational overhead)
  • Questions to ask before choosing a database

The Quick Decision Guide:

Use PostgreSQL when:
  - You need ACID guarantees
  - Data has relationships
  - Query patterns are varied and unpredictable
  - You're building a typical web app
 
Use MongoDB when:
  - Schema evolves rapidly
  - Documents are self-contained
  - You're building document-heavy apps (CMS, catalogs)
 
Use Redis when:
  - You need sub-millisecond latency
  - Data fits in memory
  - Use case: caching, sessions, rate limiting, leaderboards
 
Use Cassandra/DynamoDB when:
  - Write volume is massive (millions/second)
  - Data is time-series or append-only
  - Access patterns are known and fixed

Why This Matters: Wrong database choices are expensive to fix. This post gives you a repeatable process for making the decision instead of guessing.

🔗 Coming Soon: SQL vs NoSQL — Choosing the Right Database →


Study Plans

For Beginners (12 Weeks)

Weeks 1-3: SQL Foundation

  • Week 1: Install PostgreSQL locally, explore the psql CLI
  • Week 2: DB-2 (SQL Fundamentals) — practice with a sample dataset
  • Week 3: DB-3 (Advanced SQL) — window functions, CTEs

Weeks 4-6: NoSQL Landscape

  • Week 4: DB-4 (NoSQL Overview) — understand the mental model
  • Week 5: DB-6 (MongoDB) — run a local MongoDB instance, practice CRUD
  • Week 6: DB-7 (Redis) — run Redis locally, implement a cache and rate limiter

Weeks 7-9: PostgreSQL Deep Dive

  • Week 7: DB-5 (PostgreSQL Mastery) — JSONB, full-text search, RLS
  • Week 8: DB-8 (Indexing) — analyze slow queries, add indexes, measure improvement
  • Week 9: DB-9 (Transactions) — implement transaction patterns in your language

Weeks 10-12: Architecture and Design

  • Week 10: DB-10 (Schema Design) — redesign a schema you know
  • Week 11: DB-11 (Migrations) — practice migrations in a test project
  • Week 12: DB-12 (SQL vs NoSQL) — revisit past decisions with new knowledge

For Backend Developers (6 Weeks — Intermediate)

Week 1: DB-2 + DB-3 (advanced SQL you may have skipped) Week 2: DB-5 (PostgreSQL features you're not using) Week 3: DB-8 (indexing — the highest ROI topic) Week 4: DB-9 (transactions — fix concurrency bugs you might have) Week 5: DB-6 or DB-7 (whichever NoSQL is relevant to your stack) Week 6: DB-12 (decision framework for your next project)


For Interview Prep (3 Weeks — Focused)

Week 1: DB-2 (SQL fundamentals), DB-3 (advanced SQL), DB-8 (indexing basics) Week 2: DB-9 (ACID and transactions), DB-4 (NoSQL overview) Week 3: DB-12 (SQL vs NoSQL discussion), practice design questions


Essential Setup

You can run all examples in this roadmap locally with Docker:

# PostgreSQL
docker run -d \
  --name pg \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 \
  postgres:16
 
# MongoDB
docker run -d \
  --name mongo \
  -p 27017:27017 \
  mongo:7
 
# Redis
docker run -d \
  --name redis \
  -p 6379:6379 \
  redis:7-alpine

Connect to PostgreSQL:

psql -h localhost -U postgres -W

Connect to MongoDB:

mongosh mongodb://localhost:27017

Connect to Redis:

redis-cli -h localhost

Tools You'll Use

PostgreSQL clients:

  • psql (CLI, built-in)
  • DBeaver (GUI, free)
  • pgAdmin (GUI, free)
  • TablePlus (GUI, paid, excellent UX)

MongoDB clients:

  • mongosh (CLI, official)
  • MongoDB Compass (GUI, official, free)
  • Studio 3T (GUI, paid)

Query analysis:

  • EXPLAIN ANALYZE (built into PostgreSQL)
  • pgBadger (log analyzer)
  • pg_stat_statements (query statistics)

Migration tools (covered in DB-11):

  • Flyway (Java ecosystem)
  • Alembic (Python/SQLAlchemy)
  • golang-migrate (Go)
  • Prisma Migrate (TypeScript/Node)

How This Roadmap Connects

This series builds on existing posts and prepares you for advanced topics:

Foundation posts (read before or alongside):

Framework-specific posts (apply this roadmap's knowledge):

What comes next:

  • Database topics appear throughout the Spring Boot series (DB-level optimizations, JPA)
  • The Docker & Kubernetes roadmap covers running databases in containers
  • The Software Architecture series covers patterns built on top of databases (CQRS, event sourcing)

Summary

This roadmap covers what most developers skip: the why behind database decisions, not just the how to write queries.

SQL remains the single most transferable database skill. PostgreSQL runs on everything from a Raspberry Pi to petabyte-scale AWS deployments. MongoDB and Redis solve real problems in specific contexts. Understanding all three — and knowing when to reach for each — makes you a significantly better backend engineer.

Follow the phases in order. Every post builds on the previous one. By the end, database work will feel less like dark magic and more like engineering.

Start here: DB-2: SQL Fundamentals →

📬 Subscribe to Newsletter

Get the latest blog posts delivered to your inbox every week. No spam, unsubscribe anytime.

We respect your privacy. Unsubscribe at any time.

💬 Comments

Sign in to leave a comment

We'll never post without your permission.