SQL & NoSQL Database Roadmap: Master Data Storage

Introduction
Every application needs to store data. The question is never whether you need a database — it's which database, how to query it efficiently, and how to design a schema that won't collapse under real-world load.
Most developers learn SQL by copying queries from Stack Overflow and call it done. They never understand why an index speeds up a query, what isolation levels actually prevent, or when MongoDB genuinely outperforms PostgreSQL. That gap shows up the moment a query slows down in production, data gets corrupted, or a design decision made early becomes impossible to undo.
This roadmap closes that gap. Twelve posts, three phases, covering SQL fundamentals through NoSQL architecture through the cross-cutting concerns that apply everywhere. By the end, you'll make confident database decisions and write queries that scale.
Why Databases Are Worth Mastering
Every backend job requires it
Whether you write Java, Python, Go, or TypeScript, you interact with databases every day. Understanding them deeply makes you faster, more confident, and more valuable.
Performance problems are database problems
In production applications, slow queries are the most common performance bottleneck. An N+1 query problem or a missing index can make a fast application grind to a halt.
Data is the most valuable asset
Code can be rewritten. Data cannot be regenerated. Bad schema decisions, missing constraints, and wrong isolation levels can corrupt data in ways that are painful or impossible to fix.
SQL is universal
SQL has been around since the 1970s and shows no signs of going away. PostgreSQL, MySQL, SQLite, BigQuery, Redshift, Snowflake — all use SQL. It's one of the highest-ROI skills you can develop.
NoSQL is situational, not universal
MongoDB, Redis, Cassandra, DynamoDB each excel in specific scenarios. Knowing when and why to use them — rather than using them by default or avoiding them entirely — is the mark of a mature engineer.
What You'll Build
By the end of this roadmap, you'll be able to:
✅ Write complex SQL queries with JOINs, subqueries, CTEs, and window functions
✅ Design schemas that enforce data integrity and perform well at scale
✅ Understand what PostgreSQL does when you run a query
✅ Model data in MongoDB for document-based access patterns
✅ Use Redis for caching, sessions, rate limiting, and pub/sub
✅ Create indexes that actually improve query performance
✅ Understand transactions, ACID, and how concurrency goes wrong
✅ Choose between SQL and NoSQL based on access patterns, not hype
Prerequisites
This roadmap assumes:
✅ Programming experience in at least one backend language (Python, Java, Go, TypeScript, etc.)
✅ Basic understanding of what databases are (tables, rows, queries — even if vague)
✅ Familiarity with running commands in a terminal
✅ Time commitment — 8-10 hours per week for 10-12 weeks
If you want additional foundation, these posts from this blog complement the roadmap:
- Relational Database Fundamentals — how databases work internally
- Database Schema Design Guide — normalization, ERDs, design patterns
Learning Path Overview
This roadmap spans 3 phases across 12 posts:
DB-1 (Roadmap & Overview) ← You are here
│
├─ Phase 1: SQL Fundamentals (Posts 2-3)
│ ├─ DB-2: SQL Fundamentals (SELECT, JOIN, Subqueries)
│ └─ DB-3: Advanced SQL (Window Functions, CTEs, Recursive Queries)
│
├─ Phase 2: NoSQL Fundamentals (Posts 4-6)
│ ├─ DB-4: NoSQL Fundamentals (Document, Key-Value, Column, Graph)
│ ├─ DB-5: Deep Dive: PostgreSQL Mastery
│ └─ DB-6: Deep Dive: MongoDB Complete Guide
│
└─ Phase 3: Cross-Cutting Concerns (Posts 7-12)
├─ DB-7: Deep Dive: Redis Beyond Caching
├─ DB-8: Deep Dive: Indexing & Query Performance
├─ DB-9: Deep Dive: Transactions & Concurrency Control
├─ DB-10: Deep Dive: Schema Design Patterns
├─ DB-11: Deep Dive: Database Migrations & Versioning
└─ DB-12: Deep Dive: SQL vs NoSQL — Choosing the Right DatabasePhase 1: SQL Fundamentals
SQL is the language of relational databases. It's been standardized, battle-tested for 50 years, and runs on billions of systems. Learning it well pays dividends for your entire career.
DB-2: SQL Fundamentals — SELECT, JOIN, Subqueries
Goal: Write queries that retrieve exactly the data you need
What You'll Learn:
- The anatomy of a SELECT statement
- Filtering with WHERE, ORDER BY, GROUP BY, HAVING
- INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN — with diagrams
- Aggregation functions: COUNT, SUM, AVG, MIN, MAX
- Subqueries in WHERE and FROM clauses
- The execution order SQL actually uses (not the order you write it)
- Common query mistakes that silently return wrong results
Practical Examples:
- E-commerce: find top customers by revenue this month
- Blog: get all posts with their author names and comment counts
- Analytics: calculate week-over-week growth rates
Why This Matters: Every database interaction you write uses these foundations. JOINs and aggregations appear in 90% of real-world queries. Getting them right from the start prevents a class of bugs that are hard to spot.
🔗 Coming Soon: SQL Fundamentals — SELECT, JOIN, Subqueries →
DB-3: Advanced SQL — Window Functions, CTEs, Recursive Queries
Goal: Write sophisticated queries that replace complex application-level code
What You'll Learn:
- Common Table Expressions (CTEs) — readable, reusable query building blocks
- Window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG()
- PARTITION BY — running calculations per group
- Recursive CTEs — traversing hierarchies (categories, org charts, threaded comments)
- CASE WHEN — conditional logic inside queries
- Lateral joins — per-row subqueries
- EXPLAIN ANALYZE — reading query execution plans
Practical Examples:
- Rank products by sales within each category
- Calculate 7-day rolling averages for time-series data
- Find the top N items per group without subqueries
- Traverse a category tree of arbitrary depth
Why This Matters: Window functions and CTEs replace a whole class of "I'll just do this in application code" decisions. Pulling less data from the database and doing less work in your app is almost always faster.
🔗 Coming Soon: Advanced SQL — Window Functions & CTEs →
Phase 2: NoSQL Fundamentals
NoSQL databases don't replace SQL — they cover access patterns that relational databases handle poorly. Understanding when and why to use each type is the core skill.
DB-4: NoSQL Fundamentals — Document, Key-Value, Column, Graph
Goal: Understand the NoSQL landscape and when each type excels
What You'll Learn:
- The four main NoSQL categories and what problems each solves
- Document stores (MongoDB, Firestore) — flexible schemas, nested data
- Key-value stores (Redis, DynamoDB) — blazing-fast lookups by key
- Wide-column stores (Cassandra, HBase) — time-series, event data at massive scale
- Graph databases (Neo4j, Amazon Neptune) — relationship-heavy data
- BASE vs ACID: what eventual consistency actually means in practice
- CAP theorem — why you can't have everything
The Trade-off Framework:
| Type | Best For | Trade-off |
|---|---|---|
| Document | Flexible schemas, nested data | Harder to query across documents |
| Key-Value | High-speed lookups, sessions | No complex queries |
| Wide-Column | Write-heavy, time-series | Complex data modeling |
| Graph | Relationship traversal | Niche use cases |
| Relational | Structured, consistent data | Schema rigidity, joins at scale |
Why This Matters: Most developers pick MongoDB because "everyone uses it" or avoid NoSQL entirely out of unfamiliarity. Neither is a strategy. This post gives you the mental framework to make informed choices.
🔗 Coming Soon: NoSQL Fundamentals — Document, Key-Value, Column, Graph →
DB-5: Deep Dive — PostgreSQL Mastery
Goal: Go beyond basic SQL to use PostgreSQL's full power
What You'll Learn:
- PostgreSQL architecture: how it handles connections, memory, disk I/O
- Data types you should know: JSONB, UUID, arrays, ranges, ENUM
- Full-text search without Elasticsearch
- Partial indexes, expression indexes, covering indexes
- Row-level security (RLS) for multi-tenant applications
- Stored procedures and triggers — when they help, when they hurt
- pg_stat_statements — finding slow queries in production
- Connection pooling with PgBouncer
- Logical replication and read replicas
Practical Examples:
- Store and query semi-structured data with JSONB
- Implement multi-tenant data isolation with RLS
- Set up full-text search with ts_vector and ts_query
- Diagnose a slow query end-to-end
Why This Matters: PostgreSQL is arguably the most capable open-source database in existence. Most teams use 20% of its features. This post unlocks the other 80%.
🔗 Coming Soon: PostgreSQL Mastery →
DB-6: Deep Dive — MongoDB Complete Guide
Goal: Model data effectively in MongoDB and query it efficiently
What You'll Learn:
- MongoDB data model: documents, collections, embedding vs referencing
- When to embed and when to reference — the real rules
- CRUD operations in depth
- Aggregation pipeline — MongoDB's equivalent of SQL GROUP BY + JOIN
- Schema design patterns: bucket pattern, outlier pattern, computed pattern
- Indexing in MongoDB: single, compound, multikey, text, geospatial
- Transactions in MongoDB (yes, they exist)
- Change streams — real-time data feeds
- Atlas Search for full-text search
Embedding vs Referencing:
Embed when:
- Data is always accessed together
- Child data is small and bounded
- One-to-few relationship
Reference when:
- Data is accessed independently
- Many-to-many relationships
- Data grows unboundedWhy This Matters: MongoDB's flexibility is a double-edged sword. Good document design leads to fast, simple queries. Bad design leads to massive aggregation pipelines and poor performance.
🔗 Coming Soon: MongoDB Complete Guide →
Phase 3: Cross-Cutting Concerns
These topics apply across all database systems. They're often the difference between a database that works and one that scales.
DB-7: Deep Dive — Redis Beyond Caching
Goal: Use Redis for its full range of capabilities, not just caching
What You'll Learn:
- Redis data structures: Strings, Lists, Sets, Sorted Sets, Hashes, Streams
- Caching patterns: cache-aside, write-through, write-behind
- Session storage and its trade-offs vs. database sessions
- Rate limiting with Redis and sliding window counters
- Distributed locking with RedLock
- Pub/Sub and its limitations
- Redis Streams for event sourcing
- Redis persistence: RDB snapshots vs AOF logging
- Redis Sentinel and Redis Cluster for high availability
Practical Patterns:
Rate Limiting:
INCR requests:{user_id}:{minute}
EXPIRE requests:{user_id}:{minute} 60
Leaderboard:
ZADD leaderboard {score} {user_id}
ZREVRANGE leaderboard 0 9 WITHSCORESWhy This Matters: Redis used purely as a cache wastes its capabilities. Understanding the full data model unlocks patterns that would require separate services if implemented elsewhere.
🔗 Coming Soon: Redis Beyond Caching →
DB-8: Deep Dive — Indexing & Query Performance
Goal: Make slow queries fast and understand why indexes work
What You'll Learn:
- How B-tree indexes work internally (the actual data structure)
- When indexes help and when they hurt (write amplification)
- Composite indexes and the left-prefix rule
- Covering indexes — queries that never touch the table
- Index selectivity and cardinality
- Partial indexes for filtered queries
- EXPLAIN and EXPLAIN ANALYZE — reading execution plans
- Common query performance problems: sequential scans, sort spills, hash joins
- Index bloat and maintenance (REINDEX, VACUUM)
The Index Decision Framework:
Add an index when:
- Column appears in WHERE, JOIN ON, or ORDER BY
- Query returns < 5-10% of rows (high selectivity)
- Table has more reads than writes
Don't add an index when:
- Table is tiny (< a few thousand rows)
- Column has low cardinality (boolean, status with few values)
- Table is write-heavy (index maintenance cost exceeds benefit)Why This Matters: Indexing is the highest-leverage optimization available. A single index on the right column can turn a 10-second query into a 10-millisecond one. Understanding the trade-offs prevents the opposite: over-indexing that slows down writes.
🔗 Coming Soon: Indexing & Query Performance →
DB-9: Deep Dive — Database Transactions & Concurrency Control
Goal: Understand what ACID really means and how concurrency goes wrong
What You'll Learn:
- ACID properties in depth: atomicity, consistency, isolation, durability
- What actually happens inside a transaction
- Concurrency anomalies: dirty reads, non-repeatable reads, phantom reads
- Isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Serializable
- Optimistic vs pessimistic locking
- SELECT FOR UPDATE and SELECT FOR SHARE
- Deadlocks: what causes them and how to prevent them
- MVCC (Multi-Version Concurrency Control) — how PostgreSQL avoids locks
- Long-running transactions and their danger
Isolation Level Matrix:
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |
Why This Matters: Most application bugs are not logic bugs — they're concurrency bugs. Race conditions, double-spend issues, and data corruption usually come from wrong isolation assumptions. This post gives you the mental model to reason about them correctly.
🔗 Coming Soon: Database Transactions & Concurrency Control →
DB-10: Deep Dive — Schema Design Patterns
Goal: Design schemas that age well under changing requirements and growing load
What You'll Learn:
- Normalization revisited: when to normalize and when to denormalize
- Polymorphic associations — one table for multiple entity types
- The entity-attribute-value (EAV) anti-pattern and better alternatives
- Soft deletes, audit trails, and temporal data patterns
- JSONB columns as an escape hatch for flexible attributes
- Schema design for multi-tenancy: shared schema, row-level, schema-per-tenant
- Time-series data modeling in PostgreSQL
- Event sourcing schema patterns
- Naming conventions and their importance
The Normalization Rule of Thumb:
Normalize until it hurts (performance),
then denormalize until it works.Why This Matters: Schema decisions made early are extremely hard to change later. A table with 500 million rows cannot be easily restructured. Getting the design right — and knowing which trade-offs you're accepting — prevents painful migrations later.
🔗 Coming Soon: Schema Design Patterns →
DB-11: Deep Dive — Database Migrations & Versioning
Goal: Change database schemas safely without downtime or data loss
What You'll Learn:
- Why migrations fail and how to prevent failures
- Zero-downtime migration patterns for production systems
- Adding columns, dropping columns, renaming columns safely
- Backfilling large tables without locking
- Migration tools: Flyway, Liquibase, Alembic, Prisma Migrate, golang-migrate
- Rolling migrations for blue-green deployments
- Schema version control in Git
- Testing migrations before production
- The "expand-contract" pattern for backward-compatible changes
The Expand-Contract Pattern:
Phase 1 - Expand: Add new_column (nullable)
Deploy application version that writes both old_column and new_column
Phase 2 - Migrate: Backfill new_column for existing rows
Phase 3 - Contract: Remove old_column after all reads use new_columnWhy This Matters: "Just ALTER TABLE in production" works fine on small tables. At scale, schema changes become high-stakes operations. Doing them wrong causes downtime, locks that cascade into outages, or data loss.
🔗 Coming Soon: Database Migrations & Versioning →
DB-12: Deep Dive — SQL vs NoSQL: Choosing the Right Database
Goal: Make confident, justified database decisions for any project
What You'll Learn:
- A decision framework based on access patterns, not popularity
- When SQL wins and why (consistency, complex queries, joins)
- When NoSQL wins and why (scale, flexibility, specific data models)
- Polyglot persistence — using multiple databases in one system
- Real-world case studies: what Uber, Instagram, GitHub chose and why
- Database selection for common application types (e-commerce, SaaS, analytics, IoT)
- Managed cloud databases: RDS, Cloud SQL, Atlas, Upstash, PlanetScale, Supabase
- The cost of switching (migration complexity, operational overhead)
- Questions to ask before choosing a database
The Quick Decision Guide:
Use PostgreSQL when:
- You need ACID guarantees
- Data has relationships
- Query patterns are varied and unpredictable
- You're building a typical web app
Use MongoDB when:
- Schema evolves rapidly
- Documents are self-contained
- You're building document-heavy apps (CMS, catalogs)
Use Redis when:
- You need sub-millisecond latency
- Data fits in memory
- Use case: caching, sessions, rate limiting, leaderboards
Use Cassandra/DynamoDB when:
- Write volume is massive (millions/second)
- Data is time-series or append-only
- Access patterns are known and fixedWhy This Matters: Wrong database choices are expensive to fix. This post gives you a repeatable process for making the decision instead of guessing.
🔗 Coming Soon: SQL vs NoSQL — Choosing the Right Database →
Study Plans
For Beginners (12 Weeks)
Weeks 1-3: SQL Foundation
- Week 1: Install PostgreSQL locally, explore the psql CLI
- Week 2: DB-2 (SQL Fundamentals) — practice with a sample dataset
- Week 3: DB-3 (Advanced SQL) — window functions, CTEs
Weeks 4-6: NoSQL Landscape
- Week 4: DB-4 (NoSQL Overview) — understand the mental model
- Week 5: DB-6 (MongoDB) — run a local MongoDB instance, practice CRUD
- Week 6: DB-7 (Redis) — run Redis locally, implement a cache and rate limiter
Weeks 7-9: PostgreSQL Deep Dive
- Week 7: DB-5 (PostgreSQL Mastery) — JSONB, full-text search, RLS
- Week 8: DB-8 (Indexing) — analyze slow queries, add indexes, measure improvement
- Week 9: DB-9 (Transactions) — implement transaction patterns in your language
Weeks 10-12: Architecture and Design
- Week 10: DB-10 (Schema Design) — redesign a schema you know
- Week 11: DB-11 (Migrations) — practice migrations in a test project
- Week 12: DB-12 (SQL vs NoSQL) — revisit past decisions with new knowledge
For Backend Developers (6 Weeks — Intermediate)
Week 1: DB-2 + DB-3 (advanced SQL you may have skipped) Week 2: DB-5 (PostgreSQL features you're not using) Week 3: DB-8 (indexing — the highest ROI topic) Week 4: DB-9 (transactions — fix concurrency bugs you might have) Week 5: DB-6 or DB-7 (whichever NoSQL is relevant to your stack) Week 6: DB-12 (decision framework for your next project)
For Interview Prep (3 Weeks — Focused)
Week 1: DB-2 (SQL fundamentals), DB-3 (advanced SQL), DB-8 (indexing basics) Week 2: DB-9 (ACID and transactions), DB-4 (NoSQL overview) Week 3: DB-12 (SQL vs NoSQL discussion), practice design questions
Essential Setup
You can run all examples in this roadmap locally with Docker:
# PostgreSQL
docker run -d \
--name pg \
-e POSTGRES_PASSWORD=password \
-p 5432:5432 \
postgres:16
# MongoDB
docker run -d \
--name mongo \
-p 27017:27017 \
mongo:7
# Redis
docker run -d \
--name redis \
-p 6379:6379 \
redis:7-alpineConnect to PostgreSQL:
psql -h localhost -U postgres -WConnect to MongoDB:
mongosh mongodb://localhost:27017Connect to Redis:
redis-cli -h localhostTools You'll Use
PostgreSQL clients:
- psql (CLI, built-in)
- DBeaver (GUI, free)
- pgAdmin (GUI, free)
- TablePlus (GUI, paid, excellent UX)
MongoDB clients:
- mongosh (CLI, official)
- MongoDB Compass (GUI, official, free)
- Studio 3T (GUI, paid)
Query analysis:
- EXPLAIN ANALYZE (built into PostgreSQL)
- pgBadger (log analyzer)
- pg_stat_statements (query statistics)
Migration tools (covered in DB-11):
- Flyway (Java ecosystem)
- Alembic (Python/SQLAlchemy)
- golang-migrate (Go)
- Prisma Migrate (TypeScript/Node)
How This Roadmap Connects
This series builds on existing posts and prepares you for advanced topics:
Foundation posts (read before or alongside):
- Relational Database Fundamentals — database internals, ACID deep dive
- Database Schema Design Guide — normalization, ER diagrams
Framework-specific posts (apply this roadmap's knowledge):
- Working with Databases in FastAPI
- Spring Boot Database Integration
- Spring Boot Caching with Redis
- Spring Boot Advanced JPA Optimization
What comes next:
- Database topics appear throughout the Spring Boot series (DB-level optimizations, JPA)
- The Docker & Kubernetes roadmap covers running databases in containers
- The Software Architecture series covers patterns built on top of databases (CQRS, event sourcing)
Summary
This roadmap covers what most developers skip: the why behind database decisions, not just the how to write queries.
SQL remains the single most transferable database skill. PostgreSQL runs on everything from a Raspberry Pi to petabyte-scale AWS deployments. MongoDB and Redis solve real problems in specific contexts. Understanding all three — and knowing when to reach for each — makes you a significantly better backend engineer.
Follow the phases in order. Every post builds on the previous one. By the end, database work will feel less like dark magic and more like engineering.
Start here: DB-2: SQL Fundamentals →
📬 Subscribe to Newsletter
Get the latest blog posts delivered to your inbox every week. No spam, unsubscribe anytime.
We respect your privacy. Unsubscribe at any time.
💬 Comments
Sign in to leave a comment
We'll never post without your permission.