Relational Database Fundamentals: How Databases Actually Work

Introduction
You write SELECT * FROM users WHERE id = 1 and get a result back. But what actually happens inside the database between your query and the response? How does the database find that one row among millions? Why do some queries take milliseconds while others take minutes?
Most backend developers interact with databases through ORMs and never look deeper. That works — until a query is slow, data is corrupted, or you need to design a schema that won't fall apart at scale.
This guide takes you inside the database engine. You'll understand the relational model, how queries actually execute, what ACID really means, and how to make informed decisions about data types, constraints, and database engines.
What You'll Learn
✅ Understand the relational model — tables, rows, columns, and relationships
✅ See what happens inside the database when you run a SQL query
✅ Master primary keys, foreign keys, and constraints
✅ Choose the right data types for your columns
✅ Understand ACID properties with real-world examples
✅ Learn transactions and isolation levels
✅ Know when to use PostgreSQL vs MySQL vs SQLite
✅ Design a simple e-commerce schema from scratch
The Relational Model
In 1970, Edgar F. Codd published a paper that changed computing forever. He proposed organizing data into relations (tables) with tuples (rows) and attributes (columns). This became the relational model — the foundation of nearly every database you use today.
Tables, Rows, and Columns
A table (or relation) is a structured collection of data about a specific entity:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

| Concept | Also Called | Description |
|---|---|---|
| Table | Relation | A collection of related data (e.g., users, orders) |
| Row | Tuple, Record | A single entry in the table (e.g., one user) |
| Column | Attribute, Field | A property of the entity (e.g., username, email) |
| Schema | — | The structure definition of a table (column names, types, constraints) |
Relationships Between Tables
The "relational" in relational database refers to how tables relate to each other.
One-to-Many (most common):
-- One user has many orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

One user → many orders. The user_id column in orders points back to users.id.
One-to-One:
-- One user has one profile
CREATE TABLE profiles (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL UNIQUE REFERENCES users(id),
bio TEXT,
avatar VARCHAR(500)
);

The UNIQUE constraint on user_id ensures each user has at most one profile.
Many-to-Many:
-- Products can have many tags, tags can have many products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL
);
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
);
-- Junction table (also called join table or bridge table)
CREATE TABLE product_tags (
product_id INT NOT NULL REFERENCES products(id),
tag_id INT NOT NULL REFERENCES tags(id),
PRIMARY KEY (product_id, tag_id)
);

Many-to-many relationships require a junction table that holds pairs of foreign keys.
What Happens When You Run a SQL Query
When you execute a query, the database doesn't just "find" your data. It goes through a multi-stage pipeline. Understanding this pipeline helps you write faster queries and debug performance issues.
The Query Execution Pipeline
SQL Query
│
▼
┌──────────┐
│ Parser │ → Checks syntax, builds parse tree
└──────────┘
│
▼
┌──────────┐
│ Analyzer │ → Validates table/column names, resolves types
└──────────┘
│
▼
┌──────────┐
│ Planner │ → Generates execution plans, picks the cheapest
└──────────┘
│
▼
┌──────────┐
│ Executor │ → Runs the chosen plan, returns results
└──────────┘
│
▼
┌──────────┐
│ Storage │ → Reads/writes data pages on disk
└──────────┘

Stage 1: Parser
The parser checks that your SQL is syntactically valid and converts it into a parse tree (an internal representation).
-- This will be parsed successfully
SELECT username FROM users WHERE id = 1;
-- This will fail at the parser stage (syntax error)
SELEC username FORM users WHER id = 1;
-- ERROR: syntax error at or near "SELEC"

Stage 2: Analyzer
The analyzer (also called the rewriter) checks that the tables and columns actually exist, resolves data types, and expands wildcards like *.
-- Analyzer resolves `*` to actual columns
SELECT * FROM users WHERE id = 1;
-- Becomes: SELECT id, username, email, created_at FROM users WHERE id = 1;
-- Analyzer catches invalid references
SELECT nonexistent_column FROM users;
-- ERROR: column "nonexistent_column" does not exist

Stage 3: Planner (Query Optimizer)
This is the most interesting stage. The planner generates multiple execution plans and picks the one with the lowest estimated cost.
For example, to find a user by ID:
- Plan A: Sequential scan — read every row and check if id = 1 (slow for large tables)
- Plan B: Index scan — use the primary key index to jump directly to the row (fast)
The planner uses statistics about your data (row counts, value distribution, index availability) to estimate costs.
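To build intuition for that choice, here's a toy cost model in Python. The constants and formulas are invented for illustration (loosely inspired by PostgreSQL's seq_page_cost and cpu_tuple_cost defaults); the real planner uses calibrated parameters and much richer statistics:

```python
import math

# Hypothetical cost constants, for illustration only.
SEQ_PAGE_COST = 1.0     # sequential page read
RANDOM_PAGE_COST = 4.0  # random page read (index traversal)
CPU_TUPLE_COST = 0.01   # examining one row

def seq_scan_cost(rows, pages):
    # Plan A: read every page sequentially, check every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(rows):
    # Plan B: descend a B-tree (depth grows with the log of table
    # size), then fetch the single page holding the row.
    depth = max(1, math.ceil(math.log(rows, 100)))  # ~100 entries/index page
    return depth * RANDOM_PAGE_COST + RANDOM_PAGE_COST + CPU_TUPLE_COST

rows, pages = 1_000_000, 10_000
print(f"seq scan:   {seq_scan_cost(rows, pages):.2f}")   # ~20000
print(f"index scan: {index_scan_cost(rows):.2f}")        # ~16
```

For a single-row lookup on a large table, the estimated index-scan cost is orders of magnitude lower, so the planner picks Plan B.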
You can see the planner's decision using EXPLAIN:
EXPLAIN SELECT * FROM users WHERE id = 1;
-- Output:
-- Index Scan using users_pkey on users (cost=0.29..8.30 rows=1 width=100)
-- Index Cond: (id = 1)

The planner chose an Index Scan because there's a primary key index on id. If there were no index, you'd see a Seq Scan (sequential scan) instead.
Stage 4: Executor
The executor takes the chosen plan and actually runs it. It coordinates reading from storage, applying filters, joining tables, sorting results, and returning data to the client.
Stage 5: Storage Engine
The storage engine manages how data is physically stored on disk. Data is organized in pages (typically 8KB blocks). When the executor needs a row, the storage engine:
- Checks the buffer cache (shared memory) for the page
- If not cached, reads the page from disk into the cache
- Returns the requested data from the cached page
This is why repeated queries are often faster — the data is already in memory.
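A minimal sketch of the buffer-cache idea in Python (illustrative only — PostgreSQL actually uses a clock-sweep algorithm rather than strict LRU, but the hit/miss logic is the same idea):

```python
from collections import OrderedDict

class BufferCache:
    """Toy page cache with LRU eviction (illustrative sketch)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # page_id -> page (simulated disk)
        self.pages = OrderedDict()  # page_id -> page (in memory)
        self.hits = self.misses = 0

    def read_page(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as recently used
            return self.pages[page_id]
        self.misses += 1
        page = self.disk[page_id]            # the slow path: "disk I/O"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict least recently used
        self.pages[page_id] = page
        return page

disk = {i: f"page-{i}" for i in range(100)}
cache = BufferCache(capacity=10, disk=disk)
cache.read_page(5)   # miss: read from "disk", cache the page
cache.read_page(5)   # hit: served from memory
print(cache.hits, cache.misses)  # 1 1
```

The second read never touches the disk, which is exactly why re-running the same query is usually faster.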
Primary Keys, Foreign Keys, and Constraints
Constraints are rules enforced by the database to keep your data consistent. They're your first line of defense against bad data.
Primary Key
Every table should have a primary key — a column (or combination of columns) that uniquely identifies each row.
-- Single column primary key (most common)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL
);
-- Composite primary key (for junction tables)
CREATE TABLE product_tags (
product_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (product_id, tag_id)
);

A primary key is automatically NOT NULL and UNIQUE. The database creates an index on it for fast lookups.
Foreign Key
A foreign key creates a link between two tables and enforces referential integrity — you can't create an order for a user that doesn't exist.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL
);
-- This works (user 1 exists)
INSERT INTO orders (user_id, total) VALUES (1, 99.99);
-- This fails (user 999 doesn't exist)
INSERT INTO orders (user_id, total) VALUES (999, 99.99);
-- ERROR: insert or update on table "orders" violates foreign key constraint

CASCADE behavior — what happens when the referenced row is deleted:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-- When a user is deleted, their orders are also deleted
total DECIMAL(10, 2) NOT NULL
);
-- Other options:
-- ON DELETE NO ACTION → Prevent deletion if orders exist (the default; RESTRICT is similar)
-- ON DELETE SET NULL → Set user_id to NULL when user is deleted
-- ON DELETE SET DEFAULT → Set to default value

NOT NULL
The simplest but most important constraint. Prevents missing data.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL, -- Required
description TEXT, -- Optional (nullable)
price DECIMAL(10, 2) NOT NULL -- Required
);

Rule of thumb: Make columns NOT NULL by default. Only allow NULL when the absence of a value has a specific meaning (e.g., "user hasn't set a bio yet").
UNIQUE
Ensures no two rows have the same value in that column.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE
);
-- First insert works
INSERT INTO users (username, email) VALUES ('john', 'john@example.com');
-- Duplicate username fails
INSERT INTO users (username, email) VALUES ('john', 'other@example.com');
-- ERROR: duplicate key value violates unique constraint "users_username_key"

CHECK
Validates that values meet a condition.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
stock INT NOT NULL DEFAULT 0 CHECK (stock >= 0)
);
-- This fails
INSERT INTO products (name, price, stock) VALUES ('Widget', -5.00, 10);
-- ERROR: new row for relation "products" violates check constraint "products_price_check"

DEFAULT
Provides a value when none is specified.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- status defaults to 'pending', created_at defaults to current timestamp
INSERT INTO orders DEFAULT VALUES;

Choosing the Right Data Types
Data types determine what values a column can hold, how much storage it uses, and what operations are valid. Choosing the wrong type leads to wasted space, incorrect results, or runtime errors.
Numeric Types
| Type | Storage | Range | Use Case |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Status codes, small counters |
| INTEGER | 4 bytes | -2.1 billion to 2.1 billion | Most IDs, counts |
| BIGINT | 8 bytes | ±9.2 quintillion | Large IDs, analytics counters |
| DECIMAL(p,s) | Variable | User-defined precision | Money, exact calculations |
| REAL | 4 bytes | ~6 decimal digits | Scientific data (approximate) |
| DOUBLE PRECISION | 8 bytes | ~15 decimal digits | Scientific data (approximate) |
Important: Never use REAL or DOUBLE PRECISION for money. Floating-point math produces rounding errors:
-- ❌ BAD: Floating point for money
SELECT 0.1 + 0.2;
-- Returns: 0.30000000000000004
-- ✅ GOOD: DECIMAL for money
SELECT 0.1::DECIMAL(10,2) + 0.2::DECIMAL(10,2);
-- Returns: 0.30

String Types
| Type | Description | Use Case |
|---|---|---|
| VARCHAR(n) | Variable-length, max n chars | Usernames, emails, short text |
| TEXT | Variable-length, unlimited | Blog content, descriptions |
| CHAR(n) | Fixed-length, padded with spaces | Fixed codes (e.g., country codes) |
In PostgreSQL, VARCHAR and TEXT have identical performance. The only difference is VARCHAR(n) enforces a length limit. Use VARCHAR(n) when you want to enforce a max length (usernames, emails), and TEXT when there's no meaningful limit.
CREATE TABLE users (
username VARCHAR(50) NOT NULL, -- Enforce 50 char limit
bio TEXT -- No limit needed
);

Date and Time Types
| Type | Storage | Description | Example |
|---|---|---|---|
| DATE | 4 bytes | Date only | 2026-01-31 |
| TIME | 8 bytes | Time only | 14:30:00 |
| TIMESTAMP | 8 bytes | Date + time, no timezone | 2026-01-31 14:30:00 |
| TIMESTAMPTZ | 8 bytes | Date + time, with timezone | 2026-01-31 14:30:00+07 |
Always use TIMESTAMPTZ (timestamp with time zone). Without timezone info, you'll have bugs when users are in different timezones or when daylight saving time changes.
-- ❌ BAD: No timezone info
CREATE TABLE events (
starts_at TIMESTAMP NOT NULL
);
-- Is 14:00 in UTC? Local time? Server time?
-- ✅ GOOD: Timezone-aware
CREATE TABLE events (
starts_at TIMESTAMPTZ NOT NULL
);
-- Always stored in UTC, converted to user's timezone on display

Boolean
CREATE TABLE users (
id SERIAL PRIMARY KEY,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
is_verified BOOLEAN NOT NULL DEFAULT FALSE
);

UUID
-- PostgreSQL has native UUID support; gen_random_uuid() is built in since v13
-- (older versions need the pgcrypto extension)
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id INT NOT NULL REFERENCES users(id),
expires_at TIMESTAMPTZ NOT NULL
);

JSON/JSONB
-- JSONB is binary JSON — faster to query, slower to insert
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
metadata JSONB DEFAULT '{}'::JSONB
);
-- Query JSONB fields
INSERT INTO products (name, metadata)
VALUES ('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "storage": 512}}');
SELECT name, metadata->>'brand' AS brand
FROM products
WHERE metadata->'specs'->>'ram' = '16';

When to use JSONB:
- Flexible/variable attributes that differ per row
- Data from external APIs that you store but rarely query
- Rapid prototyping before you know the final schema
When NOT to use JSONB:
- Data you frequently filter, sort, or join on — use proper columns instead
- Core business data that has a consistent structure
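If you want to experiment with this query pattern without a PostgreSQL server, SQLite's built-in JSON functions cover the same idea. A sketch using Python's sqlite3 module — note that the -> and ->> operators above are PostgreSQL syntax, while SQLite's portable spelling is json_extract:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no JSONB type; JSON is stored as TEXT and queried with functions.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)")
conn.execute(
    "INSERT INTO products (name, metadata) VALUES (?, ?)",
    ("Laptop", '{"brand": "Dell", "specs": {"ram": 16, "storage": 512}}'),
)

# json_extract walks a path into the document, like ->/->> in PostgreSQL.
row = conn.execute(
    "SELECT name, json_extract(metadata, '$.brand') "
    "FROM products WHERE json_extract(metadata, '$.specs.ram') = 16"
).fetchone()
print(row)  # ('Laptop', 'Dell')
```

This requires an SQLite build with the JSON1 extension, which modern Python distributions include.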
ACID Properties
ACID is a set of four properties that guarantee database transactions are reliable. Every production database you use (PostgreSQL, MySQL, SQLite) provides ACID guarantees.
A — Atomicity
A transaction is all-or-nothing. Either every operation in the transaction succeeds, or none of them do.
-- Transfer $100 from Alice to Bob
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; -- Alice
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2; -- Bob
COMMIT;
-- Both updates succeed, or neither does

If the database crashes after debiting Alice but before crediting Bob, the entire transaction is rolled back. Alice keeps her money.
Without atomicity, you'd have to manually detect and fix partial operations — a nightmare with money.
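The same all-or-nothing behavior is easy to see from application code. A sketch using Python's built-in sqlite3 module (any ACID database behaves the same way; the table and amounts are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "user_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

def transfer(conn, from_id, to_id, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE user_id = ?",
                         (amount, from_id))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
                         (amount, to_id))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK violated -> whole transfer rolled back

print(transfer(conn, 1, 2, 100))   # True  — both updates applied
print(transfer(conn, 1, 2, 1000))  # False — debit would go negative; nothing applied
balances = dict(conn.execute("SELECT user_id, balance FROM accounts"))
print(balances)  # {1: 400, 2: 200}
```

After the failed transfer, neither account changed: the partial debit never became visible.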
C — Consistency
The database moves from one valid state to another. Constraints (primary keys, foreign keys, CHECK, NOT NULL) are never violated.
-- Constraint: balance cannot go negative
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE user_id = 1;
-- If Alice only has $50, this violates the CHECK constraint
-- The entire transaction is rolled back
COMMIT;
-- ERROR: new row violates check constraint "positive_balance"

I — Isolation
Concurrent transactions don't interfere with each other. Each transaction sees a consistent snapshot of the data, even if other transactions are running simultaneously.
-- Transaction A (runs first)
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- Sees $500
-- Transaction B (runs concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
COMMIT;
-- Alice now has $400
-- Back in Transaction A
SELECT balance FROM accounts WHERE user_id = 1;
-- What does Transaction A see? Depends on isolation level (see below)

D — Durability
Once a transaction is committed, the data survives any failure — power outage, crash, hardware failure.
Databases achieve this using a Write-Ahead Log (WAL):
- Before modifying data pages, write the change to the WAL (sequential, fast)
- Periodically flush data pages to disk (checkpointing)
- On crash recovery, replay the WAL from the last checkpoint to restore committed changes that hadn't yet reached the data files
This is why COMMIT is the point of no return — once the WAL entry is flushed, the data is guaranteed to persist.
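A stripped-down sketch of the WAL idea in Python. Real WALs log low-level page changes and interact with checkpoints, but the append-then-apply-then-replay cycle below is the core:

```python
import json
import os
import tempfile

class MiniWAL:
    """Toy write-ahead log: flush the log record BEFORE touching data,
    so a crash can always be recovered by replaying the log."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # in-memory "data pages"

    def write(self, key, value):
        # Step 1: append the change to the log and force it to disk.
        with open(self.path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # the durability point
        # Step 2: only now modify the data itself.
        self.data[key] = value

    def recover(self):
        # After a crash, replaying the log rebuilds the durable state.
        self.data = {}
        with open(self.path) as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = MiniWAL(path)
db.write("alice", 400)
db.write("bob", 200)

crashed = MiniWAL(path)  # simulate a fresh process with empty memory
crashed.recover()
print(crashed.data)  # {'alice': 400, 'bob': 200}
```

Because the log entry hits disk before the data changes, no acknowledged write can ever be lost, and the log is written sequentially, which is much faster than scattering random page writes.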
Transactions and Isolation Levels
What is a Transaction?
A transaction is a group of operations that execute as a single unit. You start one with BEGIN and finish with COMMIT (apply changes) or ROLLBACK (undo everything).
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 150.00);
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 42, 2);
UPDATE products SET stock = stock - 2 WHERE id = 42;
COMMIT;
-- All three operations succeed together

If anything fails, you roll back:
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 150.00);
-- Oops, product is out of stock
ROLLBACK;
-- The order is undone

Isolation Levels
Isolation levels control what a transaction can see when other transactions are running at the same time. There are four standard levels, from least to most strict:
Read Uncommitted
The loosest level. Transactions can see uncommitted changes from other transactions (dirty reads).
PostgreSQL doesn't actually support this — it automatically upgrades to Read Committed.
Read Committed (PostgreSQL default)
Each statement sees only data committed before that statement began.
-- Transaction A
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- $500
-- Transaction B commits: UPDATE balance to $400
SELECT balance FROM accounts WHERE user_id = 1; -- $400 (sees B's commit)
COMMIT;

The two SELECTs in Transaction A can return different values. This is called a non-repeatable read.
Repeatable Read
Each transaction sees only data committed before the transaction began. Once you read a value, it stays the same for the entire transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- $500
-- Transaction B commits: UPDATE balance to $400
SELECT balance FROM accounts WHERE user_id = 1; -- Still $500 (snapshot from BEGIN)
COMMIT;

Serializable
The strictest level. Transactions execute as if they ran one at a time (serially). If two transactions conflict, one is aborted and must be retried.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT SUM(balance) FROM accounts; -- $1000
-- Transaction B adds a new account with $200
SELECT SUM(balance) FROM accounts; -- Still $1000
COMMIT;

Which Isolation Level Should You Use?
| Level | Dirty Reads | Non-repeatable Reads | Phantom Reads | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | No | Possible | Possible | Good (default) |
| Repeatable Read | No | No | Possible* | Moderate |
| Serializable | No | No | No | Slowest |
*PostgreSQL's Repeatable Read also prevents phantom reads, unlike the SQL standard minimum.
Recommendation: Use the default Read Committed for most applications. Use Serializable only for critical financial transactions where correctness is more important than throughput.
PostgreSQL vs MySQL vs SQLite
PostgreSQL
Best for: Production web applications, complex queries, data integrity.
Strengths:
- Most standards-compliant SQL database
- Advanced features: JSONB, full-text search, CTEs, window functions, array types
- Excellent concurrency (MVCC — Multi-Version Concurrency Control)
- Strong ecosystem: PostGIS (geospatial), pg_trgm (fuzzy search), TimescaleDB (time-series)
- Battle-tested at massive scale (Instagram, Reddit, Discord)
When to use: Your default choice for any production application.
-- PostgreSQL-specific features
SELECT name, metadata->>'brand' AS brand -- JSONB queries
FROM products
WHERE tags @> ARRAY['electronics'] -- Array contains
AND to_tsvector(description) @@ to_tsquery('fast & reliable'); -- Full-text search

MySQL
Best for: Read-heavy workloads, high-traffic websites, WordPress/PHP ecosystems.
Strengths:
- Very fast for simple read queries
- Excellent replication support
- Huge community and ecosystem
- Default database for many web hosting providers
- Used by Facebook, Twitter, YouTube
Limitations compared to PostgreSQL:
- Weaker support for advanced SQL features (CTEs, window functions added later)
- Less strict data validation by default
- No native JSONB (has JSON but less powerful)
When to use: Simple CRUD applications, read-heavy workloads, when your team already knows MySQL.
SQLite
Best for: Development, testing, embedded applications, single-user tools.
Strengths:
- Zero configuration — it's just a file
- No server process needed
- Extremely fast for read operations
- Perfect for development and testing
- Embedded in every mobile phone (iOS, Android)
Limitations:
- Only one writer at a time (database-level locking)
- No built-in user management or access control
- Not suitable for multi-server deployments
When to use: Local development, testing, desktop/mobile apps, CLIs, prototyping.
Quick Comparison
| Feature | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
| Type | Server-based | Server-based | Embedded (file) |
| Concurrent writes | Excellent | Good | Poor (single writer) |
| Advanced SQL | Full support | Mostly supported | Basic |
| JSONB | Native, indexable | JSON (less powerful) | JSON functions |
| Full-text search | Built-in | Built-in | Extension (FTS5) |
| Replication | Streaming + logical | Mature replication | Not applicable |
| Best for | General production | Read-heavy web apps | Dev/test/embedded |
Hands-On: Designing an E-Commerce Schema
Let's apply everything we've learned by designing a real schema. We'll build the database for a simple e-commerce application.
Requirements
- Users can register and place orders
- Products belong to categories
- Products can have multiple tags
- Orders contain multiple items with quantities
- Track order status history
The Schema
-- 1. Users
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
username VARCHAR(50) NOT NULL UNIQUE,
password VARCHAR(255) NOT NULL, -- Store hashed passwords only!
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 2. Categories (hierarchical)
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT REFERENCES categories(id),
UNIQUE (name, parent_id)
);
-- 3. Products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
stock INT NOT NULL DEFAULT 0 CHECK (stock >= 0),
category_id INT REFERENCES categories(id),
is_active BOOLEAN NOT NULL DEFAULT TRUE,
metadata JSONB DEFAULT '{}'::JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 4. Tags (many-to-many with products)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE product_tags (
product_id INT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
tag_id INT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, tag_id)
);
-- 5. Orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
total DECIMAL(10, 2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 6. Order Items
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id INT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price > 0),
UNIQUE (order_id, product_id)
);
-- 7. Order Status History (audit trail)
CREATE TABLE order_status_history (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
old_status VARCHAR(20),
new_status VARCHAR(20) NOT NULL,
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
changed_by INT REFERENCES users(id)
);

Key Design Decisions
1. unit_price in order_items
We store the price at the time of purchase, not a reference to the current product price. If a product's price changes tomorrow, existing orders should still reflect what the customer actually paid.
-- ❌ BAD: Calculate total from current product price
SELECT p.price * oi.quantity AS line_total
FROM order_items oi
JOIN products p ON p.id = oi.product_id;
-- If product price changes, old orders show wrong totals
-- ✅ GOOD: Use stored unit_price
SELECT oi.unit_price * oi.quantity AS line_total
FROM order_items oi;
-- Always shows what the customer actually paid

2. Status as CHECK constraint
Using CHECK (status IN (...)) restricts status to valid values at the database level. No application bug can insert status = 'oops'.
3. order_status_history for audit trail
Instead of just storing the current status, we track every status change. This is invaluable for debugging and customer support ("when did this order ship?").
4. Hierarchical categories with parent_id
Categories reference themselves for nesting: Electronics → Laptops → Gaming Laptops.
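PostgreSQL can walk such a hierarchy with a recursive CTE, and SQLite supports the same WITH RECURSIVE syntax, so the pattern can be demonstrated with Python's sqlite3 module (sample category data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "parent_id INTEGER REFERENCES categories(id))"
)
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, "Electronics", None),
    (2, "Laptops", 1),
    (3, "Gaming Laptops", 2),
])

# Walk from a leaf category up to the root by repeatedly joining
# each row in the working set to its parent.
rows = conn.execute("""
    WITH RECURSIVE ancestors(id, name, parent_id) AS (
        SELECT id, name, parent_id FROM categories WHERE id = 3
        UNION ALL
        SELECT c.id, c.name, c.parent_id
        FROM categories c
        JOIN ancestors a ON c.id = a.parent_id
    )
    SELECT name FROM ancestors
""").fetchall()
print([name for (name,) in rows])  # ['Gaming Laptops', 'Laptops', 'Electronics']
```

The same query (with ->-style breadcrumbs joined in application code) works unchanged in PostgreSQL; flipping the join direction walks downward instead, collecting all descendants of a category.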
5. metadata JSONB on products
Different products have different attributes (laptops have RAM, shirts have sizes). JSONB stores these variable attributes without adding dozens of nullable columns.
Sample Queries
Get all products in a category with their tags:
SELECT
p.id,
p.name,
p.price,
c.name AS category,
ARRAY_AGG(t.name) AS tags
FROM products p
LEFT JOIN categories c ON c.id = p.category_id
LEFT JOIN product_tags pt ON pt.product_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
WHERE p.is_active = TRUE
GROUP BY p.id, p.name, p.price, c.name
ORDER BY p.created_at DESC;

Create an order with items (inside a transaction):
BEGIN;
-- Create the order
INSERT INTO orders (user_id, total)
VALUES (1, 0)
RETURNING id AS order_id;
-- Assume this returns order_id = 42
-- Add items
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
VALUES
(42, 10, 2, 29.99),
(42, 15, 1, 49.99);
-- Update total
UPDATE orders
SET total = (
SELECT SUM(unit_price * quantity)
FROM order_items
WHERE order_id = 42
)
WHERE id = 42;
-- Log status
INSERT INTO order_status_history (order_id, old_status, new_status)
VALUES (42, NULL, 'pending');
-- Decrease stock
UPDATE products SET stock = stock - 2 WHERE id = 10;
UPDATE products SET stock = stock - 1 WHERE id = 15;
COMMIT;

This transaction ensures that if any step fails (e.g., insufficient stock), the entire order is rolled back.
Common Mistakes to Avoid
1. Using the Wrong ID Type
-- ❌ Might overflow for high-traffic tables
CREATE TABLE events (
id INTEGER PRIMARY KEY -- Max ~2.1 billion
);
-- ✅ Use BIGINT for tables that could grow large
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY -- Max ~9.2 quintillion
);

2. Missing NOT NULL
-- ❌ Everything is nullable by default
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255), -- Can be NULL!
username VARCHAR(50) -- Can be NULL!
);
-- ✅ Be explicit about what's required
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
username VARCHAR(50) NOT NULL,
bio TEXT -- Intentionally nullable
);3. Storing Money as Float
-- ❌ BAD
CREATE TABLE products (price REAL);
-- ✅ GOOD
CREATE TABLE products (price DECIMAL(10, 2) NOT NULL CHECK (price > 0));

4. No Foreign Keys
-- ❌ BAD: No referential integrity
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT -- Nothing prevents user_id = 999999 (nonexistent user)
);
-- ✅ GOOD: Database enforces valid references
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id)
);

5. TIMESTAMP Without Timezone
-- ❌ Ambiguous — what timezone is this?
created_at TIMESTAMP NOT NULL DEFAULT NOW()
-- ✅ Unambiguous — always UTC internally
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

Summary and Key Takeaways
✅ The relational model organizes data into tables with rows and columns, linked by relationships
✅ SQL queries go through 5 stages: parser → analyzer → planner → executor → storage
✅ The query planner is the key to performance — it chooses how to access your data
✅ Constraints (PK, FK, NOT NULL, UNIQUE, CHECK) protect data integrity at the database level
✅ Choose data types carefully: DECIMAL for money, TIMESTAMPTZ for dates, VARCHAR/TEXT for strings
✅ ACID properties guarantee reliable transactions: Atomicity, Consistency, Isolation, Durability
✅ Read Committed is the right default isolation level for most applications
✅ PostgreSQL is the best default choice for production; SQLite for development/testing
✅ Design schemas with constraints — let the database enforce your business rules
✅ Use transactions for multi-step operations that must succeed or fail together
What's Next?
Now that you understand how databases work, continue with the next posts in this series:
📚 Database Schema Design Guide — Learn normalization, denormalization, and design patterns for real-world schemas
📚 SQL Indexing Deep Dive — Make your queries fast with the right indexes
📚 SQL Query Optimization Guide — Write performant queries and avoid common pitfalls
Conclusion
Databases are not black boxes. Understanding how they work internally — from the query execution pipeline to ACID guarantees — makes you a better backend developer. You'll write better schemas, debug performance issues faster, and make informed decisions about data types and constraints.
The relational model has endured for over 50 years because it works. Tables, relationships, constraints, and transactions provide a reliable foundation for applications of any scale. Master these fundamentals, and everything else — ORMs, migrations, optimization — becomes much easier to understand.
Start with PostgreSQL, use constraints liberally, and always wrap multi-step operations in transactions. Your future self will thank you.