Relational Database Fundamentals: How Databases Actually Work

Introduction
You write SELECT * FROM users WHERE id = 1 and get a result back. But what actually happens inside the database between your query and the response? How does the database find that one row among millions? Why do some queries take milliseconds while others take minutes?
Most backend developers interact with databases through ORMs and never look deeper. That works — until a query is slow, data is corrupted, or you need to design a schema that won't fall apart at scale.
This guide takes you inside the database engine. You'll understand the relational model, how queries actually execute, what ACID really means, and how to make informed decisions about data types, constraints, and database engines.
What You'll Learn
✅ Understand the relational model — tables, rows, columns, and relationships
✅ See what happens inside the database when you run a SQL query
✅ Master primary keys, foreign keys, and constraints
✅ Choose the right data types for your columns
✅ Understand ACID properties with real-world examples
✅ Learn transactions and isolation levels
✅ Know when to use PostgreSQL vs MySQL vs SQLite
✅ Design a simple e-commerce schema from scratch
The Relational Model
In 1970, Edgar F. Codd published a paper that changed computing forever. He proposed organizing data into relations (tables) with tuples (rows) and attributes (columns). This became the relational model — the foundation of nearly every database you use today.
Tables, Rows, and Columns
A table (or relation) is a structured collection of data about a specific entity:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

| Concept | Also Called | Description |
|---|---|---|
| Table | Relation | A collection of related data (e.g., users, orders) |
| Row | Tuple, Record | A single entry in the table (e.g., one user) |
| Column | Attribute, Field | A property of the entity (e.g., username, email) |
| Schema | — | The structure definition of a table (column names, types, constraints) |
Relationships Between Tables
The "relational" in relational database refers to how tables relate to each other.
One-to-Many (most common):
-- One user has many orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

One user → many orders. The user_id column in orders points back to users.id.
One-to-One:
-- One user has one profile
CREATE TABLE profiles (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL UNIQUE REFERENCES users(id),
bio TEXT,
avatar VARCHAR(500)
);

The UNIQUE constraint on user_id ensures each user has at most one profile.
Many-to-Many:
-- Products can have many tags, tags can have many products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL
);
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
);
-- Junction table (also called join table or bridge table)
CREATE TABLE product_tags (
product_id INT NOT NULL REFERENCES products(id),
tag_id INT NOT NULL REFERENCES tags(id),
PRIMARY KEY (product_id, tag_id)
);

Many-to-many relationships require a junction table that holds pairs of foreign keys.
What Happens When You Run a SQL Query
When you execute a query, the database doesn't just "find" your data. It goes through a multi-stage pipeline. Understanding this pipeline helps you write faster queries and debug performance issues.
The Query Execution Pipeline
SQL Query
│
▼
┌──────────┐
│ Parser │ → Checks syntax, builds parse tree
└──────────┘
│
▼
┌──────────┐
│ Analyzer │ → Validates table/column names, resolves types
└──────────┘
│
▼
┌──────────┐
│ Planner │ → Generates execution plans, picks the cheapest
└──────────┘
│
▼
┌──────────┐
│ Executor │ → Runs the chosen plan, returns results
└──────────┘
│
▼
┌──────────┐
│ Storage │ → Reads/writes data pages on disk
└──────────┘

Stage 1: Parser
The parser checks that your SQL is syntactically valid and converts it into a parse tree (an internal representation).
-- This will be parsed successfully
SELECT username FROM users WHERE id = 1;
-- This will fail at the parser stage (syntax error)
SELEC username FORM users WHER id = 1;
-- ERROR: syntax error at or near "SELEC"

Stage 2: Analyzer
The analyzer (also called the rewriter) checks that the tables and columns actually exist, resolves data types, and expands wildcards like *.
-- Analyzer resolves `*` to actual columns
SELECT * FROM users WHERE id = 1;
-- Becomes: SELECT id, username, email, created_at FROM users WHERE id = 1;
-- Analyzer catches invalid references
SELECT nonexistent_column FROM users;
-- ERROR: column "nonexistent_column" does not exist

Stage 3: Planner (Query Optimizer)
This is the most interesting stage. The planner generates multiple execution plans and picks the one with the lowest estimated cost.
For example, to find a user by ID:
- Plan A: Sequential scan — read every row and check if id = 1 (slow for large tables)
- Plan B: Index scan — use the primary key index to jump directly to the row (fast)
The planner uses statistics about your data (row counts, value distribution, index availability) to estimate costs.
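To build intuition for that choice, here's a toy cost model in Python. The constants and formulas are invented for illustration (loosely inspired by PostgreSQL's seq_page_cost and cpu_tuple_cost defaults); the real planner uses calibrated parameters and much richer statistics:

```python
import math

# Hypothetical cost constants, for illustration only.
SEQ_PAGE_COST = 1.0     # sequential page read
RANDOM_PAGE_COST = 4.0  # random page read (index traversal)
CPU_TUPLE_COST = 0.01   # examining one row

def seq_scan_cost(rows, pages):
    # Plan A: read every page sequentially, check every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(rows):
    # Plan B: descend a B-tree (depth grows with the log of table
    # size), then fetch the single page holding the row.
    depth = max(1, math.ceil(math.log(rows, 100)))  # ~100 entries/index page
    return depth * RANDOM_PAGE_COST + RANDOM_PAGE_COST + CPU_TUPLE_COST

rows, pages = 1_000_000, 10_000
print(f"seq scan:   {seq_scan_cost(rows, pages):.2f}")   # ~20000
print(f"index scan: {index_scan_cost(rows):.2f}")        # ~16
```

For a single-row lookup on a large table, the estimated index-scan cost is orders of magnitude lower, so the planner picks Plan B.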
You can see the planner's decision using EXPLAIN:
EXPLAIN SELECT * FROM users WHERE id = 1;
-- Output:
-- Index Scan using users_pkey on users (cost=0.29..8.30 rows=1 width=100)
-- Index Cond: (id = 1)

The planner chose an Index Scan because there's a primary key index on id. If there were no index, you'd see a Seq Scan (sequential scan) instead.
Stage 4: Executor
The executor takes the chosen plan and actually runs it. It coordinates reading from storage, applying filters, joining tables, sorting results, and returning data to the client.
Stage 5: Storage Engine
The storage engine manages how data is physically stored on disk. Data is organized in pages (typically 8KB blocks). When the executor needs a row, the storage engine:
- Checks the buffer cache (shared memory) for the page
- If not cached, reads the page from disk into the cache
- Returns the requested data from the cached page
This is why repeated queries are often faster — the data is already in memory.
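A minimal sketch of the buffer-cache idea in Python (illustrative only — PostgreSQL actually uses a clock-sweep algorithm rather than strict LRU, but the hit/miss logic is the same idea):

```python
from collections import OrderedDict

class BufferCache:
    """Toy page cache with LRU eviction (illustrative sketch)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # page_id -> page (simulated disk)
        self.pages = OrderedDict()  # page_id -> page (in memory)
        self.hits = self.misses = 0

    def read_page(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as recently used
            return self.pages[page_id]
        self.misses += 1
        page = self.disk[page_id]            # the slow path: "disk I/O"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict least recently used
        self.pages[page_id] = page
        return page

disk = {i: f"page-{i}" for i in range(100)}
cache = BufferCache(capacity=10, disk=disk)
cache.read_page(5)   # miss: read from "disk", cache the page
cache.read_page(5)   # hit: served from memory
print(cache.hits, cache.misses)  # 1 1
```

The second read never touches the disk, which is exactly why re-running the same query is usually faster.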
Primary Keys, Foreign Keys, and Constraints
Constraints are rules enforced by the database to keep your data consistent. They're your first line of defense against bad data.
Primary Key
Every table should have a primary key — a column (or combination of columns) that uniquely identifies each row.
-- Single column primary key (most common)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL
);
-- Composite primary key (for junction tables)
CREATE TABLE product_tags (
product_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (product_id, tag_id)
);

A primary key is automatically NOT NULL and UNIQUE. The database creates an index on it for fast lookups.
Foreign Key
A foreign key creates a link between two tables and enforces referential integrity — you can't create an order for a user that doesn't exist.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL
);
-- This works (user 1 exists)
INSERT INTO orders (user_id, total) VALUES (1, 99.99);
-- This fails (user 999 doesn't exist)
INSERT INTO orders (user_id, total) VALUES (999, 99.99);
-- ERROR: insert or update on table "orders" violates foreign key constraint

CASCADE behavior — what happens when the referenced row is deleted:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-- When a user is deleted, their orders are also deleted
total DECIMAL(10, 2) NOT NULL
);
-- Other options:
-- ON DELETE NO ACTION → Prevent deletion if orders exist (the default; RESTRICT is similar)
-- ON DELETE SET NULL → Set user_id to NULL when user is deleted
-- ON DELETE SET DEFAULT → Set to default value

NOT NULL
The simplest but most important constraint. Prevents missing data.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL, -- Required
description TEXT, -- Optional (nullable)
price DECIMAL(10, 2) NOT NULL -- Required
);

Rule of thumb: Make columns NOT NULL by default. Only allow NULL when the absence of a value has a specific meaning (e.g., "user hasn't set a bio yet").
UNIQUE
Ensures no two rows have the same value in that column.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE
);
-- First insert works
INSERT INTO users (username, email) VALUES ('john', 'john@example.com');
-- Duplicate username fails
INSERT INTO users (username, email) VALUES ('john', 'other@example.com');
-- ERROR: duplicate key value violates unique constraint "users_username_key"

CHECK
Validates that values meet a condition.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
stock INT NOT NULL DEFAULT 0 CHECK (stock >= 0)
);
-- This fails
INSERT INTO products (name, price, stock) VALUES ('Widget', -5.00, 10);
-- ERROR: new row for relation "products" violates check constraint "products_price_check"

DEFAULT
Provides a value when none is specified.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- status defaults to 'pending', created_at defaults to current timestamp
INSERT INTO orders DEFAULT VALUES;

Choosing the Right Data Types
Data types determine what values a column can hold, how much storage it uses, and what operations are valid. Choosing the wrong type leads to wasted space, incorrect results, or runtime errors.
Numeric Types
| Type | Storage | Range | Use Case |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Status codes, small counters |
| INTEGER | 4 bytes | -2.1 billion to 2.1 billion | Most IDs, counts |
| BIGINT | 8 bytes | ±9.2 quintillion | Large IDs, analytics counters |
| DECIMAL(p,s) | Variable | User-defined precision | Money, exact calculations |
| REAL | 4 bytes | ~6 decimal digits | Scientific data (approximate) |
| DOUBLE PRECISION | 8 bytes | ~15 decimal digits | Scientific data (approximate) |
Important: Never use REAL or DOUBLE PRECISION for money. Floating-point math produces rounding errors:
-- ❌ BAD: Floating point for money
SELECT 0.1 + 0.2;
-- Returns: 0.30000000000000004
-- ✅ GOOD: DECIMAL for money
SELECT 0.1::DECIMAL(10,2) + 0.2::DECIMAL(10,2);
-- Returns: 0.30

String Types
| Type | Description | Use Case |
|---|---|---|
| VARCHAR(n) | Variable-length, max n chars | Usernames, emails, short text |
| TEXT | Variable-length, unlimited | Blog content, descriptions |
| CHAR(n) | Fixed-length, padded with spaces | Fixed codes (e.g., country codes) |
In PostgreSQL, VARCHAR and TEXT have identical performance. The only difference is VARCHAR(n) enforces a length limit. Use VARCHAR(n) when you want to enforce a max length (usernames, emails), and TEXT when there's no meaningful limit.
CREATE TABLE users (
username VARCHAR(50) NOT NULL, -- Enforce 50 char limit
bio TEXT -- No limit needed
);

Date and Time Types
| Type | Storage | Description | Example |
|---|---|---|---|
| DATE | 4 bytes | Date only | 2026-01-31 |
| TIME | 8 bytes | Time only | 14:30:00 |
| TIMESTAMP | 8 bytes | Date + time, no timezone | 2026-01-31 14:30:00 |
| TIMESTAMPTZ | 8 bytes | Date + time, with timezone | 2026-01-31 14:30:00+07 |
Always use TIMESTAMPTZ (timestamp with time zone). Without timezone info, you'll have bugs when users are in different timezones or when daylight saving time changes.
-- ❌ BAD: No timezone info
CREATE TABLE events (
starts_at TIMESTAMP NOT NULL
);
-- Is 14:00 in UTC? Local time? Server time?
-- ✅ GOOD: Timezone-aware
CREATE TABLE events (
starts_at TIMESTAMPTZ NOT NULL
);
-- Always stored in UTC, converted to user's timezone on display

Boolean
CREATE TABLE users (
id SERIAL PRIMARY KEY,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
is_verified BOOLEAN NOT NULL DEFAULT FALSE
);

UUID
-- PostgreSQL has native UUID support; gen_random_uuid() is built in since v13
-- (older versions need the pgcrypto extension)
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id INT NOT NULL REFERENCES users(id),
expires_at TIMESTAMPTZ NOT NULL
);

JSON/JSONB
-- JSONB is binary JSON — faster to query, slower to insert
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
metadata JSONB DEFAULT '{}'::JSONB
);
-- Query JSONB fields
INSERT INTO products (name, metadata)
VALUES ('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "storage": 512}}');
SELECT name, metadata->>'brand' AS brand
FROM products
WHERE metadata->'specs'->>'ram' = '16';

When to use JSONB:
- Flexible/variable attributes that differ per row
- Data from external APIs that you store but rarely query
- Rapid prototyping before you know the final schema
When NOT to use JSONB:
- Data you frequently filter, sort, or join on — use proper columns instead
- Core business data that has a consistent structure
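If you want to experiment with this query pattern without a PostgreSQL server, SQLite's built-in JSON functions cover the same idea. A sketch using Python's sqlite3 module — note that the -> and ->> operators above are PostgreSQL syntax, while SQLite's portable spelling is json_extract:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no JSONB type; JSON is stored as TEXT and queried with functions.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)")
conn.execute(
    "INSERT INTO products (name, metadata) VALUES (?, ?)",
    ("Laptop", '{"brand": "Dell", "specs": {"ram": 16, "storage": 512}}'),
)

# json_extract walks a path into the document, like ->/->> in PostgreSQL.
row = conn.execute(
    "SELECT name, json_extract(metadata, '$.brand') "
    "FROM products WHERE json_extract(metadata, '$.specs.ram') = 16"
).fetchone()
print(row)  # ('Laptop', 'Dell')
```

This requires an SQLite build with the JSON1 extension, which modern Python distributions include.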
ACID Properties
ACID is a set of four properties that guarantee database transactions are reliable. Every production database you use (PostgreSQL, MySQL, SQLite) provides ACID guarantees.
A — Atomicity
A transaction is all-or-nothing. Either every operation in the transaction succeeds, or none of them do.
-- Transfer $100 from Alice to Bob
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; -- Alice
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2; -- Bob
COMMIT;
-- Both updates succeed, or neither does

If the database crashes after debiting Alice but before crediting Bob, the entire transaction is rolled back. Alice keeps her money.
Without atomicity, you'd have to manually detect and fix partial operations — a nightmare with money.
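The same all-or-nothing behavior is easy to see from application code. A sketch using Python's built-in sqlite3 module (any ACID database behaves the same way; the table and amounts are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "user_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

def transfer(conn, from_id, to_id, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE user_id = ?",
                         (amount, from_id))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
                         (amount, to_id))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK violated -> whole transfer rolled back

print(transfer(conn, 1, 2, 100))   # True  — both updates applied
print(transfer(conn, 1, 2, 1000))  # False — debit would go negative; nothing applied
balances = dict(conn.execute("SELECT user_id, balance FROM accounts"))
print(balances)  # {1: 400, 2: 200}
```

After the failed transfer, neither account changed: the partial debit never became visible.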
C — Consistency
The database moves from one valid state to another. Constraints (primary keys, foreign keys, CHECK, NOT NULL) are never violated.
-- Constraint: balance cannot go negative
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE user_id = 1;
-- If Alice only has $50, this violates the CHECK constraint
-- The entire transaction is rolled back
COMMIT;
-- ERROR: new row violates check constraint "positive_balance"

I — Isolation
Concurrent transactions don't interfere with each other. Each transaction sees a consistent snapshot of the data, even if other transactions are running simultaneously.
-- Transaction A (runs first)
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- Sees $500
-- Transaction B (runs concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
COMMIT;
-- Alice now has $400
-- Back in Transaction A
SELECT balance FROM accounts WHERE user_id = 1;
-- What does Transaction A see? Depends on isolation level (see below)

D — Durability
Once a transaction is committed, the data survives any failure — power outage, crash, hardware failure.
Databases achieve this using a Write-Ahead Log (WAL):
- Before modifying data pages, write the change to the WAL (sequential, fast)
- Periodically flush data pages to disk (checkpointing)
- On crash recovery, replay the WAL from the last checkpoint to restore committed changes that hadn't yet reached the data files
This is why COMMIT is the point of no return — once the WAL entry is flushed, the data is guaranteed to persist.
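A stripped-down sketch of the WAL idea in Python. Real WALs log low-level page changes and interact with checkpoints, but the append-then-apply-then-replay cycle below is the core:

```python
import json
import os
import tempfile

class MiniWAL:
    """Toy write-ahead log: flush the log record BEFORE touching data,
    so a crash can always be recovered by replaying the log."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # in-memory "data pages"

    def write(self, key, value):
        # Step 1: append the change to the log and force it to disk.
        with open(self.path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # the durability point
        # Step 2: only now modify the data itself.
        self.data[key] = value

    def recover(self):
        # After a crash, replaying the log rebuilds the durable state.
        self.data = {}
        with open(self.path) as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = MiniWAL(path)
db.write("alice", 400)
db.write("bob", 200)

crashed = MiniWAL(path)  # simulate a fresh process with empty memory
crashed.recover()
print(crashed.data)  # {'alice': 400, 'bob': 200}
```

Because the log entry hits disk before the data changes, no acknowledged write can ever be lost, and the log is written sequentially, which is much faster than scattering random page writes.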
Transactions and Isolation Levels
What is a Transaction?
A transaction is a group of operations that execute as a single unit. You start one with BEGIN and finish with COMMIT (apply changes) or ROLLBACK (undo everything).
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 150.00);
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 42, 2);
UPDATE products SET stock = stock - 2 WHERE id = 42;
COMMIT;
-- All three operations succeed together

If anything fails, you roll back:
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 150.00);
-- Oops, product is out of stock
ROLLBACK;
-- The order is undone

Isolation Levels
Isolation levels control what a transaction can see when other transactions are running at the same time. There are four standard levels, from least to most strict:
Read Uncommitted
The loosest level. Transactions can see uncommitted changes from other transactions (dirty reads).
PostgreSQL doesn't actually support this — it automatically upgrades to Read Committed.
Read Committed (PostgreSQL default)
Each statement sees only data committed before that statement began.
-- Transaction A
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- $500
-- Transaction B commits: UPDATE balance to $400
SELECT balance FROM accounts WHERE user_id = 1; -- $400 (sees B's commit)
COMMIT;

The two SELECTs in Transaction A can return different values. This is called a non-repeatable read.
Repeatable Read
Each transaction sees only data committed before the transaction began. Once you read a value, it stays the same for the entire transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM accounts WHERE user_id = 1; -- $500
-- Transaction B commits: UPDATE balance to $400
SELECT balance FROM accounts WHERE user_id = 1; -- Still $500 (snapshot from BEGIN)
COMMIT;

Serializable
The strictest level. Transactions execute as if they ran one at a time (serially). If two transactions conflict, one is aborted and must be retried.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT SUM(balance) FROM accounts; -- $1000
-- Transaction B adds a new account with $200
SELECT SUM(balance) FROM accounts; -- Still $1000
COMMIT;

Which Isolation Level Should You Use?
| Level | Dirty Reads | Non-repeatable Reads | Phantom Reads | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | No | Possible | Possible | Good (default) |
| Repeatable Read | No | No | Possible* | Moderate |
| Serializable | No | No | No | Slowest |
*PostgreSQL's Repeatable Read also prevents phantom reads, unlike the SQL standard minimum.
Recommendation: Use the default Read Committed for most applications. Use Serializable only for critical financial transactions where correctness is more important than throughput.
PostgreSQL vs MySQL vs SQLite
PostgreSQL
Best for: Production web applications, complex queries, data integrity.
Strengths:
- Most standards-compliant SQL database
- Advanced features: JSONB, full-text search, CTEs, window functions, array types
- Excellent concurrency (MVCC — Multi-Version Concurrency Control)
- Strong ecosystem: PostGIS (geospatial), pg_trgm (fuzzy search), TimescaleDB (time-series)
- Battle-tested at massive scale (Instagram, Reddit, Discord)
When to use: Your default choice for any production application.
-- PostgreSQL-specific features
SELECT name, metadata->>'brand' AS brand -- JSONB queries
FROM products
WHERE tags @> ARRAY['electronics'] -- Array contains
AND to_tsvector(description) @@ to_tsquery('fast & reliable'); -- Full-text search

MySQL
Best for: Read-heavy workloads, high-traffic websites, WordPress/PHP ecosystems.
Strengths:
- Very fast for simple read queries
- Excellent replication support
- Huge community and ecosystem
- Default database for many web hosting providers
- Used by Facebook, Twitter, YouTube
Limitations compared to PostgreSQL:
- Weaker support for advanced SQL features (CTEs, window functions added later)
- Less strict data validation by default
- No native JSONB (has JSON but less powerful)
When to use: Simple CRUD applications, read-heavy workloads, when your team already knows MySQL.
SQLite
Best for: Development, testing, embedded applications, single-user tools.
Strengths:
- Zero configuration — it's just a file
- No server process needed
- Extremely fast for read operations
- Perfect for development and testing
- Embedded in every mobile phone (iOS, Android)
Limitations:
- Only one writer at a time (database-level locking)
- No built-in user management or access control
- Not suitable for multi-server deployments
When to use: Local development, testing, desktop/mobile apps, CLIs, prototyping.
Quick Comparison
| Feature | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
| Type | Server-based | Server-based | Embedded (file) |
| Concurrent writes | Excellent | Good | Poor (single writer) |
| Advanced SQL | Full support | Mostly supported | Basic |
| JSONB | Native, indexable | JSON (less powerful) | JSON functions |
| Full-text search | Built-in | Built-in | Extension (FTS5) |
| Replication | Streaming + logical | Mature replication | Not applicable |
| Best for | General production | Read-heavy web apps | Dev/test/embedded |
Hands-On: Designing an E-Commerce Schema
Let's apply everything we've learned by designing a real schema. We'll build the database for a simple e-commerce application.
Requirements
- Users can register and place orders
- Products belong to categories
- Products can have multiple tags
- Orders contain multiple items with quantities
- Track order status history
The Schema
-- 1. Users
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
username VARCHAR(50) NOT NULL UNIQUE,
password VARCHAR(255) NOT NULL, -- Store hashed passwords only!
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 2. Categories (hierarchical)
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT REFERENCES categories(id),
UNIQUE (name, parent_id)
);
-- 3. Products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL CHECK (price > 0),
stock INT NOT NULL DEFAULT 0 CHECK (stock >= 0),
category_id INT REFERENCES categories(id),
is_active BOOLEAN NOT NULL DEFAULT TRUE,
metadata JSONB DEFAULT '{}'::JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 4. Tags (many-to-many with products)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE product_tags (
product_id INT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
tag_id INT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, tag_id)
);
-- 5. Orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
total DECIMAL(10, 2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- 6. Order Items
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id INT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price > 0),
UNIQUE (order_id, product_id)
);
-- 7. Order Status History (audit trail)
CREATE TABLE order_status_history (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
old_status VARCHAR(20),
new_status VARCHAR(20) NOT NULL,
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
changed_by INT REFERENCES users(id)
);

Key Design Decisions
1. unit_price in order_items
We store the price at the time of purchase, not a reference to the current product price. If a product's price changes tomorrow, existing orders should still reflect what the customer actually paid.
-- ❌ BAD: Calculate total from current product price
SELECT p.price * oi.quantity AS line_total
FROM order_items oi
JOIN products p ON p.id = oi.product_id;
-- If product price changes, old orders show wrong totals
-- ✅ GOOD: Use stored unit_price
SELECT oi.unit_price * oi.quantity AS line_total
FROM order_items oi;
-- Always shows what the customer actually paid

2. Status as CHECK constraint
Using CHECK (status IN (...)) restricts status to valid values at the database level. No application bug can insert status = 'oops'.
3. order_status_history for audit trail
Instead of just storing the current status, we track every status change. This is invaluable for debugging and customer support ("when did this order ship?").
4. Hierarchical categories with parent_id
Categories reference themselves for nesting: Electronics → Laptops → Gaming Laptops.
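PostgreSQL can walk such a hierarchy with a recursive CTE, and SQLite supports the same WITH RECURSIVE syntax, so the pattern can be demonstrated with Python's sqlite3 module (sample category data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "parent_id INTEGER REFERENCES categories(id))"
)
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, "Electronics", None),
    (2, "Laptops", 1),
    (3, "Gaming Laptops", 2),
])

# Walk from a leaf category up to the root by repeatedly joining
# each row in the working set to its parent.
rows = conn.execute("""
    WITH RECURSIVE ancestors(id, name, parent_id) AS (
        SELECT id, name, parent_id FROM categories WHERE id = 3
        UNION ALL
        SELECT c.id, c.name, c.parent_id
        FROM categories c
        JOIN ancestors a ON c.id = a.parent_id
    )
    SELECT name FROM ancestors
""").fetchall()
print([name for (name,) in rows])  # ['Gaming Laptops', 'Laptops', 'Electronics']
```

The same query (with ->-style breadcrumbs joined in application code) works unchanged in PostgreSQL; flipping the join direction walks downward instead, collecting all descendants of a category.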
5. metadata JSONB on products
Different products have different attributes (laptops have RAM, shirts have sizes). JSONB stores these variable attributes without adding dozens of nullable columns.
Sample Queries
Get all products in a category with their tags:
SELECT
p.id,
p.name,
p.price,
c.name AS category,
ARRAY_AGG(t.name) AS tags
FROM products p
LEFT JOIN categories c ON c.id = p.category_id
LEFT JOIN product_tags pt ON pt.product_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
WHERE p.is_active = TRUE
GROUP BY p.id, p.name, p.price, c.name
ORDER BY p.created_at DESC;

Create an order with items (inside a transaction):
BEGIN;
-- Create the order
INSERT INTO orders (user_id, total)
VALUES (1, 0)
RETURNING id AS order_id;
-- Assume this returns order_id = 42
-- Add items
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
VALUES
(42, 10, 2, 29.99),
(42, 15, 1, 49.99);
-- Update total
UPDATE orders
SET total = (
SELECT SUM(unit_price * quantity)
FROM order_items
WHERE order_id = 42
)
WHERE id = 42;
-- Log status
INSERT INTO order_status_history (order_id, old_status, new_status)
VALUES (42, NULL, 'pending');
-- Decrease stock
UPDATE products SET stock = stock - 2 WHERE id = 10;
UPDATE products SET stock = stock - 1 WHERE id = 15;
COMMIT;

This transaction ensures that if any step fails (e.g., insufficient stock), the entire order is rolled back.
Common Mistakes to Avoid
1. Using the Wrong ID Type
-- ❌ Might overflow for high-traffic tables
CREATE TABLE events (
id INTEGER PRIMARY KEY -- Max ~2.1 billion
);
-- ✅ Use BIGINT for tables that could grow large
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY -- Max ~9.2 quintillion
);

2. Missing NOT NULL
-- ❌ Everything is nullable by default
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255), -- Can be NULL!
username VARCHAR(50) -- Can be NULL!
);
-- ✅ Be explicit about what's required
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
username VARCHAR(50) NOT NULL,
bio TEXT -- Intentionally nullable
);3. Storing Money as Float
-- ❌ BAD
CREATE TABLE products (price REAL);
-- ✅ GOOD
CREATE TABLE products (price DECIMAL(10, 2) NOT NULL CHECK (price > 0));

4. No Foreign Keys
-- ❌ BAD: No referential integrity
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT -- Nothing prevents user_id = 999999 (nonexistent user)
);
-- ✅ GOOD: Database enforces valid references
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id)
);

5. TIMESTAMP Without Timezone
-- ❌ Ambiguous — what timezone is this?
created_at TIMESTAMP NOT NULL DEFAULT NOW()
-- ✅ Unambiguous — always UTC internally
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

Summary and Key Takeaways
✅ The relational model organizes data into tables with rows and columns, linked by relationships
✅ SQL queries go through 5 stages: parser → analyzer → planner → executor → storage
✅ The query planner is the key to performance — it chooses how to access your data
✅ Constraints (PK, FK, NOT NULL, UNIQUE, CHECK) protect data integrity at the database level
✅ Choose data types carefully: DECIMAL for money, TIMESTAMPTZ for dates, VARCHAR/TEXT for strings
✅ ACID properties guarantee reliable transactions: Atomicity, Consistency, Isolation, Durability
✅ Read Committed is the right default isolation level for most applications
✅ PostgreSQL is the best default choice for production; SQLite for development/testing
✅ Design schemas with constraints — let the database enforce your business rules
✅ Use transactions for multi-step operations that must succeed or fail together
What's Next?
Now that you understand how databases work, continue with the next posts in this series:
📚 Database Schema Design Guide — Learn normalization, denormalization, and design patterns for real-world schemas
📚 SQL Indexing Deep Dive — Make your queries fast with the right indexes
📚 SQL Query Optimization Guide — Write performant queries and avoid common pitfalls
Conclusion
Databases are not black boxes. Understanding how they work internally — from the query execution pipeline to ACID guarantees — makes you a better backend developer. You'll write better schemas, debug performance issues faster, and make informed decisions about data types and constraints.
The relational model has endured for over 50 years because it works. Tables, relationships, constraints, and transactions provide a reliable foundation for applications of any scale. Master these fundamentals, and everything else — ORMs, migrations, optimization — becomes much easier to understand.
Start with PostgreSQL, use constraints liberally, and always wrap multi-step operations in transactions. Your future self will thank you.