Back to blog

Database Schema Design — From Requirements to Tables

databasepostgresqlsqlschema-designbackend
Database Schema Design — From Requirements to Tables

You're building a new feature and need to add three new tables. Should table names be plural or singular? Do you need created_at on every table? Should you use UUIDs or auto-increment IDs? Is it okay to store that JSON blob in a column?

These decisions seem small individually, but they compound quickly. A poorly designed schema leads to slow queries, data anomalies, painful migrations, and features that are hard to build. A well-designed schema makes everything easier — queries are fast, features add cleanly, and bugs are rare.

In this guide, you'll learn a systematic approach to schema design — from gathering business requirements to creating production-ready tables. We'll cover the theory (normalization), the practical decisions (naming, IDs, soft deletes), and apply everything in two real-world case studies.

Prerequisites: This post builds on Relational Database Fundamentals. You should understand tables, primary keys, foreign keys, constraints, and data types before continuing.

What You'll Learn

✅ Gather requirements and identify entities from business rules
✅ Design one-to-one, one-to-many, and many-to-many relationships
✅ Apply normalization (1NF, 2NF, 3NF) with practical examples
✅ Know when to denormalize — and why it's strategic, not lazy
✅ Choose naming conventions that scale across your codebase
✅ Implement soft deletes, audit columns, and timestamps
✅ Decide between UUIDs and auto-increment IDs
✅ Use JSON/JSONB columns strategically
✅ Plan schema evolution and backward compatibility
✅ Design two real-world schemas: blog platform and SaaS multi-tenant


From Requirements to Entities

Before writing any SQL, you need to understand what you're modeling. Schema design starts with business requirements, not with CREATE TABLE.

Extracting Entities and Relationships

Read the requirements and look for nouns (entities) and verbs (relationships):

Business Requirement:
"Users can write blog posts. Each post belongs to one category
but can have multiple tags. Users can comment on posts.
Posts can be published or saved as drafts."

Entities (nouns that have their own identity and attributes):

  • Users
  • Posts
  • Categories
  • Tags
  • Comments

Relationships (how entities connect):

  • User → Post (one user writes many posts)
  • Category → Post (one category contains many posts)
  • Post ↔ Tag (many-to-many)
  • Post → Comment (one post has many comments)
  • User → Comment (one user writes many comments)

Attributes (properties of entities):

  • Post has: title, content, status (draft/published), published date
  • User has: email, username, password
  • Comment has: content, created date

Entity vs Attribute — The Decision

Not every noun deserves its own table. Here's how to decide:

ScenarioEntity or Attribute?Reasoning
User's emailAttributeSimple value, no relationships
User's addressDependsReusable across users → entity. Unique per user → attribute or JSON
Product categoryEntityHas own attributes (name, description, parent), queried independently
Order statusAttribute (enum)Fixed set of values, no additional data
Product colorDependsFixed options → enum attribute. Complex (hex, name, variants) → entity
-- Category as entity (has relationships, queried independently)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE,
    slug VARCHAR(100) NOT NULL UNIQUE,
    parent_id INT REFERENCES categories(id)
);
 
-- Status as attribute (enum constraint)
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'draft'
        CHECK (status IN ('draft', 'published', 'archived'))
);

Rule of thumb: If it has its own attributes, can be queried independently, or participates in relationships with other entities — make it a table. If it's a simple value that only makes sense in the context of its parent — make it a column.


Relationships Deep Dive

One-to-One

Use one-to-one relationships when you need to:

  • Separate sensitive data (user → user_credentials)
  • Split large tables (product → product_details for rarely-accessed columns)
  • Add optional extensions (user → profile)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    username VARCHAR(50) NOT NULL UNIQUE
);
 
-- Profile extends user (optional one-to-one)
CREATE TABLE profiles (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL UNIQUE REFERENCES users(id) ON DELETE CASCADE,
    bio TEXT,
    avatar_url VARCHAR(500),
    website VARCHAR(255)
);

The UNIQUE constraint on user_id enforces the one-to-one relationship — each user can have at most one profile. ON DELETE CASCADE means deleting a user also deletes their profile.

One-to-Many

The most common relationship type. The "many" side holds the foreign key:

-- One user writes many posts
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id),
    title VARCHAR(200) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ❌ BAD: Storing multiple IDs in an array
ALTER TABLE users ADD COLUMN post_ids INT[];
 
-- ✅ GOOD: Foreign key on the "many" side
-- (Already shown above — user_id in posts table)

Never store foreign keys as arrays. You lose referential integrity, can't join efficiently, and violate first normal form.

Many-to-Many

Many-to-many relationships require a junction table (also called a join table or bridge table):

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE,
    slug VARCHAR(50) NOT NULL UNIQUE
);
 
-- Junction table: posts ↔ tags
CREATE TABLE post_tags (
    post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    tag_id INT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);

The composite primary key (post_id, tag_id) prevents duplicate tag assignments and serves as the primary key.

Junction tables with extra attributes — when the relationship itself has data:

-- Users enroll in courses (many-to-many with enrollment metadata)
CREATE TABLE enrollments (
    user_id INT NOT NULL REFERENCES users(id),
    course_id INT NOT NULL REFERENCES courses(id),
    enrolled_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    grade DECIMAL(5, 2) CHECK (grade >= 0 AND grade <= 100),
    PRIMARY KEY (user_id, course_id)
);

The enrolled_at, completed_at, and grade columns belong to the relationship, not to either entity.

Self-Referencing Relationships

An entity can reference itself — useful for hierarchies:

-- Hierarchical categories (parent-child)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT REFERENCES categories(id) ON DELETE SET NULL
);
 
-- Nested comments (replies)
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    user_id INT NOT NULL REFERENCES users(id),
    parent_id INT REFERENCES comments(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Normalization: 1NF, 2NF, 3NF

Normalization organizes data to reduce redundancy and prevent anomalies (insertion, update, and deletion anomalies). The goal is a schema where each fact is stored exactly once.

First Normal Form (1NF)

Rule: Every column contains atomic (indivisible) values. No repeating groups or arrays.

-- ❌ VIOLATES 1NF: Comma-separated values
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),
    products VARCHAR(500)  -- "Laptop, Mouse, Keyboard"
);
 
-- ❌ VIOLATES 1NF: Repeating columns
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),
    product_1 VARCHAR(100),
    product_2 VARCHAR(100),
    product_3 VARCHAR(100)  -- What if the customer orders 4 items?
);
 
-- ✅ SATISFIES 1NF: Separate table for order items
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INT NOT NULL REFERENCES products(id),
    quantity INT NOT NULL CHECK (quantity > 0)
);
1NF ViolationExampleFix
Comma-separated valuestags: "tech,coding,sql"Junction table post_tags
Repeating columnsphone1, phone2, phone3Separate user_phones table
Array column (queried with joins)author_ids INT[]Junction table post_authors

Second Normal Form (2NF)

Rule: 1NF + every non-key column depends on the entire primary key (not just part of it). This only applies to tables with composite primary keys.

-- ❌ VIOLATES 2NF: product_name depends only on product_id,
-- not on the full key (order_id, product_id)
CREATE TABLE order_items (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    product_name VARCHAR(200),  -- Depends only on product_id (partial dependency)
    quantity INT NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
 
-- Problem: If product name changes, update ALL order_items rows
-- Problem: Product name stored redundantly in every order
-- ✅ SATISFIES 2NF: Product data in its own table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);
 
CREATE TABLE order_items (
    order_id INT NOT NULL REFERENCES orders(id),
    product_id INT NOT NULL REFERENCES products(id),
    quantity INT NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,  -- Price snapshot at purchase time
    PRIMARY KEY (order_id, product_id)
);

Notice unit_price is intentionally duplicated — it captures the price at purchase time, which is different from the current product price. This is strategic denormalization, not a 2NF violation.

Third Normal Form (3NF)

Rule: 2NF + no transitive dependencies. Every non-key column depends directly on the primary key, not on another non-key column.

-- ❌ VIOLATES 3NF: country depends on zip_code, not on customer id
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    zip_code VARCHAR(20),
    city VARCHAR(100),
    country VARCHAR(100)  -- Determined by zip_code, not by customer
);
 
-- Problem: Same city/country stored for every customer in that zip code
-- Problem: If zip_code → country mapping changes, update many rows
-- ✅ SATISFIES 3NF: Extract location data
CREATE TABLE zip_codes (
    zip_code VARCHAR(20) PRIMARY KEY,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(100),
    country VARCHAR(100) NOT NULL
);
 
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    zip_code VARCHAR(20) REFERENCES zip_codes(zip_code)
);

Another common example:

-- ❌ VIOLATES 3NF: category_name depends on category_id
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    category_id INT NOT NULL,
    category_name VARCHAR(100)  -- Transitive: id → category_id → category_name
);
 
-- ✅ SATISFIES 3NF
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE
);
 
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    category_id INT NOT NULL REFERENCES categories(id)
);

When Is 3NF Enough?

For most applications, 3NF is the right target. Higher normal forms (BCNF, 4NF, 5NF) exist but are rarely needed in practice. Start at 3NF, then denormalize specific tables when you have measured performance problems.


When to Denormalize

Denormalization adds redundancy intentionally to improve read performance. It's not laziness — it's a deliberate trade-off.

Golden rule: Normalize by default. Denormalize only when you measure a performance problem that normalization causes.

Pattern 1: Cache Aggregates

-- Instead of counting posts on every page load:
-- SELECT COUNT(*) FROM posts WHERE user_id = 42;
 
-- Cache the count in the users table:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    post_count INT NOT NULL DEFAULT 0  -- Denormalized
);
 
-- Keep it in sync with a trigger:
CREATE OR REPLACE FUNCTION update_user_post_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET post_count = post_count + 1
        WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET post_count = post_count - 1
        WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_post_count
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();

Pattern 2: Snapshot Historical Data

-- Order items store product info at purchase time
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(id),
    product_id INT NOT NULL REFERENCES products(id),
    product_name VARCHAR(200) NOT NULL,   -- Snapshot
    unit_price DECIMAL(10, 2) NOT NULL,   -- Snapshot
    quantity INT NOT NULL
);
-- Product name and price may change later,
-- but the order should reflect what was actually purchased

Pattern 3: Materialized Views

-- Pre-compute expensive reporting queries
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(total) AS revenue,
    COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at);
 
-- Refresh periodically
REFRESH MATERIALIZED VIEW monthly_revenue;

When to Denormalize — Decision Table

ScenarioDenormalize?Why
Frequently changing source data❌ NoHard to keep copies in sync
Write-heavy tables❌ NoUpdate overhead is too high
Small tables (< 10k rows)❌ NoJoins are already fast
Financial/audit data❌ NoNeed single source of truth
Read-heavy dashboards✅ YesTolerate slight staleness
E-commerce order history✅ YesNeed point-in-time snapshots
User profile display (post count, follower count)✅ YesAvoids COUNT(*) on every page load

Naming Conventions

Consistent naming is one of the highest-leverage decisions you can make. Pick a convention and enforce it everywhere.

Tables — Plural, snake_case

-- ✅ RECOMMENDED
users
blog_posts
order_items
product_categories
 
-- ❌ AVOID
User          -- Singular, PascalCase
blogPost      -- camelCase
tbl_users     -- Hungarian notation prefix

Plural reads naturally: SELECT * FROM users means "select from the collection of users."

Columns — snake_case, Descriptive

-- ✅ GOOD
user_id
created_at
first_name
is_active
 
-- ❌ BAD
userId        -- camelCase
fName         -- Abbreviation
active        -- Ambiguous (boolean? string? status?)

Foreign Keys — {referenced_table_singular}_id

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id),       -- references users
    category_id INT REFERENCES categories(id)         -- references categories
);

Boolean Columns — Prefix with is_ or has_

is_active BOOLEAN NOT NULL DEFAULT TRUE
is_verified BOOLEAN NOT NULL DEFAULT FALSE
has_paid BOOLEAN NOT NULL DEFAULT FALSE

Timestamps — Suffix with _at

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
deleted_at TIMESTAMPTZ          -- Soft delete
published_at TIMESTAMPTZ        -- When post was published

Convention Summary

ElementConventionExample
Table namesPlural, snake_caseusers, order_items
Column namessnake_casefirst_name, created_at
Primary keysidid BIGSERIAL PRIMARY KEY
Foreign keys{table_singular}_iduser_id, category_id
Booleansis_ or has_ prefixis_active, has_paid
Timestamps_at suffixcreated_at, deleted_at
Indexesidx_{table}_{columns}idx_posts_user_id
Constraintschk_{table}_{rule}chk_orders_status

Soft Deletes vs Hard Deletes

Hard Delete — Permanent Removal

DELETE FROM users WHERE id = 1;
-- Gone forever. No undo.

Use when: GDPR "right to be forgotten," temporary data (sessions, caches), or when no audit trail is needed.

Soft Delete — Mark as Deleted

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    deleted_at TIMESTAMPTZ  -- NULL = active, timestamp = soft-deleted
);
 
-- "Delete" a user
UPDATE users SET deleted_at = NOW() WHERE id = 1;
 
-- Query active users only
SELECT * FROM users WHERE deleted_at IS NULL;
 
-- Restore a user
UPDATE users SET deleted_at = NULL WHERE id = 1;

Use when: Audit trails, undo/restore functionality, compliance (financial records), or retention analytics.

Soft Delete with Metadata

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    deleted_at TIMESTAMPTZ,
    deleted_by INT REFERENCES users(id)
);

Important: If you use soft deletes, remember to add WHERE deleted_at IS NULL to every query. Missing this filter is a common source of bugs. Consider creating a view:

CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

Audit Columns

Add tracking columns to every table that stores business data:

CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    content TEXT NOT NULL,
 
    -- Audit columns
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by INT NOT NULL REFERENCES users(id),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_by INT REFERENCES users(id)
);

Auto-update updated_at with a trigger:

CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_posts_updated_at
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION set_updated_at();

This trigger is reusable — apply it to any table with an updated_at column.

Which Audit Columns Are Essential?

ColumnEssential?When to Include
created_at✅ AlwaysEvery table — you always want to know when records were created
updated_at✅ AlwaysEvery mutable table — track last modification
created_byRecommendedMulti-user systems — who created this record?
updated_byOptionalSensitive data — who made the last change?
deleted_atOptionalSoft delete tables only
deleted_byOptionalWhen you need accountability for deletions

UUIDs vs Auto-Increment IDs

Auto-Increment (SERIAL / BIGSERIAL)

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);
-- IDs: 1, 2, 3, 4, ...

Pros: Small (8 bytes), fast sequential inserts, human-readable, excellent index performance.

Cons: Predictable (enumeration attacks on public APIs), sequential gaps reveal deletion patterns, hard to merge databases (ID collisions).

UUIDs

CREATE EXTENSION IF NOT EXISTS "pgcrypto";
 
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) NOT NULL
);
-- IDs: a3bb189e-8bf9-4c25-b5d3-... (128-bit random)

Pros: Globally unique, unpredictable, can be generated client-side (offline-first apps), safe to merge databases.

Cons: Large (16 bytes), random inserts cause B-tree page splits, harder to debug, larger indexes.

Decision Table

Use CaseAuto-IncrementUUID
Internal tables (never exposed in API)
Public-facing API resource IDs
Distributed systems / microservices
High-write tables (millions of inserts/day)
Offline-first mobile apps
Simple admin panels

Use BIGSERIAL internally for fast joins and UUID publicly for API responses:

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,                                      -- Internal
    public_id UUID NOT NULL UNIQUE DEFAULT gen_random_uuid(),       -- Exposed in API
    email VARCHAR(255) NOT NULL UNIQUE
);
 
-- API returns public_id: "a3bb189e-..."
-- Database joins use id: 42 (fast integer comparison)

JSON/JSONB Columns

PostgreSQL's JSONB type is powerful but often misused. Use it for genuinely semi-structured data, not as a way to avoid schema design.

When to Use JSONB

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    -- Variable attributes differ by product type
    attributes JSONB DEFAULT '{}'::JSONB
);
 
-- Laptop: {"ram_gb": 16, "storage_gb": 512, "cpu": "Intel i7"}
-- Shirt:  {"size": "L", "color": "blue", "material": "cotton"}
-- Query JSONB fields
SELECT name, attributes->>'color' AS color
FROM products
WHERE attributes->>'color' = 'blue';
 
-- Index JSONB for performance
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

Good use cases: Variable product attributes, user preferences/settings, storing external API responses, feature flags.

When NOT to Use JSONB

-- ❌ BAD: Core business data in JSONB
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    data JSONB  -- {"customer_id": 1, "total": 99.99, "items": [...]}
);
-- Problems: No constraints, no foreign keys, no type safety,
-- can't join efficiently, no referential integrity
-- ✅ GOOD: Proper relational schema
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(id),
    total DECIMAL(10, 2) NOT NULL CHECK (total >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Data TypeRelational ColumnsJSONB
Frequently queried/filtered
Has foreign key relationships
Fixed, known schema
Variable/dynamic schema
Rarely queried metadata

Schema Evolution and Backward Compatibility

Schemas change over time. Plan for it from the start.

Safe Migrations

-- ✅ SAFE: Add nullable column (no data rewrite)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
 
-- ✅ SAFE: Add column with default (PostgreSQL 11+ — no rewrite for static defaults)
ALTER TABLE users ADD COLUMN is_verified BOOLEAN NOT NULL DEFAULT FALSE;
 
-- ❌ DANGEROUS: Add NOT NULL without default (fails on existing rows)
ALTER TABLE users ADD COLUMN required_field VARCHAR(100) NOT NULL;
-- ERROR: column "required_field" of relation "users" contains null values

Multi-Step Migration Pattern

When you need a NOT NULL column on an existing table:

-- Step 1: Add as nullable
ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
 
-- Step 2: Backfill existing rows
UPDATE users SET email_verified = FALSE WHERE email_verified IS NULL;
 
-- Step 3: Set NOT NULL
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
 
-- Step 4: Add default for future inserts
ALTER TABLE users ALTER COLUMN email_verified SET DEFAULT FALSE;

Renaming Columns Safely

Never rename in one step in production. Use a multi-step approach:

-- Step 1: Add new column
ALTER TABLE users ADD COLUMN username VARCHAR(50);
 
-- Step 2: Copy data
UPDATE users SET username = user_name;
 
-- Step 3: Deploy application that reads from BOTH columns
-- SELECT COALESCE(username, user_name) AS username FROM users;
 
-- Step 4: Stop writing to old column (new deploy)
 
-- Step 5: Drop old column (after verifying everything works)
ALTER TABLE users DROP COLUMN user_name;

We'll cover migrations in depth in a future post on Database Migrations in Production.


Case Study 1: Blog Platform

Let's apply everything we've learned to design a blog platform schema.

Requirements:

  • Users register and write posts
  • Posts belong to one category (hierarchical categories)
  • Posts can have multiple tags
  • Users can comment on posts (with nested replies)
  • Posts can be drafted or published
  • SEO metadata (slugs, meta descriptions)
  • Full audit trail
-- =============================================
-- Blog Platform Schema
-- =============================================
 
-- Users
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    username VARCHAR(50) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    is_verified BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Categories (hierarchical via parent_id)
CREATE TABLE categories (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    slug VARCHAR(100) NOT NULL UNIQUE,
    description TEXT,
    parent_id BIGINT REFERENCES categories(id) ON DELETE SET NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Tags
CREATE TABLE tags (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE,
    slug VARCHAR(50) NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Posts
CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    category_id BIGINT REFERENCES categories(id) ON DELETE SET NULL,
    title VARCHAR(200) NOT NULL,
    slug VARCHAR(200) NOT NULL UNIQUE,
    excerpt TEXT,
    content TEXT NOT NULL,
    cover_image_url VARCHAR(500),
 
    -- SEO
    meta_description VARCHAR(160),
 
    -- Status
    status VARCHAR(20) NOT NULL DEFAULT 'draft'
        CHECK (status IN ('draft', 'published', 'archived')),
    published_at TIMESTAMPTZ,
 
    -- Denormalized counters
    view_count INT NOT NULL DEFAULT 0,
    comment_count INT NOT NULL DEFAULT 0,
 
    -- Audit
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
 
    -- Business rule: published posts must have a published_at date
    CHECK (status != 'published' OR published_at IS NOT NULL)
);
 
-- Post ↔ Tags (many-to-many)
CREATE TABLE post_tags (
    post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    tag_id BIGINT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);
 
-- Comments (nested via parent_id)
CREATE TABLE comments (
    id BIGSERIAL PRIMARY KEY,
    post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    user_id BIGINT NOT NULL REFERENCES users(id),
    parent_id BIGINT REFERENCES comments(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    is_approved BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Indexes
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_category_id ON posts(category_id);
CREATE INDEX idx_posts_status ON posts(status);
CREATE INDEX idx_posts_published_at ON posts(published_at DESC);
CREATE INDEX idx_posts_slug ON posts(slug);
CREATE INDEX idx_comments_post_id ON comments(post_id);
CREATE INDEX idx_comments_parent_id ON comments(parent_id);

Design Decisions

  • Hierarchical categories via parent_id self-reference — supports "Programming > Python > FastAPI"
  • Denormalized counters (view_count, comment_count) — avoids expensive COUNT queries on every page view
  • CHECK constraint on status — published posts must have published_at
  • Slugs are UNIQUE — enables SEO-friendly URLs (/blog/my-first-post)
  • Comments nest via parent_id — supports threaded discussions

Example Queries

-- Published posts with author, category, and tags
SELECT
    p.title, p.slug, p.excerpt, p.published_at,
    u.username AS author,
    c.name AS category,
    ARRAY_AGG(DISTINCT t.name) AS tags
FROM posts p
JOIN users u ON u.id = p.user_id
LEFT JOIN categories c ON c.id = p.category_id
LEFT JOIN post_tags pt ON pt.post_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
WHERE p.status = 'published'
GROUP BY p.id, u.username, c.name
ORDER BY p.published_at DESC;
-- Nested comments using recursive CTE
WITH RECURSIVE comment_tree AS (
    SELECT id, user_id, parent_id, content, created_at, 0 AS depth
    FROM comments
    WHERE post_id = 1 AND parent_id IS NULL
 
    UNION ALL
 
    SELECT c.id, c.user_id, c.parent_id, c.content, c.created_at, ct.depth + 1
    FROM comments c
    JOIN comment_tree ct ON c.parent_id = ct.id
)
SELECT * FROM comment_tree ORDER BY depth, created_at;

Case Study 2: SaaS Multi-Tenant

Now let's design a project management SaaS where multiple organizations share the same database.

Requirements:

  • Multiple organizations (tenants) with subscription plans
  • Users can belong to multiple organizations with different roles
  • Projects within organizations
  • Tasks within projects with assignees and priorities
  • Tenant isolation — users must only see their organization's data
-- =============================================
-- SaaS Multi-Tenant Schema
-- =============================================
 
-- Organizations (tenants)
CREATE TABLE organizations (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    slug VARCHAR(200) NOT NULL UNIQUE,
    plan VARCHAR(20) NOT NULL DEFAULT 'free'
        CHECK (plan IN ('free', 'pro', 'enterprise')),
    max_users INT NOT NULL DEFAULT 5,
    max_projects INT NOT NULL DEFAULT 3,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Users (global — can belong to multiple orgs)
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Organization memberships (many-to-many with roles)
CREATE TABLE organization_members (
    organization_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL DEFAULT 'member'
        CHECK (role IN ('owner', 'admin', 'member', 'viewer')),
    joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (organization_id, user_id)
);
 
-- Projects (belong to one organization)
CREATE TABLE projects (
    id BIGSERIAL PRIMARY KEY,
    organization_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    name VARCHAR(200) NOT NULL,
    description TEXT,
    status VARCHAR(20) NOT NULL DEFAULT 'active'
        CHECK (status IN ('active', 'archived', 'completed')),
    created_by BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (organization_id, name)  -- Same name allowed across different orgs
);
 
-- Tasks (belong to a project)
CREATE TABLE tasks (
    id BIGSERIAL PRIMARY KEY,
    project_id BIGINT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    organization_id BIGINT NOT NULL REFERENCES organizations(id),
    title VARCHAR(200) NOT NULL,
    description TEXT,
    status VARCHAR(20) NOT NULL DEFAULT 'todo'
        CHECK (status IN ('todo', 'in_progress', 'review', 'done', 'cancelled')),
    priority VARCHAR(20) NOT NULL DEFAULT 'medium'
        CHECK (priority IN ('low', 'medium', 'high', 'urgent')),
    assigned_to BIGINT REFERENCES users(id),
    due_date DATE,
    created_by BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
 
-- Indexes for multi-tenant queries
CREATE INDEX idx_org_members_user ON organization_members(user_id);
CREATE INDEX idx_projects_org ON projects(organization_id);
CREATE INDEX idx_tasks_project ON tasks(project_id);
CREATE INDEX idx_tasks_org ON tasks(organization_id);
CREATE INDEX idx_tasks_assigned ON tasks(assigned_to);
CREATE INDEX idx_tasks_status ON tasks(status);

Row-Level Security (Tenant Isolation)

PostgreSQL can enforce tenant isolation at the database level, so application bugs can't leak data across organizations:

-- Enable RLS on tasks table
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
 
-- Users can only see tasks in their organizations
CREATE POLICY tasks_tenant_isolation ON tasks
    USING (
        organization_id IN (
            SELECT organization_id
            FROM organization_members
            WHERE user_id = current_setting('app.current_user_id')::BIGINT
        )
    );
 
-- In your application, set the user context before queries:
-- SET app.current_user_id = '42';
-- Now SELECT * FROM tasks only returns tasks from user 42's organizations

Design Decisions

  1. organization_id denormalized in tasks — Tasks could derive their org from project.organization_id, but denormalizing enables efficient RLS filtering without a join. This is intentional.

  2. Composite UNIQUE on projectsUNIQUE (organization_id, name) allows two orgs to have a project named "Website Redesign."

  3. Users are global — A user can belong to multiple organizations. The organization_members junction table tracks role per org.

  4. Row-Level Security — PostgreSQL enforces isolation at the database level. Even if your application has a bug, cross-tenant data leaks are prevented.

Example Queries

-- All projects a user can access (across all their orgs)
SELECT p.*, o.name AS org_name
FROM projects p
JOIN organizations o ON o.id = p.organization_id
JOIN organization_members om ON om.organization_id = p.organization_id
WHERE om.user_id = 42;
 
-- Tasks assigned to a user across all orgs
SELECT t.*, p.name AS project_name
FROM tasks t
JOIN projects p ON p.id = t.project_id
JOIN organization_members om ON om.organization_id = t.organization_id
WHERE om.user_id = 42 AND t.assigned_to = 42
ORDER BY t.due_date NULLS LAST;
 
-- Check if user is admin in a specific org
SELECT EXISTS (
    SELECT 1 FROM organization_members
    WHERE user_id = 42
      AND organization_id = 5
      AND role IN ('owner', 'admin')
) AS is_admin;

Summary and Key Takeaways

Start with requirements — Extract entities (nouns) and relationships (verbs) from business rules before writing SQL
Normalize to 3NF by default — Each fact stored once, no partial or transitive dependencies
Denormalize strategically — Cache aggregates, snapshot historical data, use materialized views for reporting
Use consistent naming — Plural tables, snake_case columns, {table}_id for foreign keys, is_ for booleans
Add audit columns everywherecreated_at and updated_at on every table; deleted_at for soft deletes
Choose IDs wisely — BIGSERIAL for internal PKs, UUID for public-facing API identifiers
JSONB for variable data only — Product attributes, user preferences; never for core business data with relationships
Plan for evolution — Add nullable columns first, backfill, then add constraints; rename columns in multiple steps
Multi-tenant needs isolation — Denormalize organization_id for RLS, use PostgreSQL Row-Level Security
Index every foreign key — Foreign keys without indexes cause slow joins and slow cascading deletes

Common Anti-Patterns

Anti-PatternProblemFix
Comma-separated IDsViolates 1NF, can't joinJunction table
God table (100+ columns)Slow queries, hard to maintainSplit into related tables
No constraintsSilent data corruptionAdd FK, CHECK, NOT NULL
JSONB for everythingNo type safety, no joinsProper relational columns
No audit columnsCan't debug production issuesAdd created_at, updated_at
No indexes on foreign keysSlow joins and cascadesCREATE INDEX on every FK

What's Next?

Continue building your database skills with these related posts:

📚 Relational Database Fundamentals — Review the relational model, ACID properties, data types, and constraints
📚 SQL Indexing Deep Dive (coming soon) — B-tree indexes, composite indexes, covering indexes, EXPLAIN ANALYZE, and when indexes hurt
📚 SQL Query Optimization Guide (coming soon) — N+1 problems, JOIN strategies, pagination, window functions, and profiling tools


Conclusion

Schema design is both art and science. The technical rules — normalization, constraints, data types — are the science. Knowing when to break those rules — denormalization, JSONB columns, composite keys — is the art.

Start normalized. Denormalize only when you measure a real performance problem. Use consistent naming conventions from day one — renaming tables in production is painful. Always add audit columns — you'll thank yourself when debugging issues at 2 AM.

The two case studies show patterns you'll encounter repeatedly. The blog platform handles hierarchical data, many-to-many relationships, and nested comments. The SaaS schema demonstrates tenant isolation, role-based access, and Row-Level Security.

Your schema is the foundation of your application. Get it right, and everything else becomes easier.

📬 Subscribe to Newsletter

Get the latest blog posts delivered to your inbox every week. No spam, unsubscribe anytime.

We respect your privacy. Unsubscribe at any time.

💬 Comments

Sign in to leave a comment

We'll never post without your permission.