Advanced SQL: Window Functions, CTEs & Recursive Queries

Introduction

In DB-2: SQL Fundamentals, you learned SELECT, JOINs, GROUP BY, and subqueries. Those tools handle most day-to-day queries. But the moment someone asks "rank customers by spend within each region" or "show a running total of revenue over time" or "traverse a category tree" — basic SQL hits a wall.

That's where advanced SQL comes in. Window functions let you compute rankings, running totals, and moving averages without collapsing rows into groups. CTEs make complex queries readable and composable. Recursive CTEs traverse hierarchical data — org charts, category trees, comment threads — entirely in SQL.

These aren't niche features. They're what separates a developer who can write SQL from one who can think in SQL. And they show up constantly in interviews, analytics dashboards, and production reporting.

Prerequisites: This post is part of the SQL & NoSQL Database Roadmap. Complete DB-2: SQL Fundamentals first.

What You'll Learn

✅ Write CTEs to break complex queries into readable steps
✅ Use window functions for rankings, running totals, and moving averages
✅ Understand PARTITION BY, ORDER BY, and frame specifications
✅ Traverse hierarchical data with recursive CTEs
✅ Apply CASE, COALESCE, and conditional logic in queries
✅ Use string, date, and math functions for real-world transformations
✅ Build analytics dashboard queries from scratch

The Sample Schema (Extended)

We'll reuse the e-commerce schema from DB-2 and add a few tables for analytics and hierarchy examples:

-- Reuse from DB-2: customers, products, orders, order_items
 
-- Add an employees table for hierarchy examples
CREATE TABLE employees (
    id         SERIAL PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    title      VARCHAR(100) NOT NULL,
    department VARCHAR(50)  NOT NULL,
    salary     NUMERIC(10, 2) NOT NULL,
    manager_id INT REFERENCES employees(id),
    hired_at   DATE NOT NULL
);
 
INSERT INTO employees (name, title, department, salary, manager_id, hired_at) VALUES
    ('Sarah Chen',    'CEO',              'Executive',   250000, NULL, '2018-01-15'),
    ('Mike Ross',     'VP Engineering',   'Engineering', 200000, 1,    '2018-06-01'),
    ('Lisa Park',     'VP Sales',         'Sales',       190000, 1,    '2019-03-10'),
    ('Tom Wilson',    'Senior Engineer',  'Engineering', 150000, 2,    '2020-01-20'),
    ('Anna Kim',      'Senior Engineer',  'Engineering', 145000, 2,    '2020-04-15'),
    ('James Liu',     'Engineer',         'Engineering', 120000, 4,    '2021-07-01'),
    ('Emma Davis',    'Engineer',         'Engineering', 115000, 4,    '2021-09-15'),
    ('Chris Lee',     'Sales Manager',    'Sales',       130000, 3,    '2020-02-01'),
    ('Nina Patel',    'Sales Rep',        'Sales',       85000,  8,    '2022-01-10'),
    ('Oscar Brown',   'Sales Rep',        'Sales',       82000,  8,    '2022-06-15');
 
-- Add a categories table for recursive hierarchy
CREATE TABLE categories (
    id        SERIAL PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    parent_id INT REFERENCES categories(id)
);
 
INSERT INTO categories (name, parent_id) VALUES
    ('All Products',  NULL),
    ('Electronics',   1),
    ('Computers',     2),
    ('Laptops',       3),
    ('Desktops',      3),
    ('Phones',        2),
    ('Furniture',     1),
    ('Desks',         7),
    ('Chairs',        7);

We'll also add more orders with dates spread across months for time-series analytics:

-- Additional orders for time-series analysis (add to existing orders)
INSERT INTO orders (customer_id, total, status, created_at) VALUES
    (1, 299.00,  'completed', '2025-01-15'),
    (2, 549.00,  'completed', '2025-01-22'),
    (1, 1299.00, 'completed', '2025-02-03'),
    (3, 199.00,  'completed', '2025-02-18'),
    (2, 899.00,  'completed', '2025-03-05'),
    (4, 459.00,  'pending',   '2025-03-12'),
    (1, 749.00,  'completed', '2025-03-20'),
    (3, 1599.00, 'completed', '2025-04-01'),
    (2, 329.00,  'completed', '2025-04-15'),
    (4, 899.00,  'completed', '2025-04-22'),
    (1, 199.00,  'completed', '2025-05-10'),
    (3, 649.00,  'completed', '2025-05-18');

Common Table Expressions (CTEs)

A CTE is a named temporary result set that exists only for the duration of a single query. Think of it as a disposable view — you define it with WITH, then reference it by name.

Basic CTE Syntax

WITH cte_name AS (
    SELECT ...
    FROM ...
)
SELECT *
FROM cte_name;

Here's a real example. Without a CTE, finding customers who spend more than average requires a subquery:

-- Subquery version (hard to read as complexity grows)
SELECT c.name, customer_totals.total_spent
FROM customers c
JOIN (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
) AS customer_totals ON customer_totals.customer_id = c.id
WHERE customer_totals.total_spent > (
    SELECT AVG(total_spent)
    FROM (
        SELECT SUM(total) AS total_spent
        FROM orders
        WHERE status = 'completed'
        GROUP BY customer_id
    ) AS avg_calc
);

With a CTE, the same logic is clearer:

WITH customer_spending AS (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
)
SELECT c.name, cs.total_spent
FROM customers c
JOIN customer_spending cs ON cs.customer_id = c.id
WHERE cs.total_spent > (SELECT AVG(total_spent) FROM customer_spending);

The CTE is defined once and referenced twice — no repeated logic, no nested subqueries.

Multiple CTEs

Chain multiple CTEs separated by commas. Each can reference the ones defined before it:

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', created_at)
),
revenue_with_growth AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
        revenue - LAG(revenue) OVER (ORDER BY month) AS growth
    FROM monthly_revenue
)
SELECT
    month,
    revenue,
    prev_month_revenue,
    growth,
    CASE
        WHEN prev_month_revenue > 0
        THEN ROUND(growth / prev_month_revenue * 100, 1)
        ELSE NULL
    END AS growth_pct
FROM revenue_with_growth
ORDER BY month;

This pipeline: raw orders → monthly totals → month-over-month growth. Each step is a named, readable unit.

CTEs vs Subqueries vs Temp Tables

Feature	CTE	Subquery	Temp Table
Scope	Single query	Single query	Session-wide
Reusability	Multiple references in same query	Must repeat	Multiple queries
Readability	High (named, top-down)	Low (nested)	Medium
Performance	Usually same as subquery	Same as CTE	Materialized (stored)
Recursion	Yes	No	No
Indexable	No	No	Yes

Rule of thumb: Use CTEs for readability and when you reference the same logic multiple times. Use temp tables when you need indexes on intermediate results or need the data across multiple queries.

Window Functions

Window functions perform calculations across a set of rows that are related to the current row — without collapsing them into a single output row like GROUP BY does. That's the key difference.

GROUP BY vs Window Functions

-- GROUP BY: collapses rows — one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

department	avg_salary
Executive	250000
Engineering	146000
Sales	121750

-- Window function: keeps all rows, adds avg alongside each
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;

name	department	salary	dept_avg_salary
Sarah Chen	Executive	250000	250000
Mike Ross	Engineering	200000	146000
Tom Wilson	Engineering	150000	146000
Anna Kim	Engineering	145000	146000
James Liu	Engineering	120000	146000
Emma Davis	Engineering	115000	146000
...	...	...	...

Every row is preserved. Each row sees the average of its own department. This is impossible with GROUP BY alone.

Window Function Syntax

function_name(...) OVER (
    [PARTITION BY column1, column2, ...]
    [ORDER BY column3, column4, ...]
    [frame_clause]
)

PARTITION BY: Divides rows into groups (like GROUP BY, but without collapsing)
ORDER BY: Defines the order within each partition
Frame clause: Defines which rows relative to the current row are included in the calculation

Ranking Functions

ROW_NUMBER()

Assigns a unique sequential number to each row within a partition. No ties — if two values are equal, one gets a higher number (arbitrary order unless you specify it).

SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees;

name	department	salary	rank_in_dept
Sarah Chen	Executive	250000	1
Mike Ross	Engineering	200000	1
Tom Wilson	Engineering	150000	2
Anna Kim	Engineering	145000	3
James Liu	Engineering	120000	4
Emma Davis	Engineering	115000	5
Lisa Park	Sales	190000	1
Chris Lee	Sales	130000	2
Nina Patel	Sales	85000	3
Oscar Brown	Sales	82000	4

Classic pattern — Top N per group:

-- Top 2 highest-paid employees per department
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2;

This is one of the most common SQL interview questions and one of the most common production patterns.

RANK() and DENSE_RANK()

RANK() gives the same number to ties but leaves gaps. DENSE_RANK() gives the same number to ties without gaps:

-- Imagine two employees with the same salary
SELECT
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,    -- 1, 2, 3, 4
    RANK()       OVER (ORDER BY salary DESC) AS rank,       -- 1, 2, 2, 4 (gap)
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank  -- 1, 2, 2, 3 (no gap)
FROM employees;

When to use which:

ROW_NUMBER(): When you need exactly N rows (pagination, top-N, deduplication)
RANK(): When ties matter and you need standard competition ranking (1st, 2nd, 2nd, 4th)
DENSE_RANK(): When ties matter and you want consecutive ranks (1st, 2nd, 2nd, 3rd)

LAG() and LEAD() — Accessing Adjacent Rows

LAG() looks at the previous row. LEAD() looks at the next row. Both require ORDER BY.

SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(total) AS revenue,
    LAG(SUM(total))  OVER (ORDER BY DATE_TRUNC('month', created_at)) AS prev_month,
    LEAD(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS next_month
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

month	revenue	prev_month	next_month
2025-01-01	848.00	NULL	1498.00
2025-02-01	1498.00	848.00	1648.00
2025-03-01	1648.00	1498.00	2827.00
2025-04-01	2827.00	1648.00	848.00
2025-05-01	848.00	2827.00	NULL

LAG and LEAD accept an optional offset (default 1) and default value:

-- Look back 2 rows, default to 0 if no row exists
LAG(revenue, 2, 0) OVER (ORDER BY month)

Running Totals and Moving Averages

Use aggregate functions with OVER to compute cumulative calculations:

-- Running total of revenue
SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(total) AS monthly_revenue,
    SUM(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS running_total
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

month	monthly_revenue	running_total
2025-01-01	848.00	848.00
2025-02-01	1498.00	2346.00
2025-03-01	1648.00	3994.00
2025-04-01	2827.00	6821.00
2025-05-01	848.00	7669.00

Note the SUM(SUM(total)) — the inner SUM is the GROUP BY aggregate, the outer SUM ... OVER is the window function.

3-month moving average:

SELECT
    month,
    revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', created_at)
) AS monthly;

NTILE() — Bucketing Rows

NTILE(n) divides rows into n roughly equal groups:

-- Divide employees into 4 salary quartiles
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;

Useful for percentile analysis, A/B test group assignment, or splitting work into batches.

FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()

Access specific rows within a window:

-- Each employee compared to the highest-paid in their department
SELECT
    name,
    department,
    salary,
    FIRST_VALUE(name) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS highest_paid_name,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS highest_salary,
    salary - FIRST_VALUE(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS diff_from_top
FROM employees;

Warning about LAST_VALUE: By default, the window frame ends at the current row, not the last row of the partition. To get the actual last value, you need an explicit frame:

LAST_VALUE(salary) OVER (
    PARTITION BY department ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_salary

Frame Specifications

The frame clause controls which rows the window function "sees" relative to the current row.

ROWS BETWEEN <start> AND <end>

Where <start> and <end> can be:

UNBOUNDED PRECEDING — from the first row of the partition
N PRECEDING — N rows before current
CURRENT ROW — the current row
N FOLLOWING — N rows after current
UNBOUNDED FOLLOWING — to the last row of the partition

Default frames:

With ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (running total)
Without ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire partition)

ROWS vs RANGE vs GROUPS:

ROWS: Physical rows. Precise count.
RANGE: Logical range based on ORDER BY value. Rows with the same value are treated as peers.
GROUPS: Groups of peer rows (PostgreSQL 11+). 1 PRECEDING means the previous group of tied values.

-- Running total (default frame with ORDER BY)
SUM(revenue) OVER (ORDER BY month)
 
-- 3-row moving average
AVG(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
 
-- Full partition (explicit — useful for ratios)
SUM(salary) OVER (PARTITION BY department) AS dept_total
 
-- Centered 5-row window
AVG(value) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)

Percentage of Total — A Classic Pattern

-- Each employee's salary as a percentage of their department's total
SELECT
    name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    ROUND(salary * 100.0 / SUM(salary) OVER (PARTITION BY department), 1) AS pct_of_dept
FROM employees
ORDER BY department, salary DESC;

Recursive CTEs

Recursive CTEs let SQL traverse hierarchical and graph-structured data. They're composed of two parts joined by UNION ALL:

Anchor member: The starting point (base case)
Recursive member: References the CTE itself (recursive step)

Syntax

WITH RECURSIVE cte_name AS (
    -- Anchor: base case
    SELECT ...
    WHERE <starting condition>
 
    UNION ALL
 
    -- Recursive: references cte_name
    SELECT ...
    FROM cte_name
    JOIN other_table ON ...
)
SELECT * FROM cte_name;

Example: Org Chart Traversal

Find the complete reporting chain for every employee:

WITH RECURSIVE org_chart AS (
    -- Anchor: start with the CEO (no manager)
    SELECT
        id,
        name,
        title,
        manager_id,
        0 AS depth,
        name AS path
    FROM employees
    WHERE manager_id IS NULL
 
    UNION ALL
 
    -- Recursive: find direct reports of current level
    SELECT
        e.id,
        e.name,
        e.title,
        e.manager_id,
        oc.depth + 1,
        oc.path || ' → ' || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT depth, name, title, path
FROM org_chart
ORDER BY path;

depth	name	title	path
0	Sarah Chen	CEO	Sarah Chen
1	Lisa Park	VP Sales	Sarah Chen → Lisa Park
2	Chris Lee	Sales Manager	Sarah Chen → Lisa Park → Chris Lee
3	Nina Patel	Sales Rep	Sarah Chen → Lisa Park → Chris Lee → Nina Patel
3	Oscar Brown	Sales Rep	Sarah Chen → Lisa Park → Chris Lee → Oscar Brown
1	Mike Ross	VP Engineering	Sarah Chen → Mike Ross
2	Tom Wilson	Senior Engineer	Sarah Chen → Mike Ross → Tom Wilson
3	James Liu	Engineer	Sarah Chen → Mike Ross → Tom Wilson → James Liu
3	Emma Davis	Engineer	Sarah Chen → Mike Ross → Tom Wilson → Emma Davis
2	Anna Kim	Senior Engineer	Sarah Chen → Mike Ross → Anna Kim

Example: Category Tree

List all categories with their full path:

WITH RECURSIVE category_tree AS (
    -- Anchor: root categories (no parent)
    SELECT
        id,
        name,
        parent_id,
        0 AS depth,
        name AS full_path
    FROM categories
    WHERE parent_id IS NULL
 
    UNION ALL
 
    SELECT
        c.id,
        c.name,
        c.parent_id,
        ct.depth + 1,
        ct.full_path || ' > ' || c.name
    FROM categories c
    INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
    REPEAT('  ', depth) || name AS indented_name,
    full_path
FROM category_tree
ORDER BY full_path;

indented_name	full_path
All Products	All Products
Electronics	All Products > Electronics
Computers	All Products > Electronics > Computers
Desktops	All Products > Electronics > Computers > Desktops
Laptops	All Products > Electronics > Computers > Laptops
Phones	All Products > Electronics > Phones
Furniture	All Products > Furniture
Chairs	All Products > Furniture > Chairs
Desks	All Products > Furniture > Desks

Finding All Descendants

"Give me all sub-categories under Electronics":

WITH RECURSIVE descendants AS (
    SELECT id, name, parent_id
    FROM categories
    WHERE name = 'Electronics'  -- starting point
 
    UNION ALL
 
    SELECT c.id, c.name, c.parent_id
    FROM categories c
    INNER JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants;

Cycle Detection

If your data might contain cycles (A → B → C → A), the recursive CTE will run forever. PostgreSQL 14+ has built-in cycle detection:

WITH RECURSIVE traversal AS (
    SELECT id, name, manager_id, ARRAY[id] AS visited
    FROM employees
    WHERE manager_id IS NULL
 
    UNION ALL
 
    SELECT e.id, e.name, e.manager_id, t.visited || e.id
    FROM employees e
    JOIN traversal t ON e.manager_id = t.id
    WHERE e.id != ALL(t.visited)  -- prevent cycles
)
SELECT * FROM traversal;

PostgreSQL 14+ also supports the CYCLE clause:

WITH RECURSIVE traversal AS (
    ...
) CYCLE id SET is_cycle USING path

CASE Expressions and Conditional Logic

CASE adds if/else logic directly in SQL:

-- Categorize orders by size
SELECT
    id,
    total,
    CASE
        WHEN total >= 1000 THEN 'Large'
        WHEN total >= 500  THEN 'Medium'
        WHEN total >= 100  THEN 'Small'
        ELSE 'Micro'
    END AS order_size
FROM orders;

CASE in Aggregation (Pivot Pattern)

-- Order count by status — pivoted into columns
SELECT
    DATE_TRUNC('month', created_at) AS month,
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE status = 'completed') AS completed,
    COUNT(*) FILTER (WHERE status = 'pending')   AS pending,
    COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

If your database doesn't support FILTER, use CASE:

SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed

COALESCE, NULLIF, GREATEST, LEAST

-- COALESCE: first non-NULL value
SELECT COALESCE(nickname, name, 'Anonymous') AS display_name FROM users;
 
-- NULLIF: returns NULL if two values are equal (prevents division by zero)
SELECT revenue / NULLIF(order_count, 0) AS avg_order_value FROM metrics;
 
-- GREATEST / LEAST: max/min across columns (not rows)
SELECT GREATEST(score_a, score_b, score_c) AS best_score FROM exams;
SELECT LEAST(price, sale_price, clearance_price) AS lowest_price FROM products;

Essential Functions

String Functions

-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;
SELECT first_name || ' ' || last_name AS full_name FROM users;  -- PostgreSQL
 
-- Case conversion
SELECT UPPER('hello'), LOWER('HELLO'), INITCAP('hello world');
-- HELLO, hello, Hello World
 
-- Substring and position
SELECT SUBSTRING('PostgreSQL' FROM 1 FOR 4);  -- Post
SELECT POSITION('SQL' IN 'PostgreSQL');        -- 8
SELECT LENGTH('Hello');                        -- 5
 
-- Trim
SELECT TRIM('  hello  ');              -- 'hello'
SELECT LTRIM('  hello');               -- 'hello'
SELECT RTRIM('hello  ');               -- 'hello'
SELECT TRIM(LEADING '0' FROM '00042'); -- '42'
 
-- Replace and split
SELECT REPLACE('2025-01-15', '-', '/');        -- 2025/01/15
SELECT SPLIT_PART('john@example.com', '@', 2); -- example.com
 
-- Pattern matching with REGEXP
SELECT 'abc123' ~ '^[a-z]+[0-9]+$';  -- true (PostgreSQL regex)
SELECT REGEXP_REPLACE('Phone: 123-456-7890', '[^0-9]', '', 'g');  -- 1234567890

Date/Time Functions

-- Current timestamp
SELECT NOW(), CURRENT_DATE, CURRENT_TIME;
 
-- Truncate to a time unit
SELECT DATE_TRUNC('month', TIMESTAMP '2025-03-15 14:30:00');  -- 2025-03-01 00:00:00
SELECT DATE_TRUNC('week',  TIMESTAMP '2025-03-15 14:30:00');  -- 2025-03-10 00:00:00
 
-- Extract parts
SELECT EXTRACT(YEAR  FROM TIMESTAMP '2025-03-15');  -- 2025
SELECT EXTRACT(MONTH FROM TIMESTAMP '2025-03-15');  -- 3
SELECT EXTRACT(DOW   FROM TIMESTAMP '2025-03-15');  -- 6 (Saturday, 0=Sunday)
 
-- Age and intervals
SELECT AGE(TIMESTAMP '2025-01-01', TIMESTAMP '2020-06-15');  -- 4 years 6 mons 16 days
SELECT NOW() - INTERVAL '30 days';  -- 30 days ago
SELECT NOW() + INTERVAL '1 hour';   -- 1 hour from now
 
-- Generate a series of dates (useful for filling gaps)
SELECT generate_series(
    '2025-01-01'::date,
    '2025-12-01'::date,
    '1 month'::interval
) AS month;

Date arithmetic pattern — filling date gaps:

-- Revenue by month, including months with $0
WITH months AS (
    SELECT generate_series(
        '2025-01-01'::date,
        '2025-12-01'::date,
        '1 month'::interval
    )::date AS month
)
SELECT
    m.month,
    COALESCE(SUM(o.total), 0) AS revenue
FROM months m
LEFT JOIN orders o
    ON DATE_TRUNC('month', o.created_at)::date = m.month
    AND o.status = 'completed'
GROUP BY m.month
ORDER BY m.month;

Hands-On: Analytics Dashboard Queries

Let's put it all together with queries you'd actually build for a real analytics dashboard.

1. Revenue Trend with Month-over-Month Growth

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue,
        COUNT(*) AS order_count
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
    month,
    revenue,
    order_count,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
        1
    ) AS growth_pct,
    SUM(revenue) OVER (ORDER BY month) AS ytd_revenue
FROM monthly
ORDER BY month;

2. Customer Cohort Retention

Which month did each customer first order, and how many returned in subsequent months?

WITH first_order AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(created_at)) AS cohort_month
    FROM orders
    GROUP BY customer_id
),
activity AS (
    SELECT
        f.cohort_month,
        DATE_TRUNC('month', o.created_at) AS activity_month,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM orders o
    JOIN first_order f ON f.customer_id = o.customer_id
    GROUP BY f.cohort_month, DATE_TRUNC('month', o.created_at)
)
SELECT
    cohort_month,
    activity_month,
    EXTRACT(MONTH FROM AGE(activity_month, cohort_month)) AS months_since_first,
    active_customers
FROM activity
ORDER BY cohort_month, activity_month;

3. Employee Salary Distribution with Percentiles

SELECT
    name,
    department,
    salary,
    ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
    ROUND(salary - AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg,
    NTILE(4) OVER (ORDER BY salary) AS company_quartile,
    PERCENT_RANK() OVER (ORDER BY salary) AS percentile_rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY department, salary DESC;

4. Top Product per Customer

What's each customer's most purchased product?

WITH customer_products AS (
    SELECT
        c.name AS customer,
        p.name AS product,
        SUM(oi.quantity) AS total_qty,
        ROW_NUMBER() OVER (
            PARTITION BY c.id
            ORDER BY SUM(oi.quantity) DESC
        ) AS rn
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items oi ON oi.order_id = o.id
    JOIN products p ON p.id = oi.product_id
    GROUP BY c.id, c.name, p.id, p.name
)
SELECT customer, product, total_qty
FROM customer_products
WHERE rn = 1;

5. Leaderboard with Running Rankings

WITH daily_sales AS (
    SELECT
        DATE_TRUNC('day', o.created_at) AS day,
        c.name AS customer,
        SUM(o.total) AS daily_total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status = 'completed'
    GROUP BY DATE_TRUNC('day', o.created_at), c.id, c.name
)
SELECT
    day,
    customer,
    daily_total,
    SUM(daily_total) OVER (PARTITION BY customer ORDER BY day) AS cumulative_total,
    RANK() OVER (
        PARTITION BY day
        ORDER BY daily_total DESC
    ) AS daily_rank
FROM daily_sales
ORDER BY day, daily_rank;

Performance Considerations

Window Functions

Window functions run after WHERE, GROUP BY, and HAVING — they cannot be used in WHERE clauses
To filter by window function results, wrap in a CTE or subquery:

-- WRONG: cannot use window function in WHERE
SELECT *, ROW_NUMBER() OVER (...) AS rn
FROM employees
WHERE rn <= 5;  -- ERROR
 
-- CORRECT: wrap in CTE
WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER (...) AS rn
    FROM employees
)
SELECT * FROM ranked WHERE rn <= 5;

Multiple window functions with the same OVER clause are computed in a single pass. Use a named window to keep it DRY:

SELECT
    name,
    salary,
    ROW_NUMBER() OVER w AS row_num,
    RANK()       OVER w AS rank,
    DENSE_RANK() OVER w AS dense_rank
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

CTEs

In PostgreSQL 12+, CTEs are inlined (optimized like subqueries) by default. Before PostgreSQL 12, CTEs were always materialized (optimization fence)
Force materialization when you want to prevent the optimizer from re-executing the CTE:

WITH expensive_calc AS MATERIALIZED (
    SELECT ... -- complex calculation
)
SELECT * FROM expensive_calc;

Recursive CTEs

Always include a termination condition to prevent infinite loops
Use UNION ALL (not UNION) unless you specifically need deduplication — it's faster
Set a safety limit: WHERE depth < 100 or PostgreSQL's max_recursive_iterations

Exercises

Use the sample schema and solve these:

Running total: Show each order with a running total of revenue per customer, ordered by date.
Rank by department: Rank all employees by salary within their department. Show the top 2 per department.
Month-over-month: Calculate the percentage change in completed order revenue from month to month.
Category tree: Using the categories table, display all categories with their depth level and full path.
Employee chain: For each employee, show the full management chain from CEO down to them.
Moving average: Calculate a 3-month moving average of order revenue.
Percentile: Show each employee's salary percentile within the company (0.0 to 1.0).
Gap detection: Using generate_series and a LEFT JOIN, find months in 2025 that had no completed orders.

Summary and Key Takeaways

Learning Outcomes:
✅ CTEs break complex queries into named, readable steps — define once, reference multiple times
✅ Window functions compute across rows without collapsing them — unlike GROUP BY
✅ ROW_NUMBER, RANK, DENSE_RANK handle rankings; LAG/LEAD access adjacent rows
✅ Frame specifications (ROWS BETWEEN) control exactly which rows a window sees
✅ Recursive CTEs traverse hierarchical data: org charts, category trees, graph structures
✅ CASE, COALESCE, NULLIF add conditional logic directly in queries
✅ Date functions (DATE_TRUNC, EXTRACT, generate_series) power time-series analytics

What's Next

DB-4: NoSQL Fundamentals — Document, Key-Value, Column & Graph →

Now that you've mastered SQL from fundamentals through advanced analytics, it's time to explore the other side: NoSQL databases. You'll learn when documents, key-value stores, column families, and graphs genuinely outperform relational databases — and when they don't.

Series: SQL & NoSQL Database Roadmap
Previous: DB-2: SQL Fundamentals — SELECT, JOIN & Subqueries
Next: DB-4: NoSQL Fundamentals — Document, Key-Value, Column & Graph

Introduction

What You'll Learn

The Sample Schema (Extended)

Common Table Expressions (CTEs)

Basic CTE Syntax

Multiple CTEs

CTEs vs Subqueries vs Temp Tables

Window Functions

GROUP BY vs Window Functions

Window Function Syntax

Ranking Functions

ROW_NUMBER()

RANK() and DENSE_RANK()

LAG() and LEAD() — Accessing Adjacent Rows

Running Totals and Moving Averages

NTILE() — Bucketing Rows

FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()

Frame Specifications

Percentage of Total — A Classic Pattern

Recursive CTEs

Syntax

Example: Org Chart Traversal

Example: Category Tree

Finding All Descendants

Cycle Detection

CASE Expressions and Conditional Logic

CASE in Aggregation (Pivot Pattern)

COALESCE, NULLIF, GREATEST, LEAST

Essential Functions

String Functions

Date/Time Functions

Date arithmetic pattern — filling date gaps:

Hands-On: Analytics Dashboard Queries

1. Revenue Trend with Month-over-Month Growth

2. Customer Cohort Retention

3. Employee Salary Distribution with Percentiles

4. Top Product per Customer

5. Leaderboard with Running Rankings

Performance Considerations

Window Functions

CTEs

Recursive CTEs

Exercises

Summary and Key Takeaways

What's Next

📬 Subscribe to Newsletter

💬 Comments