Advanced SQL: Window Functions, CTEs & Recursive Queries

Introduction
In DB-2: SQL Fundamentals, you learned SELECT, JOINs, GROUP BY, and subqueries. Those tools handle most day-to-day queries. But the moment someone asks "rank customers by spend within each region" or "show a running total of revenue over time" or "traverse a category tree" — basic SQL hits a wall.
That's where advanced SQL comes in. Window functions let you compute rankings, running totals, and moving averages without collapsing rows into groups. CTEs make complex queries readable and composable. Recursive CTEs traverse hierarchical data — org charts, category trees, comment threads — entirely in SQL.
These aren't niche features. They're what separates a developer who can write SQL from one who can think in SQL. And they show up constantly in interviews, analytics dashboards, and production reporting.
Prerequisites: This post is part of the SQL & NoSQL Database Roadmap. Complete DB-2: SQL Fundamentals first.
What You'll Learn
✅ Write CTEs to break complex queries into readable steps
✅ Use window functions for rankings, running totals, and moving averages
✅ Understand PARTITION BY, ORDER BY, and frame specifications
✅ Traverse hierarchical data with recursive CTEs
✅ Apply CASE, COALESCE, and conditional logic in queries
✅ Use string, date, and math functions for real-world transformations
✅ Build analytics dashboard queries from scratch
The Sample Schema (Extended)
We'll reuse the e-commerce schema from DB-2 and add a few tables for analytics and hierarchy examples:
-- Reuse from DB-2: customers, products, orders, order_items
-- Add an employees table for hierarchy examples
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
title VARCHAR(100) NOT NULL,
department VARCHAR(50) NOT NULL,
salary NUMERIC(10, 2) NOT NULL,
manager_id INT REFERENCES employees(id),
hired_at DATE NOT NULL
);
INSERT INTO employees (name, title, department, salary, manager_id, hired_at) VALUES
('Sarah Chen', 'CEO', 'Executive', 250000, NULL, '2018-01-15'),
('Mike Ross', 'VP Engineering', 'Engineering', 200000, 1, '2018-06-01'),
('Lisa Park', 'VP Sales', 'Sales', 190000, 1, '2019-03-10'),
('Tom Wilson', 'Senior Engineer', 'Engineering', 150000, 2, '2020-01-20'),
('Anna Kim', 'Senior Engineer', 'Engineering', 145000, 2, '2020-04-15'),
('James Liu', 'Engineer', 'Engineering', 120000, 4, '2021-07-01'),
('Emma Davis', 'Engineer', 'Engineering', 115000, 4, '2021-09-15'),
('Chris Lee', 'Sales Manager', 'Sales', 130000, 3, '2020-02-01'),
('Nina Patel', 'Sales Rep', 'Sales', 85000, 8, '2022-01-10'),
('Oscar Brown', 'Sales Rep', 'Sales', 82000, 8, '2022-06-15');
-- Add a categories table for recursive hierarchy
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT REFERENCES categories(id)
);
INSERT INTO categories (name, parent_id) VALUES
('All Products', NULL),
('Electronics', 1),
('Computers', 2),
('Laptops', 3),
('Desktops', 3),
('Phones', 2),
('Furniture', 1),
('Desks', 7),
('Chairs', 7);We'll also add more orders with dates spread across months for time-series analytics:
-- Additional orders for time-series analysis (add to existing orders)
INSERT INTO orders (customer_id, total, status, created_at) VALUES
(1, 299.00, 'completed', '2025-01-15'),
(2, 549.00, 'completed', '2025-01-22'),
(1, 1299.00, 'completed', '2025-02-03'),
(3, 199.00, 'completed', '2025-02-18'),
(2, 899.00, 'completed', '2025-03-05'),
(4, 459.00, 'pending', '2025-03-12'),
(1, 749.00, 'completed', '2025-03-20'),
(3, 1599.00, 'completed', '2025-04-01'),
(2, 329.00, 'completed', '2025-04-15'),
(4, 899.00, 'completed', '2025-04-22'),
(1, 199.00, 'completed', '2025-05-10'),
(3, 649.00, 'completed', '2025-05-18');Common Table Expressions (CTEs)
A CTE is a named temporary result set that exists only for the duration of a single query. Think of it as a disposable view — you define it with WITH, then reference it by name.
Basic CTE Syntax
WITH cte_name AS (
SELECT ...
FROM ...
)
SELECT *
FROM cte_name;Here's a real example. Without a CTE, finding customers who spend more than average requires a subquery:
-- Subquery version (hard to read as complexity grows)
SELECT c.name, customer_totals.total_spent
FROM customers c
JOIN (
SELECT customer_id, SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
) AS customer_totals ON customer_totals.customer_id = c.id
WHERE customer_totals.total_spent > (
SELECT AVG(total_spent)
FROM (
SELECT SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
) AS avg_calc
);With a CTE, the same logic is clearer:
WITH customer_spending AS (
SELECT customer_id, SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
)
SELECT c.name, cs.total_spent
FROM customers c
JOIN customer_spending cs ON cs.customer_id = c.id
WHERE cs.total_spent > (SELECT AVG(total_spent) FROM customer_spending);The CTE is defined once and referenced twice — no repeated logic, no nested subqueries.
Multiple CTEs
Chain multiple CTEs separated by commas. Each can reference the ones defined before it:
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
),
revenue_with_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_revenue
)
SELECT
month,
revenue,
prev_month_revenue,
growth,
CASE
WHEN prev_month_revenue > 0
THEN ROUND(growth / prev_month_revenue * 100, 1)
ELSE NULL
END AS growth_pct
FROM revenue_with_growth
ORDER BY month;This pipeline: raw orders → monthly totals → month-over-month growth. Each step is a named, readable unit.
CTEs vs Subqueries vs Temp Tables
| Feature | CTE | Subquery | Temp Table |
|---|---|---|---|
| Scope | Single query | Single query | Session-wide |
| Reusability | Multiple references in same query | Must repeat | Multiple queries |
| Readability | High (named, top-down) | Low (nested) | Medium |
| Performance | Usually same as subquery | Same as CTE | Materialized (stored) |
| Recursion | Yes | No | No |
| Indexable | No | No | Yes |
Rule of thumb: Use CTEs for readability and when you reference the same logic multiple times. Use temp tables when you need indexes on intermediate results or need the data across multiple queries.
Window Functions
Window functions perform calculations across a set of rows that are related to the current row — without collapsing them into a single output row like GROUP BY does. That's the key difference.
GROUP BY vs Window Functions
-- GROUP BY: collapses rows — one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;| department | avg_salary |
|---|---|
| Executive | 250000 |
| Engineering | 146000 |
| Sales | 121750 |
-- Window function: keeps all rows, adds avg alongside each
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;| name | department | salary | dept_avg_salary |
|---|---|---|---|
| Sarah Chen | Executive | 250000 | 250000 |
| Mike Ross | Engineering | 200000 | 146000 |
| Tom Wilson | Engineering | 150000 | 146000 |
| Anna Kim | Engineering | 145000 | 146000 |
| James Liu | Engineering | 120000 | 146000 |
| Emma Davis | Engineering | 115000 | 146000 |
| ... | ... | ... | ... |
Every row is preserved. Each row sees the average of its own department. This is impossible with GROUP BY alone.
Window Function Syntax
function_name(...) OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
[frame_clause]
)- PARTITION BY: Divides rows into groups (like GROUP BY, but without collapsing)
- ORDER BY: Defines the order within each partition
- Frame clause: Defines which rows relative to the current row are included in the calculation
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential number to each row within a partition. No ties — if two values are equal, one gets a higher number (arbitrary order unless you specify it).
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees;| name | department | salary | rank_in_dept |
|---|---|---|---|
| Sarah Chen | Executive | 250000 | 1 |
| Mike Ross | Engineering | 200000 | 1 |
| Tom Wilson | Engineering | 150000 | 2 |
| Anna Kim | Engineering | 145000 | 3 |
| James Liu | Engineering | 120000 | 4 |
| Emma Davis | Engineering | 115000 | 5 |
| Lisa Park | Sales | 190000 | 1 |
| Chris Lee | Sales | 130000 | 2 |
| Nina Patel | Sales | 85000 | 3 |
| Oscar Brown | Sales | 82000 | 4 |
Classic pattern — Top N per group:
-- Top 2 highest-paid employees per department
WITH ranked AS (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2;This is one of the most common SQL interview questions and one of the most common production patterns.
RANK() and DENSE_RANK()
RANK() gives the same number to ties but leaves gaps. DENSE_RANK() gives the same number to ties without gaps:
-- Imagine two employees with the same salary
SELECT
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, -- 1, 2, 3, 4
RANK() OVER (ORDER BY salary DESC) AS rank, -- 1, 2, 2, 4 (gap)
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank -- 1, 2, 2, 3 (no gap)
FROM employees;When to use which:
ROW_NUMBER(): When you need exactly N rows (pagination, top-N, deduplication)RANK(): When ties matter and you need standard competition ranking (1st, 2nd, 2nd, 4th)DENSE_RANK(): When ties matter and you want consecutive ranks (1st, 2nd, 2nd, 3rd)
LAG() and LEAD() — Accessing Adjacent Rows
LAG() looks at the previous row. LEAD() looks at the next row. Both require ORDER BY.
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue,
LAG(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS prev_month,
LEAD(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS next_month
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;| month | revenue | prev_month | next_month |
|---|---|---|---|
| 2025-01-01 | 848.00 | NULL | 1498.00 |
| 2025-02-01 | 1498.00 | 848.00 | 1648.00 |
| 2025-03-01 | 1648.00 | 1498.00 | 2827.00 |
| 2025-04-01 | 2827.00 | 1648.00 | 848.00 |
| 2025-05-01 | 848.00 | 2827.00 | NULL |
LAG and LEAD accept an optional offset (default 1) and default value:
-- Look back 2 rows, default to 0 if no row exists
LAG(revenue, 2, 0) OVER (ORDER BY month)Running Totals and Moving Averages
Use aggregate functions with OVER to compute cumulative calculations:
-- Running total of revenue
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS monthly_revenue,
SUM(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS running_total
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;| month | monthly_revenue | running_total |
|---|---|---|
| 2025-01-01 | 848.00 | 848.00 |
| 2025-02-01 | 1498.00 | 2346.00 |
| 2025-03-01 | 1648.00 | 3994.00 |
| 2025-04-01 | 2827.00 | 6821.00 |
| 2025-05-01 | 848.00 | 7669.00 |
Note the SUM(SUM(total)) — the inner SUM is the GROUP BY aggregate, the outer SUM ... OVER is the window function.
3-month moving average:
SELECT
month,
revenue,
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
) AS monthly;NTILE() — Bucketing Rows
NTILE(n) divides rows into n roughly equal groups:
-- Divide employees into 4 salary quartiles
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;Useful for percentile analysis, A/B test group assignment, or splitting work into batches.
FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()
Access specific rows within a window:
-- Each employee compared to the highest-paid in their department
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
) AS highest_paid_name,
FIRST_VALUE(salary) OVER (
PARTITION BY department ORDER BY salary DESC
) AS highest_salary,
salary - FIRST_VALUE(salary) OVER (
PARTITION BY department ORDER BY salary DESC
) AS diff_from_top
FROM employees;Warning about LAST_VALUE: By default, the window frame ends at the current row, not the last row of the partition. To get the actual last value, you need an explicit frame:
LAST_VALUE(salary) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_salaryFrame Specifications
The frame clause controls which rows the window function "sees" relative to the current row.
ROWS BETWEEN <start> AND <end>Where <start> and <end> can be:
UNBOUNDED PRECEDING— from the first row of the partitionN PRECEDING— N rows before currentCURRENT ROW— the current rowN FOLLOWING— N rows after currentUNBOUNDED FOLLOWING— to the last row of the partition
Default frames:
- With
ORDER BY:RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(running total) - Without
ORDER BY:RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING(entire partition)
ROWS vs RANGE vs GROUPS:
ROWS: Physical rows. Precise count.RANGE: Logical range based on ORDER BY value. Rows with the same value are treated as peers.GROUPS: Groups of peer rows (PostgreSQL 11+).1 PRECEDINGmeans the previous group of tied values.
-- Running total (default frame with ORDER BY)
SUM(revenue) OVER (ORDER BY month)
-- 3-row moving average
AVG(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- Full partition (explicit — useful for ratios)
SUM(salary) OVER (PARTITION BY department) AS dept_total
-- Centered 5-row window
AVG(value) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)Percentage of Total — A Classic Pattern
-- Each employee's salary as a percentage of their department's total
SELECT
name,
department,
salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
ROUND(salary * 100.0 / SUM(salary) OVER (PARTITION BY department), 1) AS pct_of_dept
FROM employees
ORDER BY department, salary DESC;Recursive CTEs
Recursive CTEs let SQL traverse hierarchical and graph-structured data. They're composed of two parts joined by UNION ALL:
- Anchor member: The starting point (base case)
- Recursive member: References the CTE itself (recursive step)
Syntax
WITH RECURSIVE cte_name AS (
-- Anchor: base case
SELECT ...
WHERE <starting condition>
UNION ALL
-- Recursive: references cte_name
SELECT ...
FROM cte_name
JOIN other_table ON ...
)
SELECT * FROM cte_name;Example: Org Chart Traversal
Find the complete reporting chain for every employee:
WITH RECURSIVE org_chart AS (
-- Anchor: start with the CEO (no manager)
SELECT
id,
name,
title,
manager_id,
0 AS depth,
name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find direct reports of current level
SELECT
e.id,
e.name,
e.title,
e.manager_id,
oc.depth + 1,
oc.path || ' → ' || e.name
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT depth, name, title, path
FROM org_chart
ORDER BY path;| depth | name | title | path |
|---|---|---|---|
| 0 | Sarah Chen | CEO | Sarah Chen |
| 1 | Lisa Park | VP Sales | Sarah Chen → Lisa Park |
| 2 | Chris Lee | Sales Manager | Sarah Chen → Lisa Park → Chris Lee |
| 3 | Nina Patel | Sales Rep | Sarah Chen → Lisa Park → Chris Lee → Nina Patel |
| 3 | Oscar Brown | Sales Rep | Sarah Chen → Lisa Park → Chris Lee → Oscar Brown |
| 1 | Mike Ross | VP Engineering | Sarah Chen → Mike Ross |
| 2 | Tom Wilson | Senior Engineer | Sarah Chen → Mike Ross → Tom Wilson |
| 3 | James Liu | Engineer | Sarah Chen → Mike Ross → Tom Wilson → James Liu |
| 3 | Emma Davis | Engineer | Sarah Chen → Mike Ross → Tom Wilson → Emma Davis |
| 2 | Anna Kim | Senior Engineer | Sarah Chen → Mike Ross → Anna Kim |
Example: Category Tree
List all categories with their full path:
WITH RECURSIVE category_tree AS (
-- Anchor: root categories (no parent)
SELECT
id,
name,
parent_id,
0 AS depth,
name AS full_path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
ct.depth + 1,
ct.full_path || ' > ' || c.name
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
REPEAT(' ', depth) || name AS indented_name,
full_path
FROM category_tree
ORDER BY full_path;| indented_name | full_path |
|---|---|
| All Products | All Products |
| Electronics | All Products > Electronics |
| Computers | All Products > Electronics > Computers |
| Desktops | All Products > Electronics > Computers > Desktops |
| Laptops | All Products > Electronics > Computers > Laptops |
| Phones | All Products > Electronics > Phones |
| Furniture | All Products > Furniture |
| Chairs | All Products > Furniture > Chairs |
| Desks | All Products > Furniture > Desks |
Finding All Descendants
"Give me all sub-categories under Electronics":
WITH RECURSIVE descendants AS (
SELECT id, name, parent_id
FROM categories
WHERE name = 'Electronics' -- starting point
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
INNER JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants;Cycle Detection
If your data might contain cycles (A → B → C → A), the recursive CTE will run forever. PostgreSQL 14+ has built-in cycle detection:
WITH RECURSIVE traversal AS (
SELECT id, name, manager_id, ARRAY[id] AS visited
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, t.visited || e.id
FROM employees e
JOIN traversal t ON e.manager_id = t.id
WHERE e.id != ALL(t.visited) -- prevent cycles
)
SELECT * FROM traversal;PostgreSQL 14+ also supports the CYCLE clause:
WITH RECURSIVE traversal AS (
...
) CYCLE id SET is_cycle USING pathCASE Expressions and Conditional Logic
CASE adds if/else logic directly in SQL:
-- Categorize orders by size
SELECT
id,
total,
CASE
WHEN total >= 1000 THEN 'Large'
WHEN total >= 500 THEN 'Medium'
WHEN total >= 100 THEN 'Small'
ELSE 'Micro'
END AS order_size
FROM orders;CASE in Aggregation (Pivot Pattern)
-- Order count by status — pivoted into columns
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'completed') AS completed,
COUNT(*) FILTER (WHERE status = 'pending') AS pending,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;If your database doesn't support FILTER, use CASE:
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completedCOALESCE, NULLIF, GREATEST, LEAST
-- COALESCE: first non-NULL value
SELECT COALESCE(nickname, name, 'Anonymous') AS display_name FROM users;
-- NULLIF: returns NULL if two values are equal (prevents division by zero)
SELECT revenue / NULLIF(order_count, 0) AS avg_order_value FROM metrics;
-- GREATEST / LEAST: max/min across columns (not rows)
SELECT GREATEST(score_a, score_b, score_c) AS best_score FROM exams;
SELECT LEAST(price, sale_price, clearance_price) AS lowest_price FROM products;Essential Functions
String Functions
-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;
SELECT first_name || ' ' || last_name AS full_name FROM users; -- PostgreSQL
-- Case conversion
SELECT UPPER('hello'), LOWER('HELLO'), INITCAP('hello world');
-- HELLO, hello, Hello World
-- Substring and position
SELECT SUBSTRING('PostgreSQL' FROM 1 FOR 4); -- Post
SELECT POSITION('SQL' IN 'PostgreSQL'); -- 8
SELECT LENGTH('Hello'); -- 5
-- Trim
SELECT TRIM(' hello '); -- 'hello'
SELECT LTRIM(' hello'); -- 'hello'
SELECT RTRIM('hello '); -- 'hello'
SELECT TRIM(LEADING '0' FROM '00042'); -- '42'
-- Replace and split
SELECT REPLACE('2025-01-15', '-', '/'); -- 2025/01/15
SELECT SPLIT_PART('john@example.com', '@', 2); -- example.com
-- Pattern matching with REGEXP
SELECT 'abc123' ~ '^[a-z]+[0-9]+$'; -- true (PostgreSQL regex)
SELECT REGEXP_REPLACE('Phone: 123-456-7890', '[^0-9]', '', 'g'); -- 1234567890Date/Time Functions
-- Current timestamp
SELECT NOW(), CURRENT_DATE, CURRENT_TIME;
-- Truncate to a time unit
SELECT DATE_TRUNC('month', TIMESTAMP '2025-03-15 14:30:00'); -- 2025-03-01 00:00:00
SELECT DATE_TRUNC('week', TIMESTAMP '2025-03-15 14:30:00'); -- 2025-03-10 00:00:00
-- Extract parts
SELECT EXTRACT(YEAR FROM TIMESTAMP '2025-03-15'); -- 2025
SELECT EXTRACT(MONTH FROM TIMESTAMP '2025-03-15'); -- 3
SELECT EXTRACT(DOW FROM TIMESTAMP '2025-03-15'); -- 6 (Saturday, 0=Sunday)
-- Age and intervals
SELECT AGE(TIMESTAMP '2025-01-01', TIMESTAMP '2020-06-15'); -- 4 years 6 mons 16 days
SELECT NOW() - INTERVAL '30 days'; -- 30 days ago
SELECT NOW() + INTERVAL '1 hour'; -- 1 hour from now
-- Generate a series of dates (useful for filling gaps)
SELECT generate_series(
'2025-01-01'::date,
'2025-12-01'::date,
'1 month'::interval
) AS month;Date arithmetic pattern — filling date gaps:
-- Revenue by month, including months with $0
WITH months AS (
SELECT generate_series(
'2025-01-01'::date,
'2025-12-01'::date,
'1 month'::interval
)::date AS month
)
SELECT
m.month,
COALESCE(SUM(o.total), 0) AS revenue
FROM months m
LEFT JOIN orders o
ON DATE_TRUNC('month', o.created_at)::date = m.month
AND o.status = 'completed'
GROUP BY m.month
ORDER BY m.month;Hands-On: Analytics Dashboard Queries
Let's put it all together with queries you'd actually build for a real analytics dashboard.
1. Revenue Trend with Month-over-Month Growth
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue,
COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
month,
revenue,
order_count,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
1
) AS growth_pct,
SUM(revenue) OVER (ORDER BY month) AS ytd_revenue
FROM monthly
ORDER BY month;2. Customer Cohort Retention
Which month did each customer first order, and how many returned in subsequent months?
WITH first_order AS (
SELECT
customer_id,
DATE_TRUNC('month', MIN(created_at)) AS cohort_month
FROM orders
GROUP BY customer_id
),
activity AS (
SELECT
f.cohort_month,
DATE_TRUNC('month', o.created_at) AS activity_month,
COUNT(DISTINCT o.customer_id) AS active_customers
FROM orders o
JOIN first_order f ON f.customer_id = o.customer_id
GROUP BY f.cohort_month, DATE_TRUNC('month', o.created_at)
)
SELECT
cohort_month,
activity_month,
EXTRACT(MONTH FROM AGE(activity_month, cohort_month)) AS months_since_first,
active_customers
FROM activity
ORDER BY cohort_month, activity_month;3. Employee Salary Distribution with Percentiles
SELECT
name,
department,
salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 0) AS dept_avg,
ROUND(salary - AVG(salary) OVER (PARTITION BY department), 0) AS diff_from_avg,
NTILE(4) OVER (ORDER BY salary) AS company_quartile,
PERCENT_RANK() OVER (ORDER BY salary) AS percentile_rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY department, salary DESC;4. Top Product per Customer
What's each customer's most purchased product?
WITH customer_products AS (
SELECT
c.name AS customer,
p.name AS product,
SUM(oi.quantity) AS total_qty,
ROW_NUMBER() OVER (
PARTITION BY c.id
ORDER BY SUM(oi.quantity) DESC
) AS rn
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
GROUP BY c.id, c.name, p.id, p.name
)
SELECT customer, product, total_qty
FROM customer_products
WHERE rn = 1;5. Leaderboard with Running Rankings
WITH daily_sales AS (
SELECT
DATE_TRUNC('day', o.created_at) AS day,
c.name AS customer,
SUM(o.total) AS daily_total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('day', o.created_at), c.id, c.name
)
SELECT
day,
customer,
daily_total,
SUM(daily_total) OVER (PARTITION BY customer ORDER BY day) AS cumulative_total,
RANK() OVER (
PARTITION BY day
ORDER BY daily_total DESC
) AS daily_rank
FROM daily_sales
ORDER BY day, daily_rank;Performance Considerations
Window Functions
- Window functions run after WHERE, GROUP BY, and HAVING — they cannot be used in WHERE clauses
- To filter by window function results, wrap in a CTE or subquery:
-- WRONG: cannot use window function in WHERE
SELECT *, ROW_NUMBER() OVER (...) AS rn
FROM employees
WHERE rn <= 5; -- ERROR
-- CORRECT: wrap in CTE
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (...) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 5;- Multiple window functions with the same
OVERclause are computed in a single pass. Use a named window to keep it DRY:
SELECT
name,
salary,
ROW_NUMBER() OVER w AS row_num,
RANK() OVER w AS rank,
DENSE_RANK() OVER w AS dense_rank
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);CTEs
- In PostgreSQL 12+, CTEs are inlined (optimized like subqueries) by default. Before PostgreSQL 12, CTEs were always materialized (optimization fence)
- Force materialization when you want to prevent the optimizer from re-executing the CTE:
WITH expensive_calc AS MATERIALIZED (
SELECT ... -- complex calculation
)
SELECT * FROM expensive_calc;Recursive CTEs
- Always include a termination condition to prevent infinite loops
- Use
UNION ALL(notUNION) unless you specifically need deduplication — it's faster - Set a safety limit:
WHERE depth < 100or PostgreSQL'smax_recursive_iterations
Exercises
Use the sample schema and solve these:
- Running total: Show each order with a running total of revenue per customer, ordered by date.
- Rank by department: Rank all employees by salary within their department. Show the top 2 per department.
- Month-over-month: Calculate the percentage change in completed order revenue from month to month.
- Category tree: Using the categories table, display all categories with their depth level and full path.
- Employee chain: For each employee, show the full management chain from CEO down to them.
- Moving average: Calculate a 3-month moving average of order revenue.
- Percentile: Show each employee's salary percentile within the company (0.0 to 1.0).
- Gap detection: Using
generate_seriesand a LEFT JOIN, find months in 2025 that had no completed orders.
Summary and Key Takeaways
Learning Outcomes:
✅ CTEs break complex queries into named, readable steps — define once, reference multiple times
✅ Window functions compute across rows without collapsing them — unlike GROUP BY
✅ ROW_NUMBER, RANK, DENSE_RANK handle rankings; LAG/LEAD access adjacent rows
✅ Frame specifications (ROWS BETWEEN) control exactly which rows a window sees
✅ Recursive CTEs traverse hierarchical data: org charts, category trees, graph structures
✅ CASE, COALESCE, NULLIF add conditional logic directly in queries
✅ Date functions (DATE_TRUNC, EXTRACT, generate_series) power time-series analytics
What's Next
DB-4: NoSQL Fundamentals — Document, Key-Value, Column & Graph →
Now that you've mastered SQL from fundamentals through advanced analytics, it's time to explore the other side: NoSQL databases. You'll learn when documents, key-value stores, column families, and graphs genuinely outperform relational databases — and when they don't.
Series: SQL & NoSQL Database Roadmap
Previous: DB-2: SQL Fundamentals — SELECT, JOIN & Subqueries
Next: DB-4: NoSQL Fundamentals — Document, Key-Value, Column & Graph
📬 Subscribe to Newsletter
Get the latest blog posts delivered to your inbox every week. No spam, unsubscribe anytime.
We respect your privacy. Unsubscribe at any time.
💬 Comments
Sign in to leave a comment
We'll never post without your permission.