Relational Databases and SQL

Edgar F. Codd published the relational model in 1970. The databases it inspired — Postgres, MySQL, SQL Server, SQLite — are still the workhorses of structured data half a century later. That's because the relational model is really good at what it does! Data lives in tables of rows and columns. Tables relate to each other through keys. Queries are expressed in SQL, a language that is approachable enough for analysts and powerful enough for production systems.

Note: SQL will appear in technical interviews, so this is one to review!

The Four Operations: CRUD

Every data system is built on four fundamental operations — Create, Read, Update, Delete. In SQL:

-- Read
SELECT first_name, last_name FROM employees WHERE department = 'sales';

-- Create
INSERT INTO employees (first_name, last_name, department, salary)
VALUES ('Jordan', 'Lee', 'marketing', 72000);

-- Update
UPDATE employees SET salary = salary * 1.1 WHERE department = 'sales';

-- Delete
DELETE FROM employees WHERE department = 'HR';

The operation that trips up most people is JOIN — combining rows from two or more tables based on a related column. The mental model is set theory, but you don't need to think in sets to use joins fluently. Think instead about what rows you want in your result.

The Four SQL Join Types
Intersection Only
INNER JOIN
Returns only rows where a match exists in both tables.
If a row in the left table has no corresponding row in the right table (or vice versa), it's excluded from the result. Silent drops are the most common source of analytics bugs.
SELECT * FROM a
INNER JOIN b ON a.id = b.a_id
When to use
Use when every row must match
All Left Rows
LEFT JOIN
Returns every row from the left table, plus matched rows from the right.
When no match exists in the right table, its columns appear as NULL. The most-used join in analytics — "give me everything from A, and whatever B has on it."
SELECT * FROM a
LEFT JOIN b ON a.id = b.a_id
When to use
Use for counts that include zero
All Right Rows
RIGHT JOIN
Returns every row from the right table, plus matched rows from the left.
Mirror of LEFT JOIN. Rarely used in practice — most engineers swap the table order and write a LEFT JOIN instead. Same result, more readable.
SELECT * FROM a
RIGHT JOIN b ON a.id = b.a_id
When to use
Usually replaced by LEFT JOIN
All Rows, Both Sides
FULL OUTER JOIN
Returns all rows from both tables, matched where possible.
Unmatched rows from either side get NULLs on the other side. Best for finding discrepancies — rows that exist in one dataset but not the other.
SELECT * FROM a
FULL OUTER JOIN b ON a.id = b.a_id
When to use
Use for reconciliation queries

When in doubt, start with LEFT JOIN. It preserves all rows from the primary table and makes missing matches visible as NULLs — silent row drops are the most common source of analytics errors.

SQL Join Sandboxcustomers (5 rows) × orders (6 rows)

Select a join type to see which rows appear:

SELECT c.id, c.name, c.city,
       o.id AS order_id, o.amount, o.product
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
Result5 rowsINNER JOIN
c.idc.namec.cityorder_idamountproduct
1Alice ChenDurham101149.99Laptop Stand
1Alice ChenDurham10229.99USB Hub
2Ben MoraRaleigh103399Monitor
3Carmen LiChapel Hill10489.5Keyboard
3Carmen LiChapel Hill10555Mouse
Matched rowNULL (no match)Eva Torres has no orders — watch her appear/disappear

Write SQL queries against a live in-browser database. Try each join type and watch which rows appear and disappear.

When Relational Databases Fit (and When They Don't)

Relational databases excel at: structured data with a consistent schema, complex queries with multiple joins, transactional workloads where consistency matters (financial records, order systems), and moderate data volumes.

They start to strain at: schema-flexible documents where every row looks different, extremely high write throughput, data too large to fit on a single machine without careful partitioning, and highly connected data (social graphs, fraud networks) where joins across many tables become expensive.

Checkpoint

You want a list of all customers and how many orders each has placed — including customers who have placed zero orders. Which SQL approach returns the correct result?