Relational Databases and SQL

Edgar F. Codd published the relational model in 1970. The databases it inspired — Postgres, MySQL, SQL Server, SQLite — are still the workhorses of structured data half a century later. That's because the relational model is really good at what it does! Data lives in tables of rows and columns. Tables relate to each other through keys. Queries are expressed in SQL, a language that is approachable enough for analysts and powerful enough for production systems.

Note: SQL will appear in technical interviews, so this is one to review!

ℹ

The Four Operations: CRUD

Every data system is built on four fundamental operations — Create, Read, Update, Delete. In SQL:

-- Read
SELECT first_name, last_name FROM employees WHERE department = 'sales';

-- Create
INSERT INTO employees (first_name, last_name, department, salary)
VALUES ('Jordan', 'Lee', 'marketing', 72000);

-- Update
UPDATE employees SET salary = salary * 1.1 WHERE department = 'sales';

-- Delete
DELETE FROM employees WHERE department = 'HR';

The operation that trips up most people is JOIN — combining rows from two or more tables based on a related column. The mental model is set theory, but you don't need to think in sets to use joins fluently. Think instead about what rows you want in your result.

The Four SQL Join Types

Intersection Only

INNER JOIN

Returns only rows where a match exists in both tables.

If a row in the left table has no corresponding row in the right table (or vice versa), it's excluded from the result. Silent drops are the most common source of analytics bugs.

SELECT * FROM a
INNER JOIN b ON a.id = b.a_id

When to use

Use when every row must match

All Left Rows

LEFT JOIN

Returns every row from the left table, plus matched rows from the right.

When no match exists in the right table, its columns appear as NULL. The most-used join in analytics — "give me everything from A, and whatever B has on it."

SELECT * FROM a
LEFT JOIN b ON a.id = b.a_id

When to use

Use for counts that include zero

All Right Rows

RIGHT JOIN

Returns every row from the right table, plus matched rows from the left.

Mirror of LEFT JOIN. Rarely used in practice — most engineers swap the table order and write a LEFT JOIN instead. Same result, more readable.

SELECT * FROM a
RIGHT JOIN b ON a.id = b.a_id

When to use

Usually replaced by LEFT JOIN

All Rows, Both Sides

FULL OUTER JOIN

Returns all rows from both tables, matched where possible.

Unmatched rows from either side get NULLs on the other side. Best for finding discrepancies — rows that exist in one dataset but not the other.

SELECT * FROM a
FULL OUTER JOIN b ON a.id = b.a_id

When to use

Use for reconciliation queries

When in doubt, start with LEFT JOIN. It preserves all rows from the primary table and makes missing matches visible as NULLs — silent row drops are the most common source of analytics errors.

SQL Join Sandboxcustomers (5 rows) × orders (6 rows)

Select a join type to see which rows appear:

SELECT c.id, c.name, c.city,
       o.id AS order_id, o.amount, o.product
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

Result5 rowsINNER JOIN

c.id	c.name	c.city	order_id	amount	product
1	Alice Chen	Durham	101	149.99	Laptop Stand
1	Alice Chen	Durham	102	29.99	USB Hub
2	Ben Mora	Raleigh	103	399	Monitor
3	Carmen Li	Chapel Hill	104	89.5	Keyboard
3	Carmen Li	Chapel Hill	105	55	Mouse

Matched rowNULL (no match)Eva Torres has no orders — watch her appear/disappear

Write SQL queries against a live in-browser database. Try each join type and watch which rows appear and disappear.

✦

When Relational Databases Fit (and When They Don't)

Relational databases excel at: structured data with a consistent schema, complex queries with multiple joins, transactional workloads where consistency matters (financial records, order systems), and moderate data volumes.

They start to strain at: schema-flexible documents where every row looks different, extremely high write throughput, data too large to fit on a single machine without careful partitioning, and highly connected data (social graphs, fraud networks) where joins across many tables become expensive.

Checkpoint

You want a list of all customers and how many orders each has placed — including customers who have placed zero orders. Which SQL approach returns the correct result?

←PreviousWhat Kind of Data?Data Storage Next→NoSQL: When Relational Stops FittingData Storage