Type 1 and Type 2 Errors
Any time you run a hypothesis test, you can be wrong in two distinct ways. Understanding these errors — and which one you care about more — is one of the most important practical skills in this unit.
A Type 1 error is a false positive: you reject a true null hypothesis. You conclude there is an effect when there isn't one.
A Type 2 error is a false negative: you fail to reject a false null hypothesis. There really is an effect, and you miss it.
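To make the two definitions concrete, here is a minimal sketch that counts each error type from a list of true outcomes and a list of decisions. The function name `count_errors`, the 0/1 encoding, and the sample data are all illustrative assumptions, not part of any particular library.

```python
def count_errors(y_true, y_pred):
    """Count Type 1 (false positive) and Type 2 (false negative) errors.

    Assumed encoding: 1 = there really is an effect (null is false),
                      0 = there is no effect (null is true).
    A prediction of 1 means "reject the null".
    """
    pairs = list(zip(y_true, y_pred))
    # Type 1: no real effect, but we claimed one.
    type1 = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    # Type 2: a real effect, but we missed it.
    type2 = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    return type1, type2


# Made-up data, purely for illustration.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

type1, type2 = count_errors(y_true, y_pred)
print(f"Type 1 errors: {type1}, Type 2 errors: {type2}")
```

In this toy data the first decision is a false positive and the fourth is a false negative, so each count is 1.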

A Mnemonic That Sticks
Imagine you're trying to remember if it's someone's birthday.
- Type 1 error: You say "happy birthday" — and it's not their birthday. (False positive.)
- Type 2 error: You say nothing — and it is their birthday. (False negative.)
Which is worse depends entirely on who the person is. For a colleague you barely know: Type 1 (saying happy birthday incorrectly) is mildly awkward. For your partner: Type 2 (forgetting) could be catastrophic!
Let's anchor this with a more serious example: you build a machine learning model that detects cancer.
- A Type 1 error means your model tells a patient they have cancer when they don't. They may undergo unnecessary biopsies and treatments and suffer serious psychological distress.
- A Type 2 error means your model fails to detect cancer that's actually there. The patient doesn't receive treatment, and the disease progresses.
If you have to favor one, you'd rather have the false alarm. Missing a real cancer is far worse than triggering a follow-up test. This shapes everything: the decision threshold, the loss function, which metric you optimize for.
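The effect of the decision threshold can be sketched in a few lines. The scores and labels below are invented for illustration (they are not from a real model), and `errors_at_threshold` is a hypothetical helper. The point is only that lowering the threshold trades false negatives for false positives.

```python
def errors_at_threshold(scores, labels, threshold):
    """Return (false_positives, false_negatives) when predicting
    positive for every score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return fp, fn


# Hypothetical model scores (estimated probability of cancer) and true labels.
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = errors_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: Type 1 (FP)={fp}, Type 2 (FN)={fn}")
```

On this toy data, a threshold of 0.3 misses no cancers but raises two false alarms, while 0.7 raises no false alarms but misses two cancers. A model meant to favor the false alarm would sit toward the lower threshold.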
This Trade-off Is Everywhere in ML
Fraud detection skews in the same direction: a missed fraud (Type 2) is usually worse than a false flag (Type 1) that a human can review. Medical screening tests are deliberately tuned to tolerate more Type 1 errors so that fewer real cases are missed. Content moderation may trade off differently depending on the platform's values.
Your choice of decision threshold and your primary evaluation metric both encode an implicit answer to the Type 1 / Type 2 trade-off. Make that choice deliberately — don't let it happen by default!
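One way to make the choice explicit is to assign a cost to each error type and pick the threshold with the lowest total cost. The sketch below assumes made-up scores, labels, and cost values, and the helper names `total_cost` and `best_threshold` are hypothetical. In practice the costs would come from the problem domain.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of errors when predicting positive for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Illustrative data only.
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
candidates = [0.3, 0.5, 0.7]

# If a miss costs 10x a false alarm, the lowest threshold is chosen.
print(best_threshold(scores, labels, candidates, cost_fp=1, cost_fn=10))
# If a false alarm costs 10x a miss, the highest threshold is chosen.
print(best_threshold(scores, labels, candidates, cost_fp=10, cost_fn=1))
```

Writing the costs down forces the trade-off into the open: changing `cost_fp` and `cost_fn` changes the chosen threshold, which is exactly the decision that otherwise happens by default.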
You're building a model to detect critical equipment failures in a factory. A missed failure (no alert when failure is imminent) could cause a catastrophic accident. A false alarm (alert when no failure is coming) causes a brief, costly shutdown. Which error type should you minimize, and what does that imply about your threshold?