Cross-Validation

Cross-validation is a more data-efficient way to get a robust evaluation than a single hold-out split. It matters most with small datasets, where you can't afford to set aside 20% just for validation.

K-fold cross-validation splits your data into k folds of roughly equal size. Each fold takes one turn as the held-out test set while the model trains on the remaining k−1 folds. You train and evaluate k separate times, then aggregate the metrics (typically by averaging them). Common values: k = 5 or 10.
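The splitting procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the names `k_fold_indices`, `cross_validate`, and the `score_fn` callback are hypothetical, and the sketch works on sample indices rather than on real data.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # The first (n_samples % k) folds get one extra sample,
    # so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(score_fn, n_samples, k=5, seed=0):
    """Average a metric over k folds.

    score_fn(train_idx, test_idx) -> float is a hypothetical callback
    that trains on train_idx and returns a score measured on test_idx.
    """
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores), scores
```

Averaging is only one way to aggregate; reporting the per-fold scores alongside the mean also shows how much the estimate varies from fold to fold.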

Stratified k-fold applies the same idea but ensures each fold's class distribution matches the overall dataset — critical for imbalanced classification where a random split might leave one fold with no minority-class examples at all.
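One way to get that per-fold class balance is to deal each class's samples out to the folds in turn, like dealing cards. The sketch below assumes this round-robin approach; `stratified_k_fold_indices` is a hypothetical name, and real libraries may assign folds differently.

```python
import random
from collections import defaultdict

def stratified_k_fold_indices(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)

    for fold in range(k):
        test_idx = folds[fold]
        train_idx = [i for f in range(k) if f != fold for i in folds[f]]
        yield train_idx, test_idx

# Toy labels: 30 positives out of 1000 samples (3% positive).
labels = [1] * 30 + [0] * 970
positives_per_fold = [
    sum(labels[i] for i in test_idx)
    for _, test_idx in stratified_k_fold_indices(labels, k=10)
]
```

With these toy labels, each of the 10 test folds receives 3 of the 30 positives, so no fold is left without minority-class examples.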

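Leave-one-out (LOOCV) sets k equal to the number of samples, so each "fold" is a single observation. Every iteration trains on nearly all of the data, but the procedure needs one model fit per sample, which makes it extremely computationally expensive, and the resulting estimate can have high variance. Use it when your dataset is small and you can afford the compute.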
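To make the cost concrete, here is a minimal LOOCV sketch. The helper `loocv_mse` and the toy "predict the training mean" model (`fit_mean`, `predict_mean`) are hypothetical stand-ins chosen for illustration; note that the loop fits one model per sample.

```python
def loocv_mse(xs, ys, fit, predict):
    """Leave-one-out mean squared error: one model fit per sample."""
    errors = []
    for i in range(len(xs)):
        # Train on everything except sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        # Test on the single held-out sample.
        errors.append((predict(model, xs[i]) - ys[i]) ** 2)
    return sum(errors) / len(errors)

# Toy model (an assumption for this example): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

mse = loocv_mse([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], fit_mean, predict_mean)
```

With n samples the loop calls `fit` n times, which is why LOOCV becomes impractical as the dataset or the model's training time grows.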

K-Fold Cross-Validation (k = 5)

Iteration 1 of 5: Fold 1 is held out for testing, so 20% of the data is tested per iteration. The model trains on the other 4 folds. After all 5 iterations, every sample has been in the test set exactly once.

Cross-Validation Doesn't Replace a Held-Out Test Set

Even with cross-validation, hold out an external test set if you can. Cross-validation gives you a robust estimate of generalization during model development. A final held-out test set gives you an honest, untouched evaluation at the end. Use both when possible.
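A sketch of using both together: set the test indices aside first, then run cross-validation only on what remains. The function names `split_dev_test` and `k_fold_over`, the 20% test fraction, and the seed are assumptions made for this example.

```python
import random

def split_dev_test(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and carve off a final test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (dev, test)

def k_fold_over(dev_idx, k=5):
    """Yield (train, val) pairs drawn from the development indices only."""
    for fold in range(k):
        val = dev_idx[fold::k]  # every k-th index, starting at `fold`
        val_set = set(val)
        train = [i for i in dev_idx if i not in val_set]
        yield train, val

dev_idx, test_idx = split_dev_test(100, test_fraction=0.2, seed=42)

for train, val in k_fold_over(dev_idx, k=5):
    # Model selection and tuning happen here, on dev data only.
    pass

# test_idx is evaluated once, after the final model has been chosen.
```

Because `k_fold_over` only ever sees `dev_idx`, nothing done during model development can touch the held-out test samples.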

When to Use Which

Large dataset → a simple train/validation/test split is most common in industry, since a single large validation set is already a stable estimate and it avoids training the model k times.
Small dataset → k-fold or LOOCV extracts the most signal from limited data.
Imbalanced classes → stratified k-fold is essential to ensure minority classes appear in every fold.

Checkpoint

You're classifying medical records where only 3% of cases are positive (the condition you're trying to detect). You use standard k-fold cross-validation with k=10. What's the risk?
