From Notebook to the World
Most ML projects that fail don't fail because the model was bad. They fail because nobody built the operations layer, or because the model made it to production and quietly broke — and nobody was watching. A model without an operations layer is a project. A model with one is a product.
The MLOps Lifecycle
Phase 1
ML Design
Requirements, use case prioritization, data acquisition, problem framing.
This is where the question gets defined. A well-scoped problem statement with clear success criteria is the single biggest predictor of whether a model ever ships.
Key question
What are we building and why?
Phase 2
Model Development
Data prep, feature engineering, training, experimentation, evaluation.
The part most modelers spend their time on. Features are engineered, experiments are tracked, and models are evaluated until one is good enough to ship.
Key question
Does the model actually work?
Phase 3
Operations
Deployment, CI/CD, monitoring, triggered retraining.
The model runs forever; the training run happened once. This phase is where most production engineering time goes — and where models quietly break without anyone noticing.
Key question
Is it still working in prod?
In industry, Phase 3 often consumes more engineering time than Phases 1 and 2 combined: the model is trained once, but the production system has to keep running, and keep being watched, for as long as it is in use.
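To make "monitoring" and "triggered retraining" concrete, here is a minimal sketch of one possible check, assuming a classifier whose predictions are later reviewed against ground truth. The names `Outcome`, `false_positive_rate`, and `should_retrain`, the 2x threshold, and all the numbers are hypothetical and chosen only for illustration. A real pipeline would typically track several metrics and input distributions, not one.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One reviewed prediction (hypothetical record shape)."""
    flagged: bool   # the model predicted the positive class
    is_fraud: bool  # ground-truth label, known after review


def false_positive_rate(outcomes):
    """FPR = negatives that were flagged / all negatives."""
    legit = [o for o in outcomes if not o.is_fraud]
    if not legit:
        return 0.0  # no negatives observed, nothing to measure
    return sum(o.flagged for o in legit) / len(legit)


def should_retrain(baseline_fpr, recent_outcomes, max_ratio=2.0):
    """Signal retraining when recent FPR exceeds max_ratio x the baseline.

    max_ratio=2.0 is an assumed threshold, not a recommended value.
    """
    return false_positive_rate(recent_outcomes) > baseline_fpr * max_ratio


# Illustrative numbers only: FPR measured at launch vs. a recent window.
baseline = 0.02
recent = (
    [Outcome(flagged=True, is_fraud=False)] * 6      # false positives
    + [Outcome(flagged=False, is_fraud=False)] * 94  # true negatives
    + [Outcome(flagged=True, is_fraud=True)] * 5     # true positives
)

print(false_positive_rate(recent))       # 6 of 100 legitimate flagged
print(should_retrain(baseline, recent))  # compares against 2 x baseline
```

In practice a check like this would run on a schedule against a rolling window of labeled outcomes, and a positive result would open an alert or start a retraining job rather than print a value.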
Checkpoint
Six months after deploying a fraud detection model, the ops team reports that the false positive rate has tripled: the model now wrongly flags legitimate transactions as fraud three times as often as it did at launch. The model hasn't been retrained. What is the most likely explanation?