Labeling Data

Sometimes the data you get is already labeled. Most of the time, it isn't — and someone has to label it. Labeling (also called annotation) is the process of attaching the target variable or other structured information to each example. It's unglamorous, it's expensive, and it's where a lot of projects go wrong.

Even Experts Disagree

If you give a panel of experienced radiologists the same set of images, they will often reach different diagnoses for the same image. Inter-annotator disagreement is normal and not necessarily a sign that something went wrong. Plan for it from the start with:

  • Clear labeling guidelines written down before you start.
  • Multiple annotators per item with explicit disagreement-resolution protocols.
  • Independent (blind) annotation, where labelers don't see each other's work.
  • Domain experts in the loop for ambiguous cases.
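
To make "plan for disagreement" concrete, here is a minimal sketch of two pieces such a protocol might include: a chance-corrected agreement score between two annotators, and a simple resolution rule for multiple votes per item. Cohen's kappa and majority voting are not named in the text above; they are common choices swapped in for illustration. The function names `cohens_kappa` and `resolve` are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two annotators match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def resolve(votes):
    """Return the strict-majority label, or None to escalate to a domain expert.

    Assumption: escalation on ties is one possible resolution protocol, not the only one.
    """
    if not votes:
        raise ValueError("need at least one vote")
    top_label, top_count = Counter(votes).most_common(1)[0]
    return top_label if top_count > len(votes) / 2 else None
```

A kappa near 1 means the annotators agree far more often than chance would predict; a value near 0 suggests the guidelines are not constraining their choices much. Items for which `resolve` returns `None` are the ambiguous cases that would go to a domain expert.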

Tools like CVAT (Computer Vision Annotation Tool) and similar platforms can dramatically speed up labeling using a human-in-the-loop pattern: an AI proposes a label, and a human reviews and corrects it. The human is still essential, but throughput is far higher than with purely manual labeling. This is what production annotation actually looks like.

Labeling Workflow (item 1 of 5)

The product exceeded all my expectations — I use it every day.

AI proposal: positive (confidence: 97%)
Your label: Positive / Neutral / Negative

Demonstration of the human-in-the-loop annotation workflow — AI proposal → human review → accept/correct → next item.
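
The propose, review, accept-or-correct loop can be sketched in a few lines. This is an illustrative outline under stated assumptions, not how CVAT or any particular tool implements it: `propose` stands in for a model that returns a label and a confidence, `ask_human` stands in for the review UI, and every name here (`Proposal`, `Annotation`, `annotate`, `correction_rate`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Proposal:
    text: str
    label: str
    confidence: float


@dataclass
class Annotation:
    text: str
    label: str
    source: str  # "accepted" if the human kept the AI label, "corrected" otherwise


def review(proposal: Proposal, human_label: Optional[str]) -> Annotation:
    """Combine an AI proposal with the reviewer's decision (None means 'accept')."""
    if human_label is None or human_label == proposal.label:
        return Annotation(proposal.text, proposal.label, "accepted")
    return Annotation(proposal.text, human_label, "corrected")


def annotate(
    items: List[str],
    propose: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[Proposal], Optional[str]],
) -> List[Annotation]:
    """Run every item through AI proposal, then human review."""
    results = []
    for text in items:
        label, confidence = propose(text)
        proposal = Proposal(text, label, confidence)
        # The human sees every proposal; the model only saves typing, not judgment.
        results.append(review(proposal, ask_human(proposal)))
    return results


def correction_rate(annotations: List[Annotation]) -> float:
    """Fraction of items where the human overrode the AI proposal."""
    if not annotations:
        return 0.0
    return sum(a.source == "corrected" for a in annotations) / len(annotations)
```

Tracking something like `correction_rate` over time is one way to notice when the proposing model has drifted and reviewers are doing more of the work than expected.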

Labeling Is Often the Largest Line Item

In industry, labeling is often the single largest line item in an ML project budget. Companies like Scale AI exist precisely because labeling is so labor-intensive and quality-critical. If you can make labeling 20% more efficient on a large project, you have produced an enormous amount of value.

There are many platforms for labeling data, with varying degrees of customizability. Common ones include Prolific, Amazon Mechanical Turk, and Scale AI. If you want ownership over the entire process, you can self-host the open-source Label Studio, for example on an EC2 instance.

Checkpoint

Two annotators are labeling sentiment in customer reviews. On 30% of examples they disagree (one labels 'positive', the other 'neutral'). What is the best response?