Why Converting Ideas Into Numbers Is Hard

Numerical representations have real advantages, but they also have major limitations. Converting a concept into numbers is always lossy: some part of the original idea does not survive the encoding.

Subjectivity

Your "red" is not my "red." Ideas vary from person to person, and no encoding scheme can paper over that.

Complexity

A photograph of your grandmother contains an enormous amount of information. Every encoding throws some of it away.
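
A minimal sketch of that loss, assuming a made-up row of grayscale pixel values and a hypothetical scheme that keeps only 4 shades instead of 256. The names `quantize` and `dequantize` are illustrative, not taken from any particular library.

```python
def quantize(value: int, levels: int) -> int:
    """Map an 8-bit intensity (0-255) to one of `levels` evenly sized buckets."""
    return value * levels // 256


def dequantize(bucket: int, levels: int) -> int:
    """Map a bucket back to the midpoint intensity of the range it covers."""
    width = 256 / levels
    return int(bucket * width + width / 2)


# Hypothetical pixel values, invented for illustration only.
pixels = [12, 13, 14, 200, 201, 255]
levels = 4  # assumption: keep 4 shades instead of 256

# Encode, then decode, and measure how far each pixel drifted.
restored = [dequantize(quantize(p, levels), levels) for p in pixels]
errors = [abs(p - r) for p, r in zip(pixels, restored)]

print(restored)  # six distinct-looking inputs collapse to two shades
print(errors)    # the information that was thrown away
```

Once the three dark pixels share a bucket, nothing in the encoded data can tell them apart again. Using more levels shrinks the error but costs more storage, which is the fidelity versus tractability tradeoff in miniature.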

Ambiguity

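Any encoding rests on assumptions, and yours might not match the next person's. A 1-to-5 rating scale, for example, assumes the gap between 1 and 2 means the same thing as the gap between 4 and 5.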
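
One way to see an assumption hiding inside an encoding is to encode the same three categories two ways and compare the distances each scheme implies. This is a sketch under stated assumptions: the category list and both schemes are chosen purely for illustration.

```python
import math

# Hypothetical categories, chosen for illustration.
categories = ["red", "green", "blue"]

# Assumption A: integer codes. This silently imposes an order and a spacing.
integer_code = {name: [float(i)] for i, name in enumerate(categories)}

# Assumption B: one-hot vectors. This claims every pair is equally different.
one_hot = {
    name: [1.0 if i == j else 0.0 for j in range(len(categories))]
    for i, name in enumerate(categories)
}


def distance(a, b):
    """Euclidean distance between two encoded values."""
    return math.dist(a, b)


for scheme_name, scheme in [("integer", integer_code), ("one-hot", one_hot)]:
    red_green = distance(scheme["red"], scheme["green"])
    red_blue = distance(scheme["red"], scheme["blue"])
    print(scheme_name, red_green, red_blue)
```

Under the integer codes, red ends up twice as far from blue as from green. Under one-hot, both pairs are equally far apart. Neither claim came from the data. Both came from the person who picked the encoding.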

Context Dependence

"Bank" means something different in "river bank" vs. "bank account." Encoding meaning without context is nearly impossible.

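The "bank" problem shows up as soon as each word gets a single fixed number. The following is a minimal sketch, assuming a toy vocabulary built from just the two phrases above. The helper names `build_vocab` and `encode` are hypothetical.

```python
def build_vocab(sentences):
    """Assign each distinct word the next unused integer ID."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Replace each word with its ID, ignoring the words around it."""
    return [vocab[word] for word in sentence.lower().split()]


sentences = ["river bank", "bank account"]
vocab = build_vocab(sentences)

river_bank = encode("river bank", vocab)
bank_account = encode("bank account", vocab)

print(vocab)
print(river_bank, bank_account)
# "bank" gets the same ID in both phrases, so the two meanings are indistinguishable.
print(river_bank[1] == bank_account[0])
```

A lookup table like this has nowhere to store the difference between the two senses. Telling them apart requires an encoding that also looks at the neighboring words.
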
Creative and Emotional Aspects

Try writing a number that captures "how moving this song is." Whatever you write, someone will disagree.

These are not solved problems. Every encoding choice you make is a tradeoff between fidelity and tractability. The rest of this chapter is about how to make those tradeoffs sensibly for each major data type you'll encounter.

Why LLMs Still Fail

Large language models are impressive, but not because they "understand" language. They don't. What they do is learn a very high-dimensional numeric representation of language patterns, billions of parameters' worth, that captures enough of the nuance to be useful. These models still occasionally fail in funny or troubling ways because the underlying problems from this section remain unsolved. Subjectivity, ambiguity, and context dependence are still there, just buried under a lot of compute.

Checkpoint

Which of these best explains why encoding "color" for a machine learning model is genuinely difficult, even though color seems simple?

Checkpoint

Think of a concept that would be especially hard to encode as numbers, something you care about or interact with regularly. What makes it hard? Which of the five difficulties (subjectivity, complexity, ambiguity, context dependence, creative/emotional aspects) applies?