Feature Engineering

Feature engineering is what happens between raw data and the features that go into your model:

Raw Data → [ Feature Engineering ] → Features → Model

The goal is to extract meaningful information from raw data, reduce dimensionality, and enhance the model's ability to capture patterns. The features are what the model actually sees. If the features are uninformative, no model can fix that. If the features are well-designed, even a simple model can do remarkable things.

Feature engineering is iterative, and it requires experimentation. There's a feedback loop: you try features, see what works, and try variations. Sometimes ideas come from intuition. Sometimes from a research paper. Sometimes from a conversation with a domain expert. Sometimes you steal an idea from a completely different domain.

Continuous Glucose Monitoring: Engineering What Doesn't Exist

Continuous glucose monitors (CGMs) are arm-worn sensors that measure interstitial glucose every few minutes. The raw signal is a time series: a sequence of glucose readings. To build a useful model on it, I had to dig through the clinical literature and identify features that diabetes researchers had found informative.

The result is a table of engineered features that didn't exist in the raw signal: inter-day coefficient of variation, high blood glucose index (HBGI), low blood glucose index (LBGI), mean amplitude of glycemic excursions (MAGE), and mean of daily differences (MODD). None of these are given by the sensor. Each is a transformation rooted in clinical knowledge of how glucose dynamics matter physiologically. Once you have these features, modeling becomes possible. Without them, you're feeding raw numbers to a model and hoping.
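As a rough illustration of how such features can be derived from the raw readings, here is a minimal sketch in Python with NumPy. It is not the pipeline used in the work described above. It assumes readings in mg/dL arranged as a gap-free array of shape (days, readings per day), uses the risk transform published by Kovatchev and colleagues for LBGI and HBGI, and takes one common reading of "inter-day coefficient of variation" (the variation of daily means). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def risk_indices(glucose_mg_dl):
    """Return (LBGI, HBGI) for glucose readings in mg/dL."""
    bg = np.asarray(glucose_mg_dl, dtype=float)
    # Symmetrizing transform: about zero near 112.5 mg/dL,
    # negative for lower readings, positive for higher ones.
    f = 1.509 * (np.log(bg) ** 1.084 - 5.381)
    risk = 10.0 * f ** 2
    lbgi = np.mean(np.where(f < 0, risk, 0.0))  # risk from the low side only
    hbgi = np.mean(np.where(f > 0, risk, 0.0))  # risk from the high side only
    return float(lbgi), float(hbgi)


def inter_day_cv(days):
    """Coefficient of variation (%) of the daily mean glucose (assumed definition)."""
    daily_means = np.asarray(days, dtype=float).mean(axis=1)
    return float(100.0 * daily_means.std(ddof=1) / daily_means.mean())


def modd(days):
    """Mean absolute difference between readings taken 24 hours apart."""
    days = np.asarray(days, dtype=float)
    return float(np.mean(np.abs(days[1:] - days[:-1])))


if __name__ == "__main__":
    # Synthetic data: three days sampled every 5 minutes (288 readings per day).
    rng = np.random.default_rng(0)
    hours = np.arange(288) * 5 / 60
    days = np.stack([
        110 + offset + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)
        for offset in (0, 10, -5)
    ])
    lbgi, hbgi = risk_indices(days.ravel())
    print({"lbgi": lbgi, "hbgi": hbgi, "cv": inter_day_cv(days), "modd": modd(days)})
```

Each function collapses hundreds of readings into one number with a clinical interpretation, which is what lets a simple model work with the signal. Real CGM traces have gaps and irregular timestamps, so a production version would need resampling and missing-data handling first.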

Where Competitions Are Won

In Kaggle competitions, the difference between leaderboard winners and also-rans is almost always feature engineering. The models are largely the same — gradient-boosted trees, neural networks — but the features are different. The same is true in industry. The companies that win on ML often win because they have feature engineering pipelines their competitors haven't thought of.

Feature engineering relies on domain expertise. You are rarely the domain expert — and even if you are, you should seek out others. The best feature engineering work involves a data scientist who understands the math talking to a domain expert who understands what the numbers actually mean.

Three strategies for accessing expertise you don't have:

  • Read. Research papers, textbooks, domain lectures. Build expertise yourself, even if you'll bring in collaborators. It pays off in unexpected ways — you'll ask better questions and spot better opportunities.
  • Talk to domain experts. Learn their vocabulary so you can communicate. Build relationships with people who think differently than you do. Many of the best features I've ever engineered came from a conversation where an expert said "well, we always look at X because of Y" — something I never would have found in the literature.
  • Apply ideas across domains. Signal processing techniques for heartbeat detection from photoplethysmography (PPG) are mathematically similar to anomaly detection in financial time series. Image augmentation from medical imaging transfers to satellite imagery. Cross-domain transfer of feature engineering ideas is wildly underrated.

Checkpoint

A model trained directly on raw accelerometer readings performs poorly. You engineer features like 'mean acceleration over a 2-second window' and 'peak frequency from FFT.' Performance improves dramatically. Why?
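For concreteness, the two engineered features named in the checkpoint can be sketched as follows. This is a minimal illustration in Python with NumPy, assuming a single-axis signal sampled at a fixed rate and non-overlapping windows. The function name, the 50 Hz sampling rate, and the synthetic signal are hypothetical choices for the example.

```python
import numpy as np


def window_features(signal, fs_hz, window_s=2.0):
    """Return one row per window: [mean acceleration, peak frequency in Hz]."""
    signal = np.asarray(signal, dtype=float)
    n = int(round(window_s * fs_hz))      # samples per window
    n_windows = signal.size // n          # trailing partial window is dropped
    windows = signal[: n_windows * n].reshape(n_windows, n)

    means = windows.mean(axis=1)

    # Subtract each window's mean so the zero-frequency bin does not dominate.
    spectrum = np.abs(np.fft.rfft(windows - means[:, None], axis=1))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    peak_freqs = freqs[spectrum.argmax(axis=1)]

    return np.column_stack([means, peak_freqs])


if __name__ == "__main__":
    # Synthetic signal: gravity offset plus a 2 Hz oscillation and a little noise.
    fs_hz = 50
    t = np.arange(10 * fs_hz) / fs_hz
    rng = np.random.default_rng(0)
    accel = 9.81 + np.sin(2 * np.pi * 2.0 * t) + rng.normal(0, 0.05, t.size)
    print(window_features(accel, fs_hz))
```

Each 2-second window of 100 raw samples becomes two numbers. With a 2-second window the frequency resolution is 0.5 Hz, so the window length also sets how finely the peak frequency can be resolved.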