Feature Engineering for Images

In a standard 8-bit RGB image, each pixel is three numbers, one per color channel, each ranging from 0 to 255. Those raw pixel values can be used as input features directly, but you can often engineer something richer, more compact, and more robust to variation.

Hand-engineered image features:

Color histograms. Distribution of pixel intensities per color channel. Simple and compact, but sensitive to lighting and ignores spatial structure.
Texture descriptors. Local Binary Patterns (LBP), Gabor filters, Gray-Level Co-occurrence Matrix (GLCM). They capture local spatial structure, and some (LBP in particular) are robust to uniform illumination changes. Parameter selection is tricky.
Edge detection. Sobel, Prewitt, Canny (Canny is the most widely used). Identifies sharp intensity discontinuities, typically object boundaries. The early layers of CNNs often learn edge-detector-like filters on their own, which suggests that edges are useful low-level features for many visual tasks.
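
To make the texture-descriptor entry above concrete, here is a minimal sketch of the basic 3x3 Local Binary Pattern in plain Python. The function names and the toy `patch` are illustrative assumptions, not part of the lesson, and a real pipeline would normally rely on an image library rather than nested lists.

```python
def lbp_code(image, y, x):
    """8-bit LBP code for the pixel at (y, x), using its 3x3 neighbourhood."""
    center = image[y][x]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (the feature vector)."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


# Hypothetical toy patch, plus the same patch with every pixel 100 brighter.
patch = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[v + 100 for v in row] for row in patch]

code_original = lbp_code(patch, 1, 1)
code_brighter = lbp_code(brighter, 1, 1)
```

Both codes come out equal because LBP only compares each neighbour with the center pixel, so adding a constant to every pixel changes nothing. That is the sense in which the descriptor tolerates illumination changes.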

Deep learning–based feature extraction: Use pre-trained CNNs (ResNet, VGG, EfficientNet) as feature extractors, often with transfer learning and fine-tuning. Features from the penultimate layer of a pre-trained model are usually more discriminative than anything you'd engineer by hand. This is the dominant modern approach.

Color-Based Features

The most intuitive starting point: color. Different objects tend to have different color distributions, and we can capture those distributions statistically.

Color Histograms

Count how often each color value appears in each channel of a chosen color space (RGB, HSV, LAB). A forest image will contain a lot of green; a beach image, a lot of blue and tan.
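
As a rough sketch of the idea, the snippet below builds a normalized per-channel histogram from a flat list of RGB tuples using only the standard library. The function name, the bin count, and the toy `forest` pixels are assumptions made for illustration.

```python
def channel_histograms(pixels, bins=8):
    """Return one normalized histogram per RGB channel.

    `pixels` is a flat list of (r, g, b) tuples with 8-bit values (0-255).
    """
    hists = [[0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0-255 onto bin indices 0..bins-1.
            hists[channel][value * bins // 256] += 1
    total = len(pixels)
    return [[count / total for count in hist] for hist in hists]


# Hypothetical toy "forest": mostly green pixels plus a few brown ones.
forest = [(30, 200, 40)] * 6 + [(120, 80, 30)] * 2

hists = channel_histograms(forest, bins=4)
# Concatenate the three histograms into a single feature vector.
features = [value for hist in hists for value in hist]
```

With 4 bins per channel the feature vector has 12 entries regardless of image size, which is what makes histograms compact. The pixel positions never enter the computation, so shuffling the pixels would give the same features.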

Texture-Based Features

Texture captures how the surface of an object looks locally — rough, smooth, striped, dotted.

Gray-Level Co-occurrence Matrix (GLCM)

Counts how often pairs of pixels with specific intensity values occur at a given offset from each other (for example, immediately to the right). Encodes the spatial structure of intensity patterns.
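
The following is a minimal sketch of a co-occurrence matrix for a single offset, followed by the contrast statistic often derived from it. It assumes the image is already quantized to a small number of gray levels. The function names and the toy images are illustrative.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    Entry [i][j] counts pixels of level i whose neighbour at the offset
    has level j.
    """
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def contrast(matrix):
    """Sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(
        count / total * (i - j) ** 2
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical toy images with 4 gray levels (0-3).
stripes = [[0, 3, 0, 3], [0, 3, 0, 3]]  # vertical stripes
flat = [[1, 1, 1, 1], [1, 1, 1, 1]]     # uniform surface

stripes_glcm = glcm(stripes, levels=4)
stripes_contrast = contrast(stripes_glcm)
flat_contrast = contrast(glcm(flat, levels=4))
```

The striped image yields a high contrast value and the flat one yields zero, even though a plain intensity histogram could not tell where the stripes are. Choosing the offset and the number of gray levels is the parameter-selection problem mentioned earlier.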

Statistical Features

Sometimes the simplest descriptors are the most robust. Basic statistical summaries of pixel intensities — mean, variance, entropy — capture global image properties without any geometric assumptions.

Standard Statistics

Mean and variance of pixel intensities give a quick summary of brightness and contrast.
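
A small sketch of these summaries, together with the entropy mentioned in the section introduction, is shown below. It assumes a grayscale image flattened into a list of integer intensities. The function name and the two toy images are illustrative.

```python
import math
from collections import Counter
from statistics import fmean, pvariance


def intensity_stats(pixels):
    """Mean, variance, and Shannon entropy (in bits) of pixel intensities."""
    n = len(pixels)
    counts = Counter(pixels)
    # Entropy of the empirical intensity distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": fmean(pixels),          # overall brightness
        "variance": pvariance(pixels),  # rough measure of contrast
        "entropy": entropy,             # how spread out the intensities are
    }


# Hypothetical toy images, flattened to 1-D.
low_contrast = [120, 125, 130, 125]
high_contrast = [0, 255, 0, 255]

low_stats = intensity_stats(low_contrast)
high_stats = intensity_stats(high_contrast)
```

Both toy images have a similar mean brightness, but the variance separates them clearly. Note that none of these numbers depends on where a pixel sits in the image.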

Shape-Based Features

Shape features describe the boundaries and structure of objects rather than their color or texture.

Edge Detection (Sobel, Canny, Prewitt)

Identify boundaries in images based on intensity gradients. These operators highlight where the image changes rapidly — the outlines of objects.
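
As an illustration of gradient-based edge detection, here is a minimal Sobel sketch in plain Python. It computes the gradient magnitude at interior pixels only and leaves the border at zero, which is a simplification. The function name and the toy `step` image are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a grayscale image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gy = 0
            for ky in range(3):
                for kx in range(3):
                    value = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * value
                    gy += SOBEL_Y[ky][kx] * value
            out[y][x] = math.hypot(gx, gy)
    return out


# Hypothetical toy image: dark left half, bright right half.
step = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = sobel_magnitude(step)
```

The response is zero inside the flat regions and large only in the columns next to the dark-to-bright transition. Canny builds on the same gradient and adds non-maximum suppression and hysteresis thresholding to produce thin, connected edges.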

Data Augmentation

Data augmentation is a specialized form of feature engineering for images. Apply random transforms to your training data only (never to validation or test):

Random crop, rotation, horizontal flip
Brightness adjustment, color jitter
Gaussian blur, cutout

In PyTorch, augmentation transforms are typically applied each time a sample is loaded, so every epoch sees a slightly different version of each image. This helps the model generalize better without collecting more data.
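
The load-time behaviour can be sketched without any framework. The `ToyImageDataset` class below is a hypothetical stand-in that only mimics the indexing pattern of a map-style dataset; it is not a PyTorch class, and the flip helper is likewise illustrative.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror every row with probability p; `image` is a list of rows."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


class ToyImageDataset:
    """Hypothetical dataset: augments on access, and only when train=True."""

    def __init__(self, images, train, seed=0):
        self.images = images
        self.train = train
        self.rng = random.Random(seed)  # seeded so the demo is repeatable

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.train:
            # Runs on every access, so repeated epochs can see different views.
            image = random_horizontal_flip(image, self.rng)
        return image


images = [[[1, 2, 3], [4, 5, 6]]]  # one tiny 2x3 grayscale image
train_set = ToyImageDataset(images, train=True)
val_set = ToyImageDataset(images, train=False)

# Simulate 20 epochs and collect the distinct versions each split produces.
train_views = {tuple(map(tuple, train_set[0])) for _ in range(20)}
val_views = {tuple(map(tuple, val_set[0])) for _ in range(20)}
```

Over the simulated epochs the training split yields both the original and the mirrored image, while the validation split always returns the stored image unchanged.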

Never Augment Your Validation or Test Set

Augmentation is a training-time technique. Augmenting your validation or test set introduces artificial variation into your evaluation metric and makes results unreproducible. The test set should represent real-world data as-is: no flipping, no color jitter, no random crops. Deterministic preprocessing that the model expects, such as resizing and normalization, still applies to every split.

Checkpoint

You're classifying medical X-rays. You apply horizontal flipping as a data augmentation technique during training. Is this appropriate?
