Sensor Data

Sensor data is a special case of time series data. It's encoded as numbers, it's inherently sequential, and the ordering carries information. If you call the sample at time n $x_n$, then $x_{n-1}$ came before it and $x_{n+1}$ comes after. Order matters; you can't reshuffle.

Two things that aren't obvious if you've never worked with sensors before:

The Timestamps Aren't in the File

Most of the time, when you receive sensor data, the timestamps aren't included row by row. Instead, you infer them from the sensor's sample rate, which lives in the sensor's data sheet. Become very familiar with the data sheets of the sensors you work with. They tell you what the numbers actually mean, what the sample rate is, and what filtering has been applied. Without the data sheet, you're flying blind.
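As a minimal sketch of that inference, the snippet below rebuilds per-sample timestamps from a sample rate and a known start time. The function name `infer_timestamps`, the 50 Hz rate, and the start time are all hypothetical values chosen for illustration, and the code assumes the sensor sampled at a perfectly constant rate with no dropped samples, which real hardware does not always guarantee.

```python
from datetime import datetime, timedelta, timezone

def infer_timestamps(n_samples, sample_rate_hz, start_time):
    """Return one timestamp per sample, assuming a constant sample rate."""
    period = timedelta(seconds=1.0 / sample_rate_hz)
    return [start_time + i * period for i in range(n_samples)]

# Hypothetical recording: 5 samples at 50 Hz, starting at a known instant.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
timestamps = infer_timestamps(5, 50.0, start)

for t in timestamps:
    print(t.isoformat())  # consecutive samples are 20 ms apart
```

If the data sheet lists a nominal rate but the device clock drifts, timestamps rebuilt this way drift with it, so it is worth checking them against any external time reference you have.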

Downsample Yes — Upsample With Extreme Care

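Downsampling means removing samples (100 Hz → 10 Hz). You're throwing data away, but you're not inventing any values: every sample you keep is a real measurement. The one caveat is aliasing: signal content above half the new sample rate folds into lower frequencies unless you low-pass filter before dropping samples.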
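A small sketch of the 100 Hz → 10 Hz case, keeping every 10th sample. The signal here is a synthetic 2 Hz sine wave standing in for real sensor output, and `decimate_naive` is a hypothetical helper name. It deliberately applies no low-pass filter, which is only safe when the signal has no content above half the new rate (5 Hz here); library routines such as `scipy.signal.decimate` filter before dropping samples.

```python
import math

def decimate_naive(samples, factor):
    """Keep every `factor`-th sample; no anti-aliasing filter is applied."""
    return samples[::factor]

fs_in = 100  # Hz, input sample rate from the example in the text
# One second of a synthetic 2 Hz sine wave (illustrative stand-in for a sensor).
signal = [math.sin(2 * math.pi * 2 * n / fs_in) for n in range(fs_in)]

reduced = decimate_naive(signal, 10)  # 100 Hz -> 10 Hz

# Every value in `reduced` is one of the original measurements, unchanged.
print(len(signal), len(reduced))
```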

Upsampling means adding samples — filling in values where there weren't any. Whatever method you use (interpolation, repetition, smoothing) introduces assumptions that may not be true. With sensor data, you can introduce bias that distorts every model trained downstream.
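To make the risk concrete, here is a hedged sketch that upsamples a 10 Hz recording to 20 Hz with linear interpolation and compares the result against a known ground truth. The ground truth is a synthetic 3 Hz sine wave chosen for illustration, and `true_signal` and `upsample_linear` are hypothetical names. With real sensor data you would not have the true values to compare against, which is exactly the problem.

```python
import math

def true_signal(t):
    """Synthetic ground truth: a 3 Hz sine wave (an assumption for illustration)."""
    return math.sin(2 * math.pi * 3 * t)

def upsample_linear(samples, factor):
    """Insert factor-1 linearly interpolated values between each pair of samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out

fs_low, factor = 10, 2  # 10 Hz -> 20 Hz
measured = [true_signal(n / fs_low) for n in range(fs_low + 1)]
upsampled = upsample_linear(measured, factor)

# Compare each upsampled value with what the sensor would really have seen.
fs_high = fs_low * factor
errors = [abs(v - true_signal(i / fs_high)) for i, v in enumerate(upsampled)]
max_error = max(errors)
print(f"largest interpolation error: {max_error:.2f}")
```

The measured points have zero error, but the interpolated points are off by as much as about 0.41 on a signal with amplitude 1, even though 3 Hz is below the 5 Hz Nyquist limit of the 10 Hz recording. The straight-line assumption, not a lack of samples, is what causes the error.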

Default rule: downsample yes, upsample very carefully.

Figure: Downsampling vs. Upsampling. Both panels show the same ground-truth signal (dashed) alongside the reconstructed signal.

Downsampling panel: 100 Hz signal → 10 Hz (every 10th sample kept). Lower rates lose high-frequency detail, but every dot is a real measurement.

Upsampling panel (linear interpolation): 10 Hz → 20 Hz, with fabricated points shown in orange. The reconstructed curve (orange) diverges from the true signal (dashed) because linear interpolation assumes values change linearly between measurements, an assumption the real world rarely satisfies.

Checkpoint

A sensor dataset was collected at 50 Hz. A teammate proposes upsampling to 200 Hz to match another sensor. What is the core risk?

You Are Rarely Getting Raw Data

A heart-rate sensor in a smartwatch does not give you "heart rate." It gives you the output of a multi-stage signal processing pipeline running on the device. The actual raw signal — photoplethysmography (PPG) — has been filtered, peak-detected, smoothed, compressed, and translated into beats per minute before it ever reaches your data file. If you want to build sensor-data models that work, you have to understand what's happening upstream of the file you're reading.
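The toy sketch below shows how much information such a pipeline can discard. It is not any vendor's actual algorithm: the 50 Hz sample rate, the synthetic "PPG" waveform, the moving-average filter, the peak rule, and every function name are assumptions made up for illustration. It takes 500 raw samples, smooths them, detects peaks, and collapses everything into a single beats-per-minute number.

```python
import math

FS = 50  # Hz, hypothetical PPG sample rate

def moving_average(x, width):
    """Smooth with a centered moving average (edges use a shorter window)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def find_peaks(x, min_gap):
    """Indices of local maxima above the mean, at least `min_gap` samples apart."""
    mean = sum(x) / len(x)
    peaks = []
    for i in range(1, len(x) - 1):
        is_local_max = x[i - 1] < x[i] >= x[i + 1]
        far_enough = not peaks or i - peaks[-1] >= min_gap
        if is_local_max and x[i] > mean and far_enough:
            peaks.append(i)
    return peaks

def beats_per_minute(peaks, fs):
    """Convert the mean peak-to-peak interval into beats per minute."""
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))

# Synthetic "raw PPG": a 1.2 Hz pulse (72 bpm) plus a small 15 Hz ripple as noise.
raw = [math.sin(2 * math.pi * 1.2 * n / FS)
       + 0.1 * math.sin(2 * math.pi * 15 * n / FS)
       for n in range(10 * FS)]

smoothed = moving_average(raw, 5)          # stage 1: filtering / smoothing
peaks = find_peaks(smoothed, FS // 3)      # stage 2: peak detection
bpm = beats_per_minute(peaks, FS)          # stage 3: translate to beats per minute

print(f"{len(raw)} raw samples -> one number: {bpm:.0f} bpm")
```

Everything about the waveform's shape is gone by the last line. If the pattern you care about lives in that shape, the bpm value in your data file cannot give it back.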

Research Example: Non-Invasive Glucose Detection

During my PhD, I investigated sources of inaccuracy in wearable heart-rate sensors and engineered digital biomarkers of interstitial glucose from smartwatch data — the goal being non-invasive glucose detection instead of a finger prick. To do that, I had to understand the PPG signal at the hardware level, not just the aggregated "heart rate per minute" level. The patterns we cared about lived in the raw signal.