Hardware and Sensors

Getting data directly from hardware and sensors means you own the collection rig. Public sensor datasets give a sense of what such rigs produce: EKG heartbeat categorization datasets, environmental sensor telemetry, the MotionSense dataset (smartphone sensors for human activity recognition), solar power generation data, vehicular sensor datasets, and nurse stress prediction datasets collected from wearable sensors.

Advantages

Sensors can capture things no survey, no scrape, and no API can — physiological signals, environmental conditions, motion, structural strain. If you need data that doesn't yet exist in the world, sensors are how you make it exist.

Limitations

  • Cost. Hardware is expensive to deploy and maintain. Sensors need calibration. Devices fail.
  • Heterogeneity. Different sensors use different formats, sample rates, and protocols. Stitching them together is real engineering work.
  • Preprocessing happens upstream. You are almost never receiving truly raw sensor data. A "heart rate" in a smartwatch data file is the output of an on-device signal processing chain. Understand what's happening upstream of the file you're reading.
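
To make the heterogeneity point concrete, here is a minimal sketch of one common stitching step: putting two streams with different sample rates onto a shared time grid using linear interpolation. The stream names, rates, and values are made up for illustration, and the helper `resample_linear` is hypothetical, not part of any library. It also assumes both streams already carry timestamps on the same clock, which is often the harder problem in practice.

```python
from bisect import bisect_right


def resample_linear(times, values, grid):
    """Linearly interpolate (times, values) onto the time points in grid.

    Assumes times is sorted ascending. Grid points outside the recorded
    range are clamped to the nearest endpoint value.
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# Hypothetical streams: an accelerometer axis at 4 Hz, heart rate at 1 Hz.
accel_t = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
accel_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
hr_t = [0.0, 1.0, 2.0]
hr_bpm = [60.0, 64.0, 62.0]

# A shared 2 Hz grid covering the span both streams have in common.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
accel_on_grid = resample_linear(accel_t, accel_x, grid)
hr_on_grid = resample_linear(hr_t, hr_bpm, grid)

for t, a, h in zip(grid, accel_on_grid, hr_on_grid):
    print(f"t={t:.1f}s  accel_x={a:.2f}  hr={h:.1f}")
```

Linear interpolation is the simplest choice, not necessarily the right one. Downsampling a fast signal this way can alias unless you low-pass filter first, and upsampling a slow signal invents in-between values the sensor never measured.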

Building the Rig When the Hardware Doesn't Exist Yet

One of my favorite approaches: if you need sensor data and the hardware doesn't exist yet, build the rig and collect it yourself. I once worked on a project where the production hardware wasn't ready, so we strapped our own IMU sensors to the ends of dumbbells, collected a labeled dataset ourselves, and built the algorithms on that. It was faster than waiting for the hardware.

Checkpoint

You receive a CSV of accelerometer data from a wearable. The file has no timestamp column — just rows of x, y, z values. How do you recover the timing of each sample?