User Behavior and Combined Data Types
User behavior data is data about what people do. Clicks. Purchases. Likes. Time spent on a page. A/B test outcomes. It typically comes bundled with metadata — user demographics, device type, timestamps, session identifiers — and it tends to be high-volume and fine-grained. Companies record millions or billions of events.
Three Properties That Define User Behavior Data
- Inherently sequential. What someone clicked five minutes ago is part of the context for what they click next.
- Dynamic. Behavior shifts over time, sometimes quickly. A model trained on last month's data may already be stale.
- High volume and fine granularity. The challenge is rarely "do we have enough data." The challenge is "how do we make this tractable."
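One common way to make a raw event stream tractable while keeping its sequential structure is to group events into per-user sessions. The sketch below is only an illustration: the `(user_id, timestamp, action)` event shape, the `sessionize` function, and the 30-minute inactivity threshold are all assumptions, not something the text prescribes.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, chosen for illustration


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, action) events into ordered per-user sessions.

    A new session starts whenever two consecutive events from the same
    user are more than `gap` seconds apart.
    """
    by_user = defaultdict(list)
    for user_id, ts, action in events:
        by_user[user_id].append((ts, action))

    sessions = defaultdict(list)
    for user_id, user_events in by_user.items():
        user_events.sort()  # restore chronological order per user
        current = []
        last_ts = None
        for ts, action in user_events:
            if last_ts is not None and ts - last_ts > gap:
                sessions[user_id].append(current)
                current = []
            current.append(action)
            last_ts = ts
        if current:
            sessions[user_id].append(current)
    return dict(sessions)


# Hypothetical events; timestamps are seconds since an arbitrary start.
events = [
    ("u1", 0, "view"),
    ("u1", 60, "click"),
    ("u2", 30, "view"),
    ("u1", 4000, "purchase"),  # more than 30 minutes after the previous u1 event
]
result = sessionize(events)
```

Here `result["u1"]` holds two sessions, `["view", "click"]` and `["purchase"]`, so a downstream model can see what preceded each action without scanning the full event log.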
Combinations of multiple data types are where real-world systems live. One example is the Electronic Health Record (EHR). A single patient's record contains all of the following at once:
- Free-text physician notes (text)
- Medical images — X-rays, MRIs, CT scans (images)
- Sensor data — vital signs, continuous glucose monitors, ECGs (sensor time series)
- Survey-style structured fields — intake forms, demographics (tabular)
- Lab results at irregular intervals (sparse time series)
Three Challenges of Combined Data
- Inconsistency. Different entries follow different conventions. A blood pressure reading might be "120/80" in one record and {systolic: 120, diastolic: 80} in another.
- Organization. You can structure this data by date, by individual, by visit, by encounter type. None of these is obviously right.
- "Missing" data is everywhere, and most of it isn't really missing. A null in "MRI taken" usually means no MRI was ordered for the patient, not that a result was lost. Your code might not know that.
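The inconsistency and missing-data challenges can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the `mri_ordered` and `mri_result` fields, and the three status labels are hypothetical, and a real EHR schema would look different.

```python
def normalize_blood_pressure(reading):
    """Return (systolic, diastolic) as ints, or None if no reading was recorded.

    Handles the two conventions mentioned in the text: a "120/80" string
    and a mapping with `systolic` and `diastolic` keys.
    """
    if reading is None:
        return None
    if isinstance(reading, str):
        systolic, diastolic = reading.strip().split("/")
        return int(systolic), int(diastolic)
    if isinstance(reading, dict):
        return int(reading["systolic"]), int(reading["diastolic"])
    raise ValueError(f"unrecognized blood pressure format: {reading!r}")


def mri_status(record):
    """Separate 'never ordered' from 'ordered but no result recorded'.

    Assumes a hypothetical `mri_ordered` flag stored next to `mri_result`.
    """
    if not record.get("mri_ordered", False):
        return "not_applicable"  # the null is expected, not a data gap
    if record.get("mri_result") is None:
        return "missing"  # an MRI was ordered, so a null is a real gap
    return "present"


bp_a = normalize_blood_pressure("120/80")
bp_b = normalize_blood_pressure({"systolic": 120, "diastolic": 80})
status = mri_status({"mri_ordered": False, "mri_result": None})
```

Both readings normalize to the same tuple, and the explicit status label keeps a downstream step from treating an MRI that was never ordered as a gap to impute.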
Self-Driving Cars: Multi-Modal Fusion
A single autonomous vehicle is simultaneously producing camera images, LIDAR point clouds (3D sensor data), IMU readings (motion sensors), map data (text and structured), and GPS (time series). All of it has to be fused in real time to make a single decision: brake, accelerate, turn, lane-change. The hard part of self-driving isn't any one modality — it's the fusion.
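Before any fusion can happen, streams sampled at different rates have to be lined up in time. The sketch below shows only that alignment step, using nearest-timestamp matching. The stream names, the sample values, and the 50 ms `max_skew` tolerance are assumptions for illustration, and this is not how any particular vehicle stack does it.

```python
from bisect import bisect_left


def nearest_reading(timestamps, values, t, max_skew):
    """Return the value whose timestamp is closest to t, or None if too far away.

    `timestamps` must be sorted ascending and index-aligned with `values`.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    if abs(timestamps[best] - t) > max_skew:
        return None
    return values[best]


def align_to_camera(camera_ts, streams, max_skew=0.05):
    """For each camera frame time, pick the nearest sample from every stream."""
    aligned = []
    for t in camera_ts:
        frame = {"t": t}
        for name, (ts, vals) in streams.items():
            frame[name] = nearest_reading(ts, vals, t, max_skew)
        aligned.append(frame)
    return aligned


# Hypothetical streams: (sorted timestamps in seconds, samples).
camera_ts = [0.00, 0.10, 0.20]
streams = {
    "lidar": ([0.01, 0.11, 0.35], ["scan0", "scan1", "scan2"]),
    "imu": ([0.00, 0.05, 0.10, 0.15, 0.20], ["i0", "i1", "i2", "i3", "i4"]),
}
frames = align_to_camera(camera_ts, streams)
```

The last frame has no LIDAR sample within the tolerance, so its `lidar` entry is `None`. Deciding what to do in that case (wait, interpolate, or act on the remaining sensors) is part of what makes fusion hard.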
An EHR record has a null value in the 'MRI result' column for a patient. What is most likely true?
Look at an app or service you use daily. What data types does it probably collect about you? How might those types be combined to make predictions about your behavior?