Privacy and Regulations
Data privacy is the right of users to have control over how their information is collected, used, and shared. Understanding it is not optional for anyone building data products — both because it's ethically important and because it's legally required.
Case 1: Location Data — 'Anonymized' Isn't Anonymous
At least 75 companies receive precise, "anonymized" location data from apps whose users enabled location services for weather, news, or similar purposes. They sell it to advertisers, retailers, and hedge funds. The market is in the tens of billions of dollars per year.
A New York Times investigation tracked specific individuals through this data. One teacher's device pinged along her route from home to school hundreds of times, and reporters could see how long she spent at the dermatologist, when she went to the gym, and when she visited a Weight Watchers location. The data is "anonymous" only in the sense that names aren't attached to the device IDs. Location patterns over time identify people uniquely, so removing names is not real anonymization.
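To make "location patterns identify people uniquely" concrete, here is a minimal sketch that estimates how often a handful of known points singles out one pseudonymous ID. The function name `unicity`, the toy `traces` data, and the (place, hour) encoding are all assumptions made for illustration; this is not the method used in the investigation.

```python
import random

def unicity(traces, k, trials=200, seed=0):
    """Estimate how often k points from one trace match exactly one ID.

    traces: dict mapping a pseudonymous ID to a set of (place, hour) points.
    """
    rng = random.Random(seed)
    users = [u for u, pts in traces.items() if len(pts) >= k]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        target = rng.choice(users)
        # Pretend an outside observer knows k points of the target's trace.
        known = set(rng.sample(sorted(traces[target]), k))
        # Every ID whose trace contains all known points is a candidate.
        matches = [u for u, pts in traces.items() if known <= pts]
        if matches == [target]:
            unique += 1
    return unique / trials

# Toy data (hypothetical): no names anywhere, only opaque IDs.
traces = {
    "id_a": {("home_1", 7), ("school", 8), ("gym", 18), ("clinic", 15)},
    "id_b": {("home_2", 7), ("school", 8), ("gym", 18), ("cafe", 12)},
    "id_c": {("home_3", 7), ("office", 9), ("gym", 18), ("cafe", 12)},
}

print(unicity(traces, k=1))  # below 1.0: a single shared point is ambiguous
print(unicity(traces, k=3))  # 1.0 on this toy data
```

Even in this tiny example, knowing three points is enough to pin down every ID. Once one of those points is a home address, the ID is effectively a name.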
Case 2: Target Knew Before Her Father Did
Target's data team identified about 25 products that, when purchased together, indicated a customer was likely pregnant and even allowed the team to estimate a due date. Examples include unscented lotion in the second trimester, calcium supplements, and large quantities of cotton balls. Target used these predictions to send targeted coupons. One teenager started receiving baby coupons in the mail. Her father confronted Target, then found out that his daughter was pregnant; she hadn't told him yet.
These are not hypothetical. These are normal applications of normal techniques to normal data. The question isn't whether your data work has privacy implications. It's whether you've thought about them.
Key US Privacy Regulations:
- HIPAA — governs protected health information (PHI) held by covered entities. Requires privacy notices, limits use to treatment, payment, and operations unless the patient gives additional consent, and gives patients a right of access to their records.
- FERPA (1974) — gives parents, and students once they turn 18 or enter postsecondary education, control over disclosure of education records. Applies to educational institutions that receive funding from the US Department of Education.
- FCRA / FACTA / GLBA — govern financial data. FCRA limits consumer credit report use and requires notification of adverse decisions. GLBA introduced privacy notices and security program requirements for financial institutions.
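- CCPA — the California Consumer Privacy Act, often described as the strictest state-level privacy law in the US. Applies to for-profit businesses that handle California residents' personal information and meet statutory revenue or data-volume thresholds, regardless of where the business is located.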
If your organization offers services in a country, even free ones, or processes data of people who live there, you should expect that country's privacy laws to apply to you regardless of where you are physically located.
PII vs. Sensitive Information
PII (Personally Identifiable Information) — non-public information that is tied to, or can be used to identify, a specific person: names, SSNs, dates of birth, addresses, phone numbers, email addresses, biometric data, IP addresses, financial account numbers.
Sensitive information — a subset with stricter rules: Social Security numbers, financial information, medical records.
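One practical use of these two definitions is a first-pass tagging of dataset columns before any analysis. The sketch below assumes a hypothetical set of column names; a real schema would need its own mapping, and the labels are a starting point for review, not a legal determination.

```python
# Hypothetical column names, loosely following the lists above.
SENSITIVE = {"ssn", "account_number", "medical_record"}
PII = SENSITIVE | {
    "name", "date_of_birth", "address", "phone",
    "email", "biometric_id", "ip_address",
}

def classify_columns(columns):
    """Label each column 'sensitive', 'pii', or 'unreviewed'."""
    labels = {}
    for col in columns:
        key = col.strip().lower()
        if key in SENSITIVE:
            labels[col] = "sensitive"
        elif key in PII:
            labels[col] = "pii"
        else:
            # Absent from both lists does not mean safe: the column may
            # still identify someone in combination with other columns.
            labels[col] = "unreviewed"
    return labels

print(classify_columns(["Email", "ssn", "favorite_color"]))
```

The "unreviewed" label is deliberate. A column that looks harmless on its own still needs a human to consider what it reveals alongside the others.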
PII can be directly identifying (a name) or indirectly identifying (a combination of attributes that allows re-identification, as the location data case demonstrates). "Anonymized" data can still be PII if re-identification is feasible.
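Indirect identifiability can be checked mechanically. The sketch below counts how many rows share each combination of attributes and flags rows whose combination is rare, which is the idea behind k-anonymity (a standard technique not named in these notes). The field names, the toy rows, and the threshold `k=2` are assumptions for illustration.

```python
from collections import Counter

def risky_rows(rows, quasi_identifiers, k=2):
    """Return rows whose quasi-identifier combination appears fewer than k times."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]

# Toy records (hypothetical): names already removed.
rows = [
    {"zip": "02139", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1985, "gender": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "eczema"},
]

# The third row is the only one with its (zip, birth_year, gender) combination.
print(risky_rows(rows, ["zip", "birth_year", "gender"]))
# Using zip alone, every row hides among three, so nothing is flagged.
print(risky_rows(rows, ["zip"]))
```

A flagged row is one that anyone who knows those attributes about a person could link back to them, even though no name appears in the data.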
You publish a dataset of 'anonymized' user location pings, removing names and user IDs. A researcher later shows that 95% of individuals in the dataset can be uniquely identified from just 4 random location samples. Does this dataset contain PII?

