Data Organization
Data Infrastructure Wizard (step 1 of 5)
Schema Design
You're building an intake form for a small clinic. Nurses will record patient temperatures throughout the day.
How do you define the temperature field?
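One reasonable answer is to make the field numeric, fix the unit in the field name, and reject implausible values at construction time. The sketch below is only an illustration under stated assumptions: it assumes degrees Celsius and a plausibility range of 25 to 45, and the names `TemperatureReading`, `patient_id`, and `recorded_at` are hypothetical, not part of the exercise.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed plausibility bounds for a body temperature in degrees Celsius.
MIN_TEMP_C = 25.0
MAX_TEMP_C = 45.0


@dataclass(frozen=True)
class TemperatureReading:
    patient_id: str        # hypothetical identifier field
    temperature_c: float   # numeric, with the unit fixed in the field name
    recorded_at: datetime  # timezone-aware timestamp of the reading

    def __post_init__(self) -> None:
        # Type constraint: refuse free text such as "37.2" or "warm".
        value = self.temperature_c
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("temperature_c must be a number, not text")
        # Range constraint: catches typos and Fahrenheit values entered by mistake.
        if not MIN_TEMP_C <= value <= MAX_TEMP_C:
            raise ValueError(
                f"temperature_c {value} is outside {MIN_TEMP_C}-{MAX_TEMP_C}"
            )
        # Readings are taken throughout the day, so the timestamp must be unambiguous.
        if self.recorded_at.tzinfo is None:
            raise ValueError("recorded_at must be timezone-aware")


reading = TemperatureReading(
    "p-001", 37.2, datetime(2024, 1, 5, 9, 30, tzinfo=timezone.utc)
)
```

With this definition, a value of 98.6 entered in Fahrenheit is rejected with a `ValueError` rather than stored silently, and a text value such as `"37.2"` is rejected with a `TypeError`.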
Work through five infrastructure decisions — schema, naming, protocols, storage, and documentation — before the first row of data arrives.
Set up your data structure before you start collecting:
- Schema design — what fields, what types, what constraints.
- Naming conventions and formatting standards — consistent from day one.
- Collection protocols — how data enters, what validation runs on ingestion.
- Storage and version control — logical directory structure, versioned.
- Documentation — what each field means, where data came from, what's been cleaned.
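The naming and collection items above can be sketched as two small checks that run when data enters: one that flags column names breaking a convention, and one that validates each incoming row and records why a row was rejected. This is a minimal sketch, not a prescribed protocol; the snake_case convention, the column names in `EXPECTED_COLUMNS`, and all function names are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Assumed convention: lowercase snake_case column names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Hypothetical schema for the incoming rows.
EXPECTED_COLUMNS = {"patient_id", "temperature_c", "recorded_at"}


def check_column_names(columns):
    """Return the column names that break the snake_case convention."""
    return [name for name in columns if not SNAKE_CASE.match(name)]


def validate_row(row):
    """Return a list of problems for one incoming row; an empty list means accept."""
    missing = EXPECTED_COLUMNS - row.keys()
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    try:
        float(row["temperature_c"])
    except (TypeError, ValueError):
        problems.append("temperature_c is not numeric")
    try:
        datetime.fromisoformat(row["recorded_at"])
    except (TypeError, ValueError):
        problems.append("recorded_at is not a parseable ISO 8601 timestamp")
    return problems


def ingest(rows):
    """Split rows into accepted rows and (row, problems) pairs for rejected ones."""
    accepted, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected


rows = [
    {"patient_id": "p-001", "temperature_c": "37.2", "recorded_at": "2024-01-05T09:30:00"},
    {"patient_id": "p-002", "temperature_c": "warm", "recorded_at": "yesterday"},
]
accepted, rejected = ingest(rows)
```

Keeping the rejected rows together with their reasons, instead of dropping them, leaves a record of what was filtered out, which feeds directly into the documentation step.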
There's no single agreed-upon documentation standard, but several frameworks exist: Datasheets for Datasets (Gebru et al., 2018), Dataset Nutrition Labels, and Data Statements for NLP. Pick one that fits your project and use it consistently.
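Whichever framework you pick, it helps to keep the documentation in a machine-readable file next to the data so that gaps are easy to detect. The sketch below is loosely modeled on the section headings of Datasheets for Datasets; it is not an official template, and the key names, the `missing_sections` helper, and the example values are all illustrative assumptions.

```python
import json

# Section names loosely follow Datasheets for Datasets; this is not an official template.
REQUIRED_SECTIONS = (
    "motivation",
    "composition",
    "collection_process",
    "preprocessing",
    "uses",
    "distribution",
    "maintenance",
)


def missing_sections(datasheet):
    """Return the required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not datasheet.get(name)]


# Illustrative values for the clinic example; every entry here is a placeholder.
datasheet = {
    "motivation": "Record patient temperatures taken at clinic intake.",
    "composition": {
        "temperature_c": "Body temperature in degrees Celsius, numeric.",
        "recorded_at": "Timezone-aware timestamp of the reading.",
    },
    "collection_process": "Entered by nurses through the intake form.",
    "preprocessing": "Rows failing validation are rejected and logged.",
}

# Serialize so the documentation can be stored and versioned with the data.
serialized = json.dumps(datasheet, indent=2)
todo = missing_sections(datasheet)
```

Here `todo` lists the sections still to be written (`uses`, `distribution`, and `maintenance`), which makes incomplete documentation visible before the dataset is shared.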