Data Organization

Data Infrastructure Wizard: Step 1 of 5
Schema Design

You're building an intake form for a small clinic. Nurses will record patient temperatures throughout the day.

How do you define the temperature field?
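One possible way to define the field is sketched below: a numeric type, the unit stated in the field name, and a range constraint checked at creation time. The names `TemperatureReading`, `patient_id`, `temperature_c`, and `recorded_at`, as well as the 25 to 45 °C bounds, are assumptions made for illustration and should be replaced with limits your clinic agrees on.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed plausibility bounds in degrees Celsius (illustrative, not clinical guidance).
MIN_TEMP_C = 25.0
MAX_TEMP_C = 45.0


@dataclass(frozen=True)
class TemperatureReading:
    """One temperature measurement; all field names here are hypothetical."""

    patient_id: str
    temperature_c: float  # unit is part of the name, so Celsius is never ambiguous
    recorded_at: datetime

    def __post_init__(self):
        # Type constraint: reject strings such as "37.2" and booleans.
        value = self.temperature_c
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("temperature_c must be a number")
        # Range constraint: a Fahrenheit value like 98.6 fails here.
        # NaN also fails, because every comparison with NaN is False.
        if not (MIN_TEMP_C <= value <= MAX_TEMP_C):
            raise ValueError(
                f"temperature_c={value} is outside {MIN_TEMP_C}-{MAX_TEMP_C} °C"
            )


reading = TemperatureReading("p-001", 37.2, datetime(2024, 3, 1, 8, 30))
```

Putting the unit in the name and the bounds in the constructor means a reading typed in Fahrenheit is rejected at entry instead of being discovered during analysis.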

Work through five infrastructure decisions — schema, naming, protocols, storage, and documentation — before the first row of data arrives.

Set up your data structure before you start collecting:

  • Schema design — what fields, what types, what constraints.
  • Naming conventions and formatting standards — consistent from day one.
  • Collection protocols — how data enters, what validation runs on ingestion.
  • Storage and version control — logical directory structure, versioned.
  • Documentation — what each field means, where data came from, what's been cleaned.
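The naming and collection-protocol items above can be sketched as a small ingestion check. This is a minimal illustration, assuming a snake_case naming convention and a three-field schema; `SCHEMA`, `check_field_names`, and `validate_row` are hypothetical names, not part of any library.

```python
import re
from datetime import datetime

# Assumed naming convention: lowercase snake_case field names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical schema: field name -> function that parses the raw value.
SCHEMA = {
    "patient_id": str,
    "temperature_c": float,
    "recorded_at": datetime.fromisoformat,
}


def check_field_names(names):
    """Return the field names that break the snake_case convention."""
    return [name for name in names if not SNAKE_CASE.match(name)]


def validate_row(row):
    """Parse one raw row; return (parsed_values, list_of_error_messages)."""
    parsed, errors = {}, []
    for field, parse in SCHEMA.items():
        if field not in row or row[field] in ("", None):
            errors.append(f"{field}: missing")
            continue
        try:
            parsed[field] = parse(row[field])
        except (TypeError, ValueError):
            errors.append(f"{field}: cannot parse {row[field]!r}")
    # Fields not declared in the schema are reported, not silently kept.
    for field in sorted(set(row) - set(SCHEMA)):
        errors.append(f"{field}: unexpected field")
    return parsed, errors


parsed, errors = validate_row(
    {"patient_id": "p-001", "temperature_c": "warm", "recorded_at": ""}
)
for message in errors:
    print(message)
```

Running the check at ingestion means malformed rows are flagged with a reason at the moment they arrive, while the person who entered them can still correct them.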

There's no single agreed-upon documentation standard, but several frameworks exist: Datasheets for Datasets (Gebru et al., 2018), Dataset Nutrition Labels, and Data Statements for NLP. Pick one that fits your project and use it consistently.
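Whichever framework you pick, the three things named in the documentation item (what each field means, where the data came from, what has been cleaned) can be kept next to the data in a plain file. The sketch below is not the official template of any of the frameworks above; `FIELD_DOCS`, `render_data_dictionary`, and the example entries are invented for illustration.

```python
# Hypothetical field documentation for the clinic example.
FIELD_DOCS = [
    {
        "name": "patient_id",
        "type": "string",
        "meaning": "Clinic-assigned patient identifier",
        "source": "intake form",
    },
    {
        "name": "temperature_c",
        "type": "float",
        "meaning": "Body temperature in degrees Celsius",
        "source": "nurse entry on intake form",
    },
]

# Hypothetical cleaning log entries.
CLEANING_NOTES = [
    "Rows with a missing temperature_c were dropped.",
]


def render_data_dictionary(fields, cleaning_notes):
    """Render field meanings, provenance, and cleaning steps as plain text."""
    lines = ["DATA DICTIONARY", ""]
    for field in fields:
        lines.append(f"{field['name']} ({field['type']})")
        lines.append(f"  meaning: {field['meaning']}")
        lines.append(f"  source:  {field['source']}")
    lines.append("")
    lines.append("CLEANING LOG")
    for note in cleaning_notes:
        lines.append(f"  - {note}")
    return "\n".join(lines)


print(render_data_dictionary(FIELD_DOCS, CLEANING_NOTES))
```

Storing this text file in the same versioned directory as the data keeps the documentation and the dataset in step with each other.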