# Data & Experimentation
Use this page to describe the datasets, their provenance, data quality considerations, and how exploratory analysis is conducted.
## Data Inventory

| Dataset | Source | Location | Notes |
|---|---|---|---|
| Example Raw Dataset | Internal CRM export | `data/raw/example.csv` | Daily extract; contains PII (mask or anonymize) |
| Processed Features | Derived via `clean_dataframe` | `data/processed/features.parquet` | Updated after each ETL run |
| External Reference | Public API | `data/external/reference.json` | Refresh monthly |

Keep this table current as new datasets are added or existing ones change.
## Generation & Extraction

- Acquisition scripts: document where the code lives (e.g., `scripts/`, external ETL); see the sketch below.
- Refresh cadence: daily/weekly/monthly? Automate via cron, Airflow, etc.
- Access requirements: credentials, VPN, API tokens. Store secrets securely (never commit them).
> **Placeholder:** Describe how raw data is pulled. Include sample queries, API endpoints,
> and expected schemas. Note any governance requirements or limits.
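As a minimal sketch of what an acquisition script might look like (the endpoint URL, the `REFERENCE_API_TOKEN` variable, and the response shape are illustrative assumptions, not project specifics):

```python
import os

import pandas as pd
import requests

# Hypothetical endpoint; the output path matches the inventory table above.
API_URL = "https://api.example.com/v1/reference"
RAW_PATH = "data/external/reference.json"


def fetch_reference_data(url: str = API_URL, out_path: str = RAW_PATH) -> pd.DataFrame:
    """Pull the external reference dataset and persist the raw response."""
    # Read the token from the environment; never hard-code or commit secrets.
    token = os.environ["REFERENCE_API_TOKEN"]
    response = requests.get(
        url, headers={"Authorization": f"Bearer {token}"}, timeout=30
    )
    response.raise_for_status()

    # Persist the raw payload before any transformation, for reproducibility.
    with open(out_path, "w") as f:
        f.write(response.text)

    # Assumes the API returns a JSON array of records.
    return pd.DataFrame(response.json())
```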
## Data Quality & Profiling

- Outline checks for missing values, duplicates, and schema drift (see the sketch below).
- Record profiling summaries (e.g., pandas-profiling, Great Expectations).
- Log anomalies and remediation steps.
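A lightweight starting point for these checks, assuming pandas and an illustrative expected-column set:

```python
import pandas as pd

# Illustrative schema; replace with the real expected columns per dataset.
EXPECTED_COLUMNS = {"customer_id", "signup_date", "plan"}


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Run lightweight quality checks; log or persist the result per run."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # Schema drift: columns added or dropped relative to the expected set.
        "unexpected_columns": sorted(set(df.columns) - EXPECTED_COLUMNS),
        "missing_columns": sorted(EXPECTED_COLUMNS - set(df.columns)),
    }
```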
## Exploratory Analysis

- Notebook naming convention (e.g., `01.0-abc-eda.ipynb`).
- Where to store outputs (plots, tables) during exploration; a sketch follows below.
- How insights feed back into feature engineering or modelling.
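One possible convention for persisting exploratory plots; the `reports/figures` output directory is an assumption, so adjust it to the project's layout:

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

FIGURES_DIR = Path("reports/figures")  # assumed output location


def save_histogram(df: pd.DataFrame, column: str) -> Path:
    """Save an exploratory histogram under a predictable, reviewable path."""
    FIGURES_DIR.mkdir(parents=True, exist_ok=True)
    ax = df[column].plot.hist(bins=30, title=f"Distribution of {column}")
    out_path = FIGURES_DIR / f"eda_{column}_hist.png"
    ax.figure.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(ax.figure)
    return out_path
```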
## Transformation Pipeline

- Reference helper functions/modules (`load_csv`, `clean_dataframe`, `select_columns`); a composition sketch follows below.
- Describe feature engineering steps, transformations, encodings, scaling.
- Maintain versioned transformation logic (config files, code branches).
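A hypothetical composition of the helpers above into a single reproducible step; the import path, helper signatures, column names, and output location are assumptions to adapt:

```python
import pandas as pd

# The module path is an assumption; adjust to wherever the helpers live.
from src.data.utils import clean_dataframe, load_csv, select_columns


def build_features(raw_path: str = "data/raw/example.csv") -> pd.DataFrame:
    """Chain the documented helpers into one transformation step."""
    df = load_csv(raw_path)
    df = clean_dataframe(df)  # e.g., drop duplicates, fix dtypes
    # Signature assumed: select_columns(df, columns); columns are illustrative.
    df = select_columns(df, ["customer_id", "plan", "signup_date"])
    df.to_parquet("data/processed/features.parquet", index=False)
    return df
```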
## Data Governance

- PII handling (masking, hashing, removal); see the hashing sketch below.
- Compliance requirements (GDPR, CCPA, internal policies).
- Approval process for sharing data externally.
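A minimal sketch of salted hashing for PII columns before data leaves the team; the column names and the salt's environment variable are illustrative:

```python
import hashlib
import os

import pandas as pd


def hash_pii(df: pd.DataFrame, columns: list[str], salt: str) -> pd.DataFrame:
    """Replace PII columns with salted SHA-256 digests before sharing."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str).map(
            lambda value: hashlib.sha256((salt + value).encode()).hexdigest()
        )
    return out


# Usage (illustrative): keep the salt in a secret store, not in code.
# anonymized = hash_pii(df, ["email", "phone"], salt=os.environ["PII_SALT"])
```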