Tech Stack & Tooling

Detail the technologies, services, and infrastructure used to deliver this project. Keep this section current as the architecture evolves.

Languages & Frameworks

  • Python 3.11+ – primary language for data processing and modelling.
  • (Optional) R, SQL, Scala – document if applicable.
  • Visualisation – Matplotlib, Seaborn, Plotly (list whichever you adopt; a quick example follows this list).
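
As a quick illustration of the visualisation layer, here is a minimal Matplotlib sketch. The plotted series and the output path are placeholders, not project data.

```python
# Minimal Matplotlib sketch; the series and output path are placeholders.
import matplotlib.pyplot as plt

epochs = range(1, 11)
loss = [1.0 / e for e in epochs]  # placeholder values

fig, ax = plt.subplots()
ax.plot(epochs, loss, marker="o")
ax.set_xlabel("Epoch")
ax.set_ylabel("Training loss")
ax.set_title("Example training curve")
fig.savefig("reports/training_curve.png", dpi=150)  # hypothetical output path
```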

Core Libraries

  • Data manipulation: pandas, NumPy.
  • Machine learning: scikit-learn, XGBoost, LightGBM, TensorFlow/PyTorch (specify which you use; a pipeline sketch follows this list).
  • Feature engineering: category_encoders, scikit-learn pipeline utilities.
  • Evaluation: scikit-learn metrics, statsmodels.
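
To show how these libraries typically fit together, below is a minimal sketch of a training-and-evaluation pipeline. The column names (age, income, segment), the churned target, and the CSV path are hypothetical placeholders; adapt them to the project's actual dataset.

```python
# Minimal sketch: pandas + scikit-learn pipeline with evaluation.
# Column names, target, and CSV path are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("data/train.csv")  # hypothetical path
X, y = df.drop(columns=["churned"]), df["churned"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),                  # numeric features
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),  # categorical feature
])

model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

Bundling preprocessing and the estimator in one Pipeline keeps train and test transformations consistent and makes the whole model serialisable for deployment.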

Infrastructure & Services

  • Data storage: Amazon S3, Azure Blob Storage, Google Cloud Storage, PostgreSQL, etc.
  • Compute: local dev machines, cloud instances, Databricks, Spark clusters.
  • Workflow orchestration: Airflow, Prefect, Dagster.
  • Experiment tracking: MLflow, Weights & Biases (a tracking sketch follows this list).
  • Monitoring: custom dashboards, logging sinks.
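
If MLflow is adopted for experiment tracking, a run might be logged along these lines. The experiment name, parameters, metric value, and artifact path are all hypothetical placeholders.

```python
# Minimal MLflow tracking sketch; experiment name, parameters,
# metric value, and artifact path are hypothetical placeholders.
import mlflow

mlflow.set_experiment("capstone-baseline")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("model", "gradient_boosting")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("val_auc", 0.87)            # placeholder value
    mlflow.log_artifact("reports/metrics.json")   # hypothetical artifact path
```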

Dev Tooling

  • Poetry for dependency management and packaging.
  • Ruff for linting and formatting; mypy for static type checking.
  • Tox for running the test suite across multiple Python environments.
  • Pytest + pytest-cov for testing/coverage.
  • Loguru for application logging (a setup sketch follows this list).
  • Pre-commit hooks for consistent code quality.
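
As one possible Loguru setup, the sketch below adds a rotating file sink. The sink path, rotation, and retention values are assumptions, not established project settings.

```python
# Minimal Loguru setup sketch; the sink path, rotation, and
# retention policy are assumptions, not project settings.
from loguru import logger

logger.add("logs/pipeline.log", rotation="10 MB", retention="30 days", level="INFO")

logger.info("Starting feature engineering for {} rows", 10_000)
logger.warning("Missing values detected in column {!r}", "income")
```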

DevOps & CI/CD

  • Continuous integration (e.g., GitHub Actions) – describe jobs/pipelines if configured (a sample workflow follows this list).
  • Containerisation (Docker) – note base images and the build process.
  • Deployment targets (API service, batch jobs, dashboards).
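
If GitHub Actions is the CI provider, a minimal workflow might look like the sketch below. The job layout, Python version, and commands are assumptions to adapt, not an existing configuration.

```yaml
# .github/workflows/ci.yml (hypothetical example, not an existing workflow)
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install poetry && poetry install  # Poetry-managed dependencies
      - run: poetry run ruff check .               # lint
      - run: poetry run mypy .                     # type-check
      - run: poetry run pytest --cov               # tests with coverage
```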

Access & Security

  • Secrets management (environment variables, vaults, AWS Secrets Manager) – see the loading sketch after this list.
  • IAM roles/permissions needed for cloud resources.
  • Audit logging and compliance requirements.
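
However secrets are stored, code should read them at runtime rather than hard-coding them in source control. Below is a minimal sketch using environment variables; the variable names are hypothetical.

```python
# Minimal sketch: read credentials from environment variables at runtime.
# Variable names are hypothetical; never hard-code secrets in source control.
import os

def get_required_env(name: str) -> str:
    """Fetch an environment variable or fail fast with a clear error."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

DB_PASSWORD = get_required_env("CAPSTONE_DB_PASSWORD")          # hypothetical name
AWS_REGION = os.environ.get("AWS_DEFAULT_REGION", "us-west-2")  # optional, with default
```

Failing fast on a missing variable surfaces misconfiguration at startup rather than partway through a pipeline run.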

Hardware Requirements

  • Local development specs (RAM, CPU, GPU if required).
  • Production environment specs.

References

  • Links to architecture diagrams, ADRs (Architecture Decision Records), or runbooks.
  • Vendor documentation or internal wiki pages.
