Schema contracts → validators → immutable snapshots → deterministic factor materialization
A compact, production-style reference project for building reliable market-data pipelines: explicit schema contracts, composable QA validators with structured reports, checksummed snapshots, and deterministic feature materialization (factor-store style).
Repo: H2nryHe/Market-Data-QA---Factor-Store
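To make "composable QA validators with structured reports" concrete, here is a minimal sketch of how independent checks might be combined into one JSON report with an overall PASS/WARN/FAIL status. Every name in it (`CheckResult`, `run_validators`, the column names, and the choice to treat duplicates as WARN) is an assumption made for illustration and is not taken from the repo's code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical severity ordering; the repo's real validator API may differ.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


@dataclass
class CheckResult:
    name: str
    status: str  # "PASS", "WARN", or "FAIL"
    detail: str = ""


Row = dict
Validator = Callable[[list[Row]], CheckResult]


def check_required_columns(rows: list[Row]) -> CheckResult:
    """Structural check: every row must carry the assumed contract columns."""
    required = {"symbol", "ts", "close"}  # assumed contract, for illustration
    missing = sorted({c for r in rows for c in required - set(r)})
    return CheckResult("required_columns", "FAIL" if missing else "PASS", f"missing={missing}")


def check_duplicates(rows: list[Row]) -> CheckResult:
    """Duplicate check on the (symbol, ts) key; severity WARN is an assumption."""
    keys = [(r.get("symbol"), r.get("ts")) for r in rows]
    dupes = len(keys) - len(set(keys))
    return CheckResult("duplicates", "WARN" if dupes else "PASS", f"duplicate_rows={dupes}")


def run_validators(rows: list[Row], validators: list[Validator]) -> dict:
    """Run each validator and report the worst status as the overall status."""
    results = [v(rows) for v in validators]
    overall = max((r.status for r in results), key=SEVERITY.__getitem__, default="PASS")
    return {"overall": overall, "checks": [asdict(r) for r in results]}


def exit_code(report: dict) -> int:
    """Non-zero exit only when the overall status is FAIL."""
    return 1 if report["overall"] == "FAIL" else 0


rows = [
    {"symbol": "AAA", "ts": "2024-01-02", "close": 10.0},
    {"symbol": "AAA", "ts": "2024-01-02", "close": 10.0},
]
report = run_validators(rows, [check_required_columns, check_duplicates])
print(json.dumps(report, indent=2))
```

Because each validator is a plain function returning a small result record, new checks (temporal gaps, outliers) could be appended to the list without touching the aggregation logic.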
Run the end-to-end smoke pipeline (validate → snapshot → verify → materialize):
bash ci/sample_pipeline.sh
Expected artifacts (paths):
data/qa/validation_report_pipeline.json
data/snapshots/market_ohlcv/<snapshot_id>/manifest.json
data/features/market_ohlcv/<cache_key>/features.parquet
data/features/market_ohlcv/<cache_key>/feature_manifest.json
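The verify step presumably compares a snapshot's files against the checksums recorded in its manifest.json. The sketch below shows one way such a check could work. SHA-256 and the manifest layout (`{"checksums": {"<file name>": "<hex digest>"}}`) are assumptions; the repo's actual manifest fields may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large Parquet files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(snapshot_dir: Path) -> list[str]:
    """Return the files that are missing or whose digest differs from the manifest.

    An empty list means the snapshot matches its manifest.
    """
    manifest = json.loads((snapshot_dir / "manifest.json").read_text())
    mismatched = []
    # Assumed layout: {"checksums": {"data.parquet": "<sha256 hex digest>"}}
    for name, expected in manifest["checksums"].items():
        file_path = snapshot_dir / name
        if not file_path.is_file() or sha256_of(file_path) != expected:
            mismatched.append(name)
    return mismatched
```

A caller could treat a non-empty result as a hard failure, so that features are never materialized from a snapshot that changed after it was written.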
Pipeline architecture:
CSV sample/raw input
|
v
[schemas/*] contract checks (columns/dtypes/rules)
|
v
[validators/*] structural + duplicates + temporal + outliers
| (JSON report, PASS/WARN/FAIL, non-zero exit on FAIL)
v
[versioning/*] snapshot -> data.parquet + manifest.json + checksums
|
v
[features/*] factor materialization from snapshot only
| (deterministic sort + cache key from checksum/config/version)
v
features.parquet + feature_manifest.json
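The last stage derives a cache key from the snapshot checksum, the feature config, and a version, and sorts rows deterministically. A minimal sketch of that idea follows. The function names, the canonical-JSON encoding, the 16-character truncation, and the `(symbol, ts)` sort key are all assumptions for illustration, not the repo's implementation.

```python
import hashlib
import json


def feature_cache_key(snapshot_checksum: str, config: dict, code_version: str) -> str:
    """Derive a stable key: the same inputs always map to the same directory name."""
    payload = json.dumps(
        {"snapshot": snapshot_checksum, "config": config, "version": code_version},
        sort_keys=True,          # key order in the config must not change the hash
        separators=(",", ":"),   # fixed separators avoid whitespace differences
    )
    # Truncating to 16 hex characters is an arbitrary choice for readable paths.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def deterministic_order(rows: list[dict]) -> list[dict]:
    """Sort rows by an assumed (symbol, ts) key so output order never depends on input order."""
    return sorted(rows, key=lambda r: (r["symbol"], r["ts"]))


key_a = feature_cache_key("abc123", {"window": 20, "factor": "momentum"}, "0.1.0")
key_b = feature_cache_key("abc123", {"factor": "momentum", "window": 20}, "0.1.0")
key_c = feature_cache_key("abc123", {"window": 20, "factor": "momentum"}, "0.2.0")
print(key_a == key_b, key_a == key_c)
```

Serializing the inputs canonically before hashing means a reordered config reuses the cached features, while any change to the snapshot, config values, or version produces a new key and a fresh materialization.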
Set up a local development environment:
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e '.[dev]'
Run the lint, format, and test checks:
ruff check .
black --check .
pytest -q
Tip: publish this page via GitHub Pages (Settings → Pages → Deploy from a branch → branch main, folder /docs).