mts1b-GPUbacktester
Pure CUDA backtest compute engine. Kernels, factors, walk-forward CV. No orchestration.
Repo: github.com/MTS1B/mts1b-GPUbacktester
Layer: 4
Depends on: foundation, platform, quantkit, cupy (optional)
Audience: mts1b-research (primary), CLI users, plugin authors
What it is
A CUDA-accelerated backtest engine. Reads parquet factor data + price feeds; writes result parquets. Standalone — no orchestration, no agents, no live trading.
Typical speedup over NumPy is 10-100x for large universes (1000+ symbols × 5+ years of daily bars).
What it is NOT
- ❌ Not a strategy-discovery workflow (that's mts1b-research)
- ❌ Not a live trader (that's mts1b-oms + mts1b-brokers)
- ❌ Not a portfolio sizer (that's mts1b-portfolio)
- ❌ Not a duplicate of mts1b-quantkit (compute engine vs library)
Specifically: ladder-sweep orchestration stays in mts1b-research, NOT here. This repo is the pure engine. Ladder is a research workflow that orchestrates this engine.
Module layout
mts1b_GPUbacktester/
├── core/
│   ├── engine.py            # main backtest loop
│   ├── data.py              # parquet → cupy arrays
│   ├── memory.py            # GPU memory management
│   └── result.py            # BacktestResult schema
├── kernels/
│   ├── sharpe.py            # sharpe_from_moments (CUDA)
│   ├── walkforward.py
│   ├── portfolio_returns.py
│   └── metrics.py           # calmar, max_dd, IC, t-stat
├── factors/
│   ├── crypto_factors.py
│   ├── fx_factors.py
│   ├── equity_factors.py
│   ├── tail_risk.py
│   └── calendar_effects.py
├── eval/
│   └── statistical.py       # validators (clones of quantkit but GPU)
├── lifecycle/
│   ├── validator.py         # champion vs challenger
│   └── shadow.py
├── execution/
│   └── cost_calibration.py  # backtest-side cost calibration
├── analytics/
│   └── bootstrap.py
└── cli/
    └── batch.py             # CLI entrypoint
CLI
mts1b-backtest run \
    --factor f_crypto_realized_vol \
    --params '{"h": 21}' \
    --universe crypto-top-10 \
    --start 2022-01-01 --end 2026-01-01 \
    --rebal weekly \
    --cost-bps 60 \
    --sizing equal_weight_ls \
    --n-long 2 --n-short 2 \
    --output data/backtests/run-1.parquet

mts1b-backtest walk-forward \
    --factor f_crypto_realized_vol \
    --params-grid '{"h": [10, 21, 42, 63]}' \
    --universe crypto-top-10 \
    --start 2020-01-01 --end 2026-01-01 \
    --train-window 252 --test-window 63 --step 63

mts1b-backtest batch \
    --config configs/sweep.yaml    # batch many configs
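The `--sizing equal_weight_ls --n-long 2 --n-short 2` flags suggest a rank-based long/short book. The sketch below is one plausible reading, not the engine's implementation. It assumes the method goes long the top `n_long` scores and short the bottom `n_short` at equal weight, splits gross exposure evenly between the two sides, and treats `invert` as a sign flip on the score. The helper name `equal_weight_ls` is hypothetical and only mirrors the flag value.

```python
import numpy as np


def equal_weight_ls(scores, n_long, n_short, gross=1.0, invert=False):
    """Hypothetical helper: equal-weight long/short weights for one rebalance date."""
    scores = np.asarray(scores, dtype=float)
    if n_long < 1 or n_short < 1:
        raise ValueError("n_long and n_short must be at least 1")
    if n_long + n_short > scores.size:
        raise ValueError("n_long + n_short exceeds the number of assets")
    if invert:
        scores = -scores  # assumption: invert means a low raw score is attractive
    order = np.argsort(scores, kind="stable")  # ascending by score
    weights = np.zeros_like(scores)
    weights[order[-n_long:]] = 0.5 * gross / n_long    # long the highest scores
    weights[order[:n_short]] = -0.5 * gross / n_short  # short the lowest scores
    return weights


# Five assets, two longs and two shorts, gross exposure of 1.0.
w = equal_weight_ls([0.9, 0.1, 0.5, 0.3, 0.7], n_long=2, n_short=2)
print(w)
```

Under these assumptions the weights sum to zero and their absolute values sum to `gross`, so the book is dollar-neutral by construction.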
Programmatic
from mts1b_GPUbacktester import run_single, run_walk_forward
from mts1b_quantkit.factors import get

result = run_single(
    factor=get("f_crypto_realized_vol"),
    params={"h": 21},
    universe="crypto-top-10",
    start="2022-01-01",
    end="2026-01-01",
    rebal="weekly",
    sizing={"method": "equal_weight_ls", "n_long": 2, "n_short": 2, "gross": 1.0},
    cost_bps=60,
    invert=True,
)
print(f"Sharpe: {result.sharpe:.2f}")
print(f"Max DD: {result.max_drawdown:.2%}")

# Programmatic walk-forward
cv = run_walk_forward(
    factor=get("f_crypto_realized_vol"),
    params_grid={"h": [10, 21, 42, 63]},
    universe="crypto-top-10",
    start="2020-01-01", end="2026-01-01",
    train_window=252, test_window=63, step=63,
)
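To make the `train_window` / `test_window` / `step` parameters concrete, here is a minimal sketch of how walk-forward folds could be laid out. It assumes the windows are counted in bars, the training window is rolling (fixed length) rather than expanding, and each test window starts right after its training window. The source does not state which variant the engine uses, and `walk_forward_folds` is a hypothetical name.

```python
from typing import Iterator, Tuple


def walk_forward_folds(
    n_bars: int, train_window: int, test_window: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for a rolling walk-forward split."""
    if min(train_window, test_window, step) <= 0:
        raise ValueError("train_window, test_window and step must be positive")
    start = 0
    # Stop as soon as a full train + test block no longer fits in the data.
    while start + train_window + test_window <= n_bars:
        train = range(start, start + train_window)
        test = range(train.stop, train.stop + test_window)
        yield train, test
        start += step


# Illustrative bar count only: 1512 bars with the window sizes used above.
folds = list(walk_forward_folds(n_bars=1512, train_window=252, test_window=63, step=63))
print(len(folds), folds[0], folds[-1])
```

With `step` equal to `test_window`, the test windows tile the out-of-sample period without gaps or overlap.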
BacktestResult schema
class BacktestResult(BaseModel):
    config: BacktestConfig

    # Returns
    returns: np.ndarray        # (T,) daily strategy returns
    cum_returns: np.ndarray    # (T,) cumulative
    equity_curve: np.ndarray   # (T,) NAV path

    # Positions
    weights: np.ndarray        # (T, A) target weights
    holdings: np.ndarray       # (T, A) actual holdings
    turnover: np.ndarray       # (T,) one-way turnover

    # Costs
    fees: np.ndarray           # (T,) per-day fees
    slippage: np.ndarray       # (T,) per-day slippage

    # Metrics (annualized)
    sharpe: float
    calmar: float
    max_drawdown: float
    cagr: float
    ic: float
    t_stat: float
    turnover_annualized: float

    # Walk-forward
    fold_sharpes: list[float] | None
    ci95_sharpe: tuple[float, float] | None
Serializable to parquet for downstream tooling.
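As an illustration of how the scalar metrics relate to the `returns` array, the sketch below recomputes a few of them with NumPy. The conventions are assumptions, not the engine's documented behavior: 252 periods per year, a zero risk-free rate, compounded returns, and drawdown reported as a negative fraction. The function name `summarize` is hypothetical.

```python
import numpy as np


def summarize(returns, periods_per_year=252):
    """Hypothetical helper: headline metrics from a (T,) array of per-period returns."""
    r = np.asarray(returns, dtype=float)
    if r.size < 2:
        raise ValueError("need at least two returns")
    # NAV path, with the starting NAV of 1.0 included so an initial loss counts as drawdown.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(equity)
    max_drawdown = float((equity / running_peak - 1.0).min())  # <= 0
    std = r.std(ddof=1)
    sharpe = float(r.mean() / std * np.sqrt(periods_per_year)) if std > 0 else float("nan")
    cagr = float(equity[-1] ** (periods_per_year / r.size) - 1.0)
    calmar = cagr / abs(max_drawdown) if max_drawdown < 0 else float("nan")
    return {"sharpe": sharpe, "max_drawdown": max_drawdown, "cagr": cagr, "calmar": calmar}


stats = summarize([0.01, -0.02, 0.015, 0.005])
print(stats)
```

Recomputing metrics this way from a saved result is a cheap cross-check when a downstream tool uses a different annualization convention.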
Backend selection
# CUDA (default if available)
mts1b-backtest run --backend cuda --factor f_my_factor ...
# CPU fallback (~10-100x slower for large universes)
mts1b-backtest run --backend cpu --factor f_my_factor ...
Most factor implementations are CPU/GPU-agnostic via xp dispatch — same code, different backend.
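"xp dispatch" usually means that a function receives the array module (`numpy` or `cupy`) as a parameter and calls everything through it. The sketch below shows that pattern under stated assumptions. `get_xp` and `realized_vol` are hypothetical names, the factor is a generic rolling volatility rather than the registered `f_crypto_realized_vol`, and the Python loop favors clarity over GPU efficiency.

```python
import numpy as np

try:
    import cupy as cp  # optional dependency; only needed for the CUDA backend
except ImportError:
    cp = None


def get_xp(backend="cuda"):
    """Return cupy when CUDA is requested and cupy is importable, else numpy."""
    if backend == "cuda" and cp is not None:
        return cp
    return np


def realized_vol(prices, h, xp=np):
    """Rolling std of log returns over the last h bars; (T, A) in, (T, A) out."""
    prices = xp.asarray(prices, dtype=xp.float64)
    log_ret = xp.diff(xp.log(prices), axis=0)  # shape (T-1, A)
    out = xp.full(prices.shape, xp.nan)        # first h rows stay NaN (warm-up)
    for t in range(h, prices.shape[0]):
        # log_ret[t-h:t] holds the h returns ending at bar t
        out[t] = log_ret[t - h:t].std(axis=0, ddof=1)
    return out


xp = get_xp("cpu")
prices = np.array([[100.0], [101.0], [102.01], [103.0301]])  # steady 1% growth
vol = realized_vol(prices, h=2, xp=xp)
print(vol)
```

Because the body only touches `xp.*` functions and array methods, the same function runs on either backend. The caller decides where the data lives.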
Memory management
For very large universes (10k+ symbols × 10+ years of bars; note that intraday bars are listed under 0.2 in the Roadmap and the engine is currently daily only), the engine streams data in chunks:
result = run_single(
    factor=...,
    universe="russell-3000",
    start="2014-01-01", end="2024-01-01",
    chunk_size_mb=512,  # max GPU memory per chunk
)
Tradeoff: smaller chunks mean more host↔device transfers and therefore slower runs. The default chunk size of 2 GB works on most consumer GPUs.
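The arithmetic behind that tradeoff can be sketched as follows. This is a back-of-the-envelope model, not the engine's memory manager. It assumes chunks are cut along the time axis, every value is a float64, and the budget covers only the input panel with no workspace. Both function names are hypothetical.

```python
def rows_per_chunk(chunk_size_mb, n_symbols, n_fields=1, bytes_per_value=8):
    """How many time rows of an (n_symbols x n_fields) panel fit in one chunk."""
    budget = chunk_size_mb * 1024 * 1024
    bytes_per_row = n_symbols * n_fields * bytes_per_value
    rows = budget // bytes_per_row
    if rows < 1:
        raise ValueError("chunk_size_mb is too small to hold a single row")
    return rows


def chunk_slices(n_rows, rows):
    """Yield consecutive slices that cover n_rows in blocks of at most `rows`."""
    for start in range(0, n_rows, rows):
        yield slice(start, min(start + rows, n_rows))


# 3000 symbols, 5 float64 fields per symbol, 2520 daily bars (about 10 years).
big = rows_per_chunk(512, n_symbols=3000, n_fields=5)
small = rows_per_chunk(64, n_symbols=3000, n_fields=5)
print(big, len(list(chunk_slices(2520, big))))      # one transfer
print(small, len(list(chunk_slices(2520, small))))  # several transfers
```

In this model, an eight-times smaller budget turns one host-to-device copy into five, which is where the slowdown comes from.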
Determinism
All randomness goes through a seedable RNG. Same seed + same config = byte-identical results across runs:
result1 = run_single(..., seed=42)
result2 = run_single(..., seed=42)
assert np.array_equal(result1.returns, result2.returns)
CI verifies determinism on every PR.
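One common way to get this property is to create a single generator from the seed and pass all random draws through it. The sketch below applies that idea to a bootstrap confidence interval for the Sharpe ratio. It assumes a plain i.i.d. bootstrap and 252 periods per year. The method used in `analytics/bootstrap.py` is not described here, and `bootstrap_sharpe_ci` is a hypothetical name.

```python
import numpy as np


def bootstrap_sharpe_ci(returns, n_boot=1000, seed=0, periods_per_year=252):
    """Hypothetical helper: 95% bootstrap interval for the annualized Sharpe ratio."""
    rng = np.random.default_rng(seed)  # the only source of randomness
    r = np.asarray(returns, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, r.size, size=(n_boot, r.size))
    samples = r[idx]
    sharpes = samples.mean(axis=1) / samples.std(axis=1, ddof=1) * np.sqrt(periods_per_year)
    lo, hi = np.percentile(sharpes, [2.5, 97.5])
    return float(lo), float(hi)


# Synthetic returns, generated from a fixed seed so the example is reproducible.
returns = np.random.default_rng(123).normal(0.0005, 0.01, size=500)
ci_a = bootstrap_sharpe_ci(returns, seed=42)
ci_b = bootstrap_sharpe_ci(returns, seed=42)
assert ci_a == ci_b  # same seed, same inputs, identical interval
```

Keeping the generator local to the function, rather than relying on global random state, is what makes the result independent of call order.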
Boundary verification (CI)
# No ladder code can leak in
$ python -m mts.tools.ast_scan --forbid "ladder" src/
PASS
# No HRP/BL clones (those are in quantkit)
$ python -m mts.tools.ast_scan --forbid "hrp_weights|black_litterman" src/
PASS
The engine USES quantkit for math. It does NOT redefine quantkit functions.
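The implementation of `mts.tools.ast_scan` is not shown here. As a rough idea of how such a boundary check could work, the sketch below flags only function and class definitions whose names match a forbidden pattern, so importing and calling quantkit functions stays allowed. That scope is an assumption, as are the name `find_forbidden_defs` and the module path inside the sample string.

```python
import ast
import re


def find_forbidden_defs(source, pattern):
    """Return sorted (line, name) pairs for def/class names matching `pattern`."""
    rx = re.compile(pattern)
    hits = [
        (node.lineno, node.name)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and rx.search(node.name)
    ]
    return sorted(hits)


# Sample source: importing hrp_weights is fine, redefining black_litterman is not.
sample = (
    "from mts1b_quantkit.portfolio import hrp_weights\n"  # hypothetical module path
    "\n"
    "def black_litterman(views):\n"
    "    return hrp_weights(views)\n"
)
hits = find_forbidden_defs(sample, "hrp_weights|black_litterman")
print(hits)
```

Working on the AST rather than raw text means comments and string literals that mention a forbidden name do not trigger false positives.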
Build + test
pip install -e ".[dev,gpu]" # the gpu extra pulls in cupy
pytest -m unit # hermetic
pytest -m gpu --gpu # requires CUDA
pytest -m benchmark # perf regression
# Sanity
mts1b-backtest demo # runs a fixture-based smoke test
Roadmap
| Version | Items |
|---|---|
| 0.1 (Wave 1) | Extract from gpuBT1060/** (minus ladder + minus HRP/BL clones), CLI, walk-forward |
| 0.2 (Wave 2) | Intraday bar backtests (currently daily only) |
| 0.3 (Wave 2) | Multi-asset-class portfolio backtests |
| 0.4 (Wave 3) | Distributed multi-GPU |
| 1.0 (LTS) | Stable result schema, stable CLI |
See also
- Concept: Factor system — factor registration + API
- Tutorial: First backtest — end-to-end
- mts1b-research: wraps this engine in strategy-discovery workflows
- mts1b-quantkit: math primitives consumed here