
Performance notes

Where MTS1B is fast, where it isn't, and how to tune it.

Latency budget (live trading hot path)

End-to-end target: strategy signal → broker submission ≤ 100 ms p99.

| Step | Target | Why |
|---|---|---|
| Strategy signal generation | ≤ 10 ms | factor compute (cached features) |
| Portfolio sizing | ≤ 5 ms | quantkit kelly/vol-target |
| OMS receive + idempotency check | ≤ 5 ms | dedupe lookup |
| Risk gate 1 (idempotency) | ≤ 1 ms | hashed dedupe |
| Risk gate 2 (schema) | ≤ 1 ms | pydantic validate |
| Risk gate 3 (static) | ≤ 2 ms | in-memory lookups |
| Risk gate 4 (position) | ≤ 5 ms | position store + cov compute |
| Risk gate 5 (drawdown) | ≤ 2 ms | NAV diff |
| Risk gate 6 (short) | ≤ 2 ms | borrow cache |
| Risk gate 7 (CRO veto, optional) | ≤ 5 s | LLM call (fail-OPEN) |
| Broker submit | ≤ 20 ms | network to venue |
| Total | ≤ 100 ms p99 | |

Without the CRO veto, hot path is ≤ 50 ms p99.
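One way to check a service against these targets is to time each stage and compare the observed p99 with its budget. The sketch below is illustrative only: the stage keys, the `timed` helper, and `over_budget` are hypothetical names, not part of MTS1B, and the budget values are simply copied from the table above with the optional gate 7 left out.

```python
import time
from contextlib import contextmanager

# Per-stage p99 targets in milliseconds, copied from the table above.
# The stage keys are hypothetical; optional gate 7 (CRO veto) is left out.
BUDGET_MS = {
    "signal": 10, "sizing": 5, "oms_receive": 5,
    "gate1_idempotency": 1, "gate2_schema": 1, "gate3_static": 2,
    "gate4_position": 5, "gate5_drawdown": 2, "gate6_short": 2,
    "broker_submit": 20,
}


@contextmanager
def timed(stage, samples):
    """Append the wall time of the enclosed block to samples[stage], in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.setdefault(stage, []).append(elapsed_ms)


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def over_budget(samples, budget=BUDGET_MS):
    """Return {stage: observed_p99_ms} for stages whose p99 exceeds its target."""
    return {
        stage: p99(values)
        for stage, values in samples.items()
        if stage in budget and p99(values) > budget[stage]
    }
```

Wrap each stage in `with timed("gate2_schema", samples): ...` during a load test, then call `over_budget(samples)` to list the stages that miss their target.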

Backtest throughput

GPU vs CPU comparison (Russell 1000 × 10 years daily)

| Backend | Universe size | Period | Wall time | Speedup |
|---|---|---|---|---|
| CPU (numpy) | 100 | 10 yr | 8 sec | 1x |
| CPU (numpy) | 1000 | 10 yr | 95 sec | 1x |
| CPU (numpy) | 1000 | 10 yr (1m bars) | 18 min | 1x |
| GPU (cupy RTX 4090) | 100 | 10 yr | 1.2 sec | ~7x |
| GPU (cupy RTX 4090) | 1000 | 10 yr | 4.8 sec | ~20x |
| GPU (cupy RTX 4090) | 1000 | 10 yr (1m bars) | 38 sec | ~28x |
| GPU (cupy H100) | 1000 | 10 yr (1m bars) | 14 sec | ~77x |
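The speedup column is the CPU wall time divided by the GPU wall time for the same workload. As a quick check, the arithmetic can be reproduced from the wall times in the table; the dictionary keys below are just labels chosen for this sketch.

```python
# Wall times in seconds, taken from the table above.
cpu_seconds = {"100 daily": 8, "1000 daily": 95, "1000 1m": 18 * 60}
gpu_seconds = {
    ("RTX 4090", "100 daily"): 1.2,
    ("RTX 4090", "1000 daily"): 4.8,
    ("RTX 4090", "1000 1m"): 38,
    ("H100", "1000 1m"): 14,
}

# Speedup = CPU time / GPU time for the matching workload.
speedup = {
    (device, workload): cpu_seconds[workload] / seconds
    for (device, workload), seconds in gpu_seconds.items()
}

for (device, workload), factor in speedup.items():
    print(f"{device:9s} {workload:11s} {factor:5.1f}x")
```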

For large parameter sweeps (ladder), the absolute time saved grows from seconds to hours:

  • 100k combos × 1000 universe × 10 yr daily:
    • CPU: ~10 hours
    • GPU (RTX 4090): ~30 minutes
    • GPU (H100): ~10 minutes

When CPU is fine

  • Universe < 100 symbols
  • Period < 5 years daily
  • One-off run (no sweep)
  • Development on a laptop without CUDA

When GPU pays off

  • Universe > 200 symbols
  • Intraday bars (1m / 5m)
  • Parameter sweeps (anything > 1k combinations)
  • Walk-forward CV (multi-fold runs)
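The two lists above can be folded into a small decision helper. This is a sketch of the rule of thumb only, not an MTS1B API: `choose_backend` and its parameters are hypothetical names, and universes between 100 and 200 symbols, which neither list covers, default to CPU here unless another GPU criterion applies.

```python
def choose_backend(universe_size, intraday=False, n_combos=1, n_folds=1,
                   cuda_available=True):
    """Pick "gpu" or "cpu" from the rules of thumb listed above.

    Hypothetical helper: thresholds come from the "When GPU pays off" list.
    """
    if not cuda_available:
        # No CUDA device (for example a development laptop): CPU is the only option.
        return "cpu"
    gpu_pays_off = (
        universe_size > 200      # large universe
        or intraday              # 1m / 5m bars
        or n_combos > 1000       # parameter sweep
        or n_folds > 1           # walk-forward CV
    )
    return "gpu" if gpu_pays_off else "cpu"
```

For example, `choose_backend(50)` returns `"cpu"`, while `choose_backend(50, n_combos=5000)` returns `"gpu"` because the sweep size dominates.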

Memory

Foundation library

Tiny — under 10 MB resident.

Platform primitives

| Primitive | Resident |
|---|---|
| Logging setup | ~5 MB |
| Config + Vault client | ~15 MB |
| HTTP client (httpx) | ~10 MB |
| NATS client | ~5 MB |
| Postgres pool | ~30 MB |
| Redis client | ~5 MB |

A typical service

| Service | Idle | Active |
|---|---|---|
| mts1b-foundation (library) | 10 MB | 10 MB |
| mts1b-platform (library) | 50 MB | 80 MB |
| mts1b-marketdata (service) | 80 MB | 200 MB |
| mts1b-oms (service) | 100 MB | 300 MB |
| mts1b-riskengine (service) | 80 MB | 150 MB |
| mts1b-research (service) | 200 MB | 1-4 GB (active sweep) |
| mts1b-GPUbacktester (service) | 100 MB host | 8-24 GB GPU (active backtest) |
| mts1b-datalake (service) | 150 MB | 500 MB-2 GB (active ingest) |

Per-service tuning lives in mts1b.config under each section.

Concurrency

Async first

Every I/O-bound function in MTS1B is async. Don't mix in synchronous (blocking) HTTP calls: a single blocking call stalls every other task on the event loop.

❌ Wrong:

import requests
response = requests.get("https://api.example.com") # blocks the event loop

✅ Right:

from mts1b_platform.http import http_client

async def fetch_example():
    # The await yields to the event loop while the request is in flight
    async with http_client("example") as c:
        return await c.get("https://api.example.com")

Multi-process (CPU-bound work)

Async helps with I/O, not CPU. For CPU-heavy work (factor compute, optimization):

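# Use ProcessPoolExecutor; run_one_param_set and param_grid come from your sweep
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    with ProcessPoolExecutor(max_workers=8) as ex:
        results = list(ex.map(run_one_param_set, param_grid))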

Or use mts1b-cloudburst to fan out to rented GPU instances.

NATS consumer concurrency

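from nats.js.api import ConsumerConfig

# Assumes js is a nats-py JetStreamContext, which takes consumer limits via ConsumerConfig
sub = await js.subscribe(
    "mts.v1.oms.fills.created",
    durable="my-consumer",
    config=ConsumerConfig(max_ack_pending=100),  # process up to 100 in flight
)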

Multiple consumer instances with the same durable name share work — like a queue.

Disk I/O

Parquet partitioning

The data lake is partitioned by year/month + symbol/interval. Queries that filter on partition columns are fast:

# Fast: uses partition pruning
df = lake.equities.bars.read(
    symbols=["AAPL"],
    interval="daily",
    start="2024-01-01", end="2024-06-01",
)

# Slow: full scan
df = lake.equities.bars.read(
    symbols=["AAPL"],
    interval="daily",  # all dates
)
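To see why the date filter matters, it helps to count the partitions a query has to touch. The sketch below assumes a Hive-style directory layout (`year=/month=/symbol=/interval=`); the actual on-disk layout of the lake is not documented here, so the path format and both helper names are hypothetical.

```python
from datetime import date


def month_partitions(start, end):
    """Yield (year, month) for every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def partition_paths(symbols, interval, start, end):
    """List the partition directories a pruned read would need to open.

    The path format is an assumed Hive-style layout, for illustration only.
    """
    return [
        f"year={y}/month={m:02d}/symbol={s}/interval={interval}"
        for y, m in month_partitions(start, end)
        for s in symbols
    ]


paths = partition_paths(["AAPL"], "daily", date(2024, 1, 1), date(2024, 6, 1))
print(len(paths))  # six monthly partitions instead of the full history
```

Without `start` and `end`, the reader has no way to rule out any year/month directory, so every partition for the symbol is scanned.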

Compression

| Type | Compression | Speed | Size reduction vs CSV |
|---|---|---|---|
| Daily bars | snappy | fast | 5x |
| Intraday bars | zstd:3 | medium | 12x |
| News text | zstd:3 | medium | 8x |
| Options chains | snappy | fast | 4x |

Reading Parquet is 5-50x faster than CSV because the columnar layout lets the reader load only the columns a query needs.

DuckDB for ad-hoc

with lake.duckdb_session() as conn:
    df = conn.execute("""
        SELECT symbol, AVG(close) AS avg_close
        FROM equities.bars
        WHERE ts BETWEEN '2024-01-01' AND '2024-06-01'
        GROUP BY symbol
    """).pl()

DuckDB pushes predicates down into the Parquet scan, so row groups whose statistics rule out a match are skipped rather than read.

Network

Rate limits

Each adapter respects venue rate limits via mts1b_platform.ratelimit.RateLimiter. Limits are shared across processes via Redis.

| Provider | Free tier | Paid |
|---|---|---|
| FMP | 250/day | 30k+/day |
| Polygon | 5/min | unlimited |
| Coinbase Advanced | 30 req/sec | 100+/sec |
| IBKR Gateway | 50 req/sec | unlimited |

Hitting a rate limit triggers exponential backoff via mts1b_platform.retry.
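For readers unfamiliar with the pattern, exponential backoff doubles the wait ceiling after each rate-limited attempt, usually with random jitter so that many clients do not retry in lockstep. The sketch below is a generic illustration and not the implementation in mts1b_platform.retry: `RateLimited`, `backoff_delays`, `call_with_backoff`, and all default values are assumptions made for this example.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical error raised when a venue answers with a rate-limit response."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Delay ceilings in seconds: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


async def call_with_backoff(fn, *, base=0.5, factor=2.0, cap=30.0, attempts=5,
                            sleep=asyncio.sleep):
    """Await fn(), retrying on RateLimited with jittered exponential delays.

    fn is tried up to attempts + 1 times; the last failure propagates.
    """
    for ceiling in backoff_delays(base, factor, cap, attempts):
        try:
            return await fn()
        except RateLimited:
            # Full jitter: wait a random time between 0 and the current ceiling.
            await sleep(random.uniform(0, ceiling))
    return await fn()  # final attempt, errors are not caught here
```

The `sleep` parameter exists only so tests can substitute a fake and avoid real waiting.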

Connection pooling

HTTP client factory keeps connections alive:

import asyncio

async with http_client("polygon", base_url="https://api.polygon.io",
                       max_connections=20, max_keepalive_connections=10) as c:
    # All requests share the pool
    results = await asyncio.gather(*[c.get(f"/v2/last/trade/{s}") for s in symbols])

NATS publish latency

Local cluster: < 1 ms p99 for publish-ack. Cross-region: 10-100 ms. JetStream durable adds ~1-3 ms for disk persistence.

Database

Postgres pool size

Default is 20 connections. For a service handling > 100 concurrent requests, increase:

db:
  primary:
    dsn: postgres://...
    pool_size: 50
    max_overflow: 20

Monitor:

postgres_pool_active{pool="primary"}
postgres_pool_idle{pool="primary"}
postgres_pool_waiting{pool="primary"}

If waiting > 0 consistently, increase the pool. If idle = pool_size consistently, decrease it.
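The rule above can be applied mechanically to a window of metric samples. In this sketch, "consistently" is interpreted as "in at least 90% of samples", which is an assumption chosen for illustration; `pool_advice` is a hypothetical helper, and the tuples are expected to hold the three gauges listed above in the order active, idle, waiting.

```python
def pool_advice(samples, pool_size, threshold=0.9):
    """Suggest "increase", "decrease", or "keep" from (active, idle, waiting) samples.

    threshold is the assumed share of samples that counts as "consistently".
    """
    if not samples:
        return "keep"
    n = len(samples)
    # Share of samples in which at least one request was waiting for a connection.
    waiting_share = sum(1 for _, _, waiting in samples if waiting > 0) / n
    # Share of samples in which the whole pool sat idle.
    idle_share = sum(1 for _, idle, _ in samples if idle >= pool_size) / n
    if waiting_share >= threshold:
        return "increase"
    if idle_share >= threshold:
        return "decrease"
    return "keep"
```

For example, ten samples of `(20, 0, 3)` with `pool_size=20` yield `"increase"`, while ten samples of `(0, 20, 0)` yield `"decrease"`.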

Indexes

Critical indexes (auto-created):

  • orders(order_id) PK
  • orders(idempotency_key, created_at) unique within dedup window
  • orders(fund_id, created_at) for fund views
  • fills(order_id)
  • positions(fund_id, symbol) PK
  • audit_chain(sequence) PK
  • audit_chain(subject_id, timestamp) for "show order trail"

Verify:

SELECT indexname FROM pg_indexes WHERE schemaname = 'public' ORDER BY indexname;

LLM cost

| Persona | Calls/day | Avg tokens | Cost/day |
|---|---|---|---|
| CRO (gate 7) | ~500 | 800 in / 200 out | ~$3 |
| equities_analyst | ~50 | 2000 in / 500 out | ~$1.50 |
| news_summarizer | ~20 | 5000 in / 800 out | ~$2 |
| quant_screener | ~10 | 3000 in / 1500 out | ~$3 |
| ... | | | |
| TOTAL | ~700 | | ~$15-25/day |

Semantic cache hits typically reduce cost by 60-80% on stable workloads.

Where to look for slowness

mts mts1b-platform tail --slow-only --threshold-ms 50
# Streams any operation > 50ms across all services
mts mts1b-platform metric --top 10 --since 1h
# Top 10 slowest operations in the last hour
mts1b-deploy open grafana
# Browse dashboards → Service Overview

Tuning checklist

  • All HTTP calls use mts1b_platform.http.http_client (pooled + retried)
  • All Postgres queries go through mts1b_platform.db.get_pool (pooled)
  • All NATS publish uses mts1b_platform.eventbus.publish_typed (typed + traced)
  • LLM calls bounded by daily budget
  • Backtest uses GPU when universe > 100
  • Parquet queries filter on partition columns
  • Watchdog alerts wired for: drift, vpin, slow consumer, dependency, db health

See also