
Troubleshooting

Common problems + fixes.

Installation

pip install mts1b-foundation returns "package not found"

The packages are not on PyPI yet (they live in private GitHub repos). Use a local editable install:

git clone https://github.com/MTS1B/mts1b-foundation
pip install -e ./mts1b-foundation

For multiple packages, install foundation first, then add the others without letting pip resolve their dependencies:

pip install -e ./mts1b-foundation
pip install -e ./mts1b-quantkit --no-deps
pip install numpy scipy pandas # add the other deps manually
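Because --no-deps skips dependency installation, you have to know what the downstream package declares. A minimal sketch using the standard library's importlib.metadata, assuming the package is already installed with --no-deps; the helper names and the script name are hypothetical:

```python
"""List a distribution's declared requirements that are not installed yet."""
import re
import sys
from importlib.metadata import PackageNotFoundError, requires, version


def requirement_name(req: str) -> str:
    """Extract the bare project name from e.g. "numpy>=1.24; python_version>='3.9'"."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return match.group(1)


def missing_requirements(dist_name: str) -> list[str]:
    """Names of `dist_name`'s requirements that are not installed.

    Optional extras (markers containing "extra") are skipped; other
    environment markers are NOT evaluated in this sketch. Raises
    PackageNotFoundError if `dist_name` itself is not installed.
    """
    missing = []
    for req in requires(dist_name) or []:
        if "extra" in req.partition(";")[2]:
            continue
        name = requirement_name(req)
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    dist = sys.argv[1] if len(sys.argv) > 1 else "mts1b-quantkit"
    print(" ".join(missing_requirements(dist)))
```

Saved as list_missing.py, it can feed pip directly: pip install $(python list_missing.py mts1b-quantkit).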

pydantic.errors.PydanticSchemaGenerationError: Unable to generate schema for Symbol

You're using pydantic v1 (or another pre-v2 release). Upgrade:

pip install -U "pydantic>=2.0"

Symbol requires pydantic v2's GetCoreSchemaHandler interface.
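For context, pydantic v2 lets a custom type describe its own validation through the __get_pydantic_core_schema__ hook, which receives a GetCoreSchemaHandler; v1 has no such hook, which is why schema generation fails there. A minimal sketch with a hypothetical Ticker type (not the actual Symbol implementation):

```python
from typing import Any

from pydantic import BaseModel, GetCoreSchemaHandler, ValidationError
from pydantic_core import core_schema


class Ticker(str):
    """Hypothetical upper-cased ticker type, for illustration only."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Validate as a plain string first, then run our normalizer on the result.
        return core_schema.no_info_after_validator_function(
            cls._normalize, core_schema.str_schema()
        )

    @classmethod
    def _normalize(cls, value: str) -> "Ticker":
        cleaned = value.strip().upper()
        if not cleaned.isalnum():
            raise ValueError(f"invalid ticker: {value!r}")
        return cls(cleaned)


class Quote(BaseModel):
    ticker: Ticker
    price: float


if __name__ == "__main__":
    print(Quote(ticker=" aapl ", price=189.5))
    try:
        Quote(ticker="BRK.B", price=1.0)  # "." fails the isalnum() check
    except ValidationError as exc:
        print(exc.error_count(), "validation error(s)")
```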

ImportError: cannot import name 'X' from 'mts1b_platform'

This usually means you've hit an unfinished part of the bridge stage: some submodules still reference /apps/MTS1B/ internals. Two options:

  1. Use foundation directly for the type you need (most types are there): from mts1b_foundation import X
  2. Check the central docs: https://docs.mts1b.investmentparadisellc.com/docs/repos/platform — the README lists what's currently working
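If your code has to work both before and after a type lands in mts1b_platform, one option is to try candidate modules in order. A minimal sketch; the helper import_first is hypothetical and the names in the usage comment are placeholders:

```python
import importlib
from types import ModuleType
from typing import Any, Sequence


def import_first(module_names: Sequence[str], attr: str) -> Any:
    """Return `attr` from the first module in `module_names` that provides it."""
    errors: list[str] = []
    for name in module_names:
        try:
            module: ModuleType = importlib.import_module(name)
        except ImportError as exc:  # also catches broken internal imports inside the module
            errors.append(f"{name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{name}: no attribute {attr!r}")
    raise ImportError(f"could not import {attr!r}; tried:\n  " + "\n  ".join(errors))


# Usage (placeholder names):
# X = import_first(["mts1b_platform", "mts1b_foundation"], "X")
```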

Dependency resolution loop / clash

ERROR: ResolutionImpossible: ...

Install foundation explicitly first, then install downstream packages with --no-deps and add their dependencies by hand:

pip install -e mts1b-foundation
pip install -e mts1b-quantkit --no-deps
pip install pandas numpy scipy

Imports

ModuleNotFoundError: No module named 'mts1b_foundation'

Check which directories Python searches:

python -c "import sys; print('\n'.join(sys.path))"

If your install directory isn't on the path, activate the virtualenv you installed into:

source .venv/bin/activate
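To confirm which interpreter is running and whether it can see the package at all, a small standard-library diagnostic (the helper name is hypothetical):

```python
import importlib.util
import sys


def import_diagnostics(module: str) -> dict:
    """Report interpreter, venv status, and where `module` would be imported from."""
    spec = importlib.util.find_spec(module)  # None if the top-level module isn't found
    return {
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv and base_prefix at the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in import_diagnostics("mts1b_foundation").items():
        print(f"{key}: {value}")
```

If "found" is False while "in_venv" is also False, you are almost certainly running the system interpreter rather than the venv.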

from mts1b_quantkit.portfolio.hrp import hrp_weights fails

In the bridge stage, deep submodule imports often fail because the copied code still contains internal from mts.X.Y imports that haven't been remapped yet.

Workaround: import from foundation if available, OR use the live monorepo path:

# Bridge workaround until extraction finishes
import sys
sys.path.insert(0, "/apps/MTS1B/services/research/src")
from gpuBT1060.portfolio.hrp import hrp_weights

Track progress: https://docs.mts1b.investmentparadisellc.com/docs/repos/quantkit#status

NATS

nats: connection refused

The NATS server isn't running (or isn't listening on the host/port you connect to). Start it:

# Local dev (no auth)
docker run -d -p 4222:4222 -p 8222:8222 nats:2-alpine -js -m 8222

# Or via mts1b-deploy
mts1b-deploy install --profile minimal --include nats
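To tell "server down" apart from "wrong host or port", a quick TCP probe helps; 4222 is the client port published by the docker command above. A minimal sketch, not part of any mts1b tool:

```python
import socket


def port_open(host: str = "localhost", port: int = 4222, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and DNS failures are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("NATS client port reachable:", port_open("localhost", 4222))
```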

Slow consumer / mismatch consumer warnings

WARN: slow consumer durable=my-consumer lag=42s

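Your consumer is falling behind the producer. Options:

  1. Increase max_ack_pending (nats-py sets it through ConsumerConfig from nats.js.api):
    await js.subscribe(..., config=ConsumerConfig(max_ack_pending=10000))
  2. Run another subscriber on the same durable consumer; JetStream load-balances between them (push consumers need a queue group for this, pull consumers don't)
  3. Process messages in batches instead of one at a time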
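Batching fits naturally with a pull consumer: each fetch returns up to N messages, which are acked after the batch is processed. A sketch using the nats-py client (nats.connect, jetstream().pull_subscribe, fetch); the subject, durable name and handle_batch body are placeholders, and it assumes a JetStream stream already covers the subject:

```python
import asyncio


def handle_batch(payloads: list[bytes]) -> int:
    """Placeholder batch handler: here it only returns the total payload size."""
    return sum(len(p) for p in payloads)


async def consume_in_batches(
    servers: str = "nats://localhost:4222",
    subject: str = "mts.v1.my_app.>",
    durable: str = "my-consumer",
    batch: int = 100,
) -> None:
    # Imported here so handle_batch can be used without nats-py installed.
    import nats
    from nats.errors import TimeoutError as FetchTimeout

    nc = await nats.connect(servers)
    js = nc.jetstream()
    psub = await js.pull_subscribe(subject, durable=durable)
    try:
        while True:
            try:
                msgs = await psub.fetch(batch=batch, timeout=1.0)
            except FetchTimeout:
                continue  # nothing pending right now; poll again
            handle_batch([m.data for m in msgs])
            for m in msgs:
                await m.ack()  # ack only after the whole batch was handled
    finally:
        await nc.close()


if __name__ == "__main__":
    asyncio.run(consume_in_batches())
```

Running several copies of this process against the same durable spreads messages across them, which is option 2 for pull consumers.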

IncompatibleConsumersError at producer startup

mts1b_foundation.nats.IncompatibleConsumersError: no common version across 3 consumers

A consumer is pinned to an older schema version than the producer's range. Either:

  1. Upgrade the consumer to support a newer schema
  2. Or downgrade the producer's max_v to match

Inspect manifests:

mts1b-platform manifest list
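The error means the declared version ranges don't overlap. Assuming each producer and consumer declares an inclusive (min_v, max_v) range (the manifest format itself isn't shown here), the highest usable version is the top of the intersection of all ranges. A minimal sketch with hypothetical names:

```python
from typing import Iterable, Optional, Tuple

Range = Tuple[int, int]  # inclusive (min_v, max_v)


def highest_common_version(producer: Range, consumers: Iterable[Range]) -> Optional[int]:
    """Return the highest version every party supports, or None if there is none."""
    lo, hi = producer
    for c_min, c_max in consumers:
        lo = max(lo, c_min)  # intersection lower bound only ever rises
        hi = min(hi, c_max)  # intersection upper bound only ever falls
    return hi if lo <= hi else None


if __name__ == "__main__":
    print(highest_common_version((2, 4), [(1, 3), (2, 5), (1, 2)]))
    print(highest_common_version((3, 4), [(1, 2), (3, 4)]))  # first consumer is too old
```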

Subject not in registry

WARN: unknown subject mts.v1.my_app.foo.bar — no schema validation

Either register it in mts1b-foundation/nats/_registry.py or use a dict payload (untyped).
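NATS subjects are dot-separated tokens, where * matches exactly one token and > matches one or more trailing tokens. If the registry is keyed by such patterns (an assumption here; check _registry.py for its real structure), a lookup is a token-by-token match. A minimal sketch with a hypothetical registry:

```python
from typing import Dict, Optional


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' = exactly one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one subject token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def find_schema(registry: Dict[str, str], subject: str) -> Optional[str]:
    """Return the schema name of the first matching pattern, or None (untyped)."""
    for pattern, schema in registry.items():
        if subject_matches(pattern, subject):
            return schema
    return None


if __name__ == "__main__":
    registry = {"mts.v1.oms.>": "OrderEvent"}  # hypothetical entry
    print(find_schema(registry, "mts.v1.my_app.foo.bar"))  # no match: no validation
```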

Risk gates

ORDER_REJECTED: gate=position_risk code=MAX_POSITION_EXCEEDED

Your order would put the position above RiskEnvelope.max_position_pct. Either:

  • Reduce order quantity
  • Loosen the envelope (NOT during a halt):
    mts mts1b-riskengine envelope set --fund-id X --max-position-pct 0.10
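To size an order that stays inside the limit, assume max_position_pct is measured as position market value divided by fund NAV (check the riskengine's actual definition). The headroom for adding to a long position then works out as below; the helper is hypothetical:

```python
import math


def max_additional_qty(nav: float, price: float, current_qty: float, max_position_pct: float) -> int:
    """Largest whole-share buy that keeps position value / NAV <= max_position_pct."""
    limit_qty = max_position_pct * nav / price  # max shares allowed at this price
    return max(0, math.floor(limit_qty - current_qty))


if __name__ == "__main__":
    # $1M NAV, 10% cap, $50 price: cap is 2,000 shares; already holding 1,000.
    print(max_additional_qty(1_000_000, 50.0, 1_000, 0.10))
```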

ORDER_REJECTED: gate=drawdown_halt

The fund is in a halt state, and the envelope cannot be loosened while halted. An operator must resume the fund:

mts cmd resume <fund_id>
# Prompts for confirmation: type RESUME

This requires operator authority + 2FA for live funds.

ORDER_REJECTED: gate=static code=BROKER_NOT_ALLOWED

The order's broker field isn't in RiskEnvelope.allowed_brokers. Check the envelope:

mts mts1b-riskengine envelope show --fund-id <fund>

Either change the order's broker, or update the envelope.

OMS state

Order stuck in PENDING_RISK

Check riskengine health:

mts mts1b-deploy status mts1b-riskengine

If unhealthy, restart:

sudo systemctl restart mts1b-riskengine

If healthy but stuck, check the order's audit trail:

mts mts1b-operations audit show --subject-id <order_id>

Order shows ACCEPTED but no fill ever arrives

The broker may have rejected the order silently. Check:

# Broker connection
mts mts1b-brokers test --broker <broker_name>

# Broker-side open orders (some brokers don't push reject events)
mts mts1b-oms orders open --broker <broker_name>

Run reconciliation:

mts mts1b-riskengine reconcile --fund-id <fund> --force
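Conceptually, reconciliation compares the order IDs each side believes are open. A minimal sketch of that comparison with hypothetical inputs (how --force actually resolves the differences isn't described here):

```python
from typing import Iterable, NamedTuple


class ReconDiff(NamedTuple):
    missing_at_broker: set[str]  # OMS thinks open, broker doesn't: rejected, cancelled or filled unseen
    unknown_to_oms: set[str]     # broker has it open, OMS doesn't: investigate before trading


def diff_open_orders(oms_open: Iterable[str], broker_open: Iterable[str]) -> ReconDiff:
    oms, broker = set(oms_open), set(broker_open)
    return ReconDiff(missing_at_broker=oms - broker, unknown_to_oms=broker - oms)


if __name__ == "__main__":
    print(diff_open_orders(["ord-1", "ord-2"], ["ord-2", "ord-9"]))
```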

Backtests

RuntimeError: cupy not installed

mts1b-GPUbacktester defaults to the GPU backend, which needs CuPy. Install the GPU extras:

pip install "mts1b-GPUbacktester[gpu]"

Or force CPU:

mts1b-backtest run --backend cpu --factor ...
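Code that should run with or without a GPU usually picks its array module once at startup. A minimal sketch of that common pattern (not necessarily how mts1b-GPUbacktester chooses its backend); get_array_module is a hypothetical name:

```python
import numpy as np


def get_array_module(prefer_gpu: bool = True):
    """Return cupy when requested and usable, otherwise numpy."""
    if prefer_gpu:
        try:
            import cupy as cp

            cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device/driver
            return cp
        except Exception:
            pass  # cupy missing or no GPU: fall back to CPU
    return np


xp = get_array_module()
weights = xp.ones(4) / 4  # same code path on either backend
```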

Backtest runs but Sharpe is 0.0 / nan

Likely causes:

  1. Universe too small: a single asset means no cross-sectional ranking, so all weights are 0
  2. Lookback too long: the first N bars are NaN; make sure start_date is past the warmup
  3. Costs too high: costs wipe out the strategy's edge; lower cost_bps to confirm the signal exists, then add realistic costs back

Debug:

result = run_single(...)
print(result.returns) # daily returns array
print(result.weights[-1]) # latest weights
print(result.config) # check params
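To see why a flat or empty strategy shows 0.0 or nan, here is the common annualized Sharpe definition, mean / std × √252 on daily returns. This is a sketch of the textbook formula, not necessarily the exact one result uses (risk-free rate and ddof may differ, and some tools report 0.0 instead of nan):

```python
import numpy as np


def annualized_sharpe(daily_returns, periods_per_year: int = 252) -> float:
    """Mean/std of daily returns scaled by sqrt(periods_per_year); NaNs are dropped."""
    r = np.asarray(daily_returns, dtype=float)
    r = r[~np.isnan(r)]
    if r.size < 2:
        return float("nan")  # not enough data, e.g. the whole range is warmup
    std = r.std(ddof=1)
    if std < 1e-12:
        return float("nan")  # all-zero weights give constant returns: Sharpe undefined
    return float(r.mean() / std * np.sqrt(periods_per_year))
```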

Walk-forward IC is much lower than in-sample

This is expected if your factor is overfit: a large gap between in-sample and walk-forward IC is the typical symptom. Stop and rebuild; see Tutorial 3 on stability tests.
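IC is taken here to be the per-date Spearman rank correlation between factor values and next-period returns, averaged over dates. A minimal numpy sketch for comparing the in-sample mean IC with the out-of-sample one; the dates × assets layout and the split index are assumptions, and ties are ranked by order rather than averaged:

```python
import numpy as np


def _ranks(x: np.ndarray) -> np.ndarray:
    """Rank each row 0..n-1 (ties broken by position, not averaged)."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def daily_ic(factor: np.ndarray, fwd_returns: np.ndarray) -> np.ndarray:
    """Spearman IC per date for (dates x assets) arrays without NaNs."""
    rf, rr = _ranks(factor), _ranks(fwd_returns)
    rf -= rf.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    num = (rf * rr).sum(axis=1)
    den = np.sqrt((rf**2).sum(axis=1) * (rr**2).sum(axis=1))
    return num / den


def ic_gap(factor, fwd_returns, split: int) -> tuple[float, float]:
    """Mean IC before `split` (in-sample) and from `split` on (walk-forward)."""
    ic = daily_ic(np.asarray(factor, float), np.asarray(fwd_returns, float))
    return float(ic[:split].mean()), float(ic[split:].mean())
```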

Deployment

Proxmox API: 401 Unauthorized

Your token doesn't have the right permissions. In the Proxmox UI:

  • Datacenter → Permissions → API Tokens
  • Make sure Privilege Separation is unchecked (or assign explicit permissions to the token)
  • Roles: at least PVEVMAdmin on /vms (with Propagate enabled)
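A 401 can also mean the token itself isn't being accepted. Proxmox VE expects API tokens in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET. A minimal standard-library sketch that builds such a request against /api2/json/version; host, user, realm, token ID and secret are placeholders:

```python
import urllib.request


def proxmox_request(host: str, user: str, realm: str, token_id: str, secret: str,
                    path: str = "/api2/json/version") -> urllib.request.Request:
    """Build (but don't send) a token-authenticated Proxmox VE API request."""
    header = f"PVEAPIToken={user}@{realm}!{token_id}={secret}"
    return urllib.request.Request(f"https://{host}:8006{path}", headers={"Authorization": header})


if __name__ == "__main__":
    req = proxmox_request("pve.example.local", "mts1b", "pve", "deploy", "<secret>")
    # Proxmox often uses a self-signed cert; pass an ssl context to urlopen if needed.
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.status, resp.read()[:200])
```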

LXC won't start: pct create: storage already exists

Container ID conflict: a volume for that ID already exists on the storage. Reserve a different ID range:

mts1b-deploy menuconfig
# Proxmox section: Reserve IDs: 200-230

mts1b-deploy install hangs at "Pulling images..."

Usually a DNS or registry-access issue. Test a manual pull:

docker pull ghcr.io/mts1b/mts1b-foundation:0.0.1

If that fails, you may need to authenticate to ghcr.io:

echo <github_pat> | docker login ghcr.io -u <username> --password-stdin

Or set a local mirror in mts1b.config:

registry:
  url: https://my-mirror.local
  username: mts1b
  password: ${MIRROR_PASSWORD}

Docs site

Algolia search returns no results

Two likely causes:

  1. The index is stale: the Algolia crawler runs weekly. Re-trigger it at https://dashboard.algolia.com → apps/O23N9EQJYS → Crawler → Restart crawl
  2. The page was added recently: it won't be indexed until the next crawl; trigger one manually and allow ~30 min for it to finish

docs.mts1b.investmentparadisellc.com returns Site not found

After a custom domain is attached in Cloudflare, SSL certificate provisioning takes 5-15 min. Workaround:

curl https://mts1b-docs.pages.dev/ # use the .pages.dev URL meanwhile

LLM (mts1b-llm)

BudgetExceededError: persona=CRO daily=$5.00

Budget exhausted. Either:

  1. Wait until UTC midnight (auto-reset)
  2. Increase budget:
    mts mts1b-llm budget set --persona CRO --daily-usd 10
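The auto-reset behaves like a per-persona spend counter keyed by the UTC date. A minimal sketch of that idea; the class and the BudgetExhausted exception are hypothetical stand-ins, not the mts1b-llm API:

```python
from datetime import date, datetime, timezone
from typing import Callable, Optional


class BudgetExhausted(RuntimeError):
    """Stand-in for the real BudgetExceededError."""


class DailyBudget:
    """Per-persona USD budget that resets when the UTC date changes."""

    def __init__(self, daily_usd: float, now: Optional[Callable[[], datetime]] = None):
        self.daily_usd = daily_usd
        self._now = now or (lambda: datetime.now(timezone.utc))  # injectable clock for testing
        self._day: Optional[date] = None
        self._spent = 0.0

    def charge(self, usd: float) -> None:
        today = self._now().astimezone(timezone.utc).date()
        if today != self._day:  # first charge after UTC midnight: start a new day
            self._day, self._spent = today, 0.0
        if self._spent + usd > self.daily_usd:
            raise BudgetExhausted(f"daily=${self.daily_usd:.2f} spent=${self._spent:.2f}")
        self._spent += usd
```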

RateLimitError: anthropic ... Too Many Requests

The provider's rate limit was hit. If fallback providers are configured, the router fails over to OpenAI / Google automatically. To check provider status and failover:

mts mts1b-llm providers status
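Failover of this kind typically tries providers in priority order, backs off briefly on a rate-limit error, and then moves to the next provider. A minimal sketch with stand-in provider callables and a hypothetical RateLimited exception (the real router's retry policy and configuration aren't shown here):

```python
import time
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Stand-in for a provider's 429 / Too Many Requests error."""


def complete_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay: float = 0.5,
) -> Tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except RateLimited:
                if attempt + 1 < retries_per_provider:
                    time.sleep(base_delay * 2**attempt)  # exponential backoff before retrying
    raise RateLimited("all providers are rate-limited")
```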

LLM responses are inconsistent run-to-run

Set temperature to 0.0 (greedy decoding, the closest to deterministic output) in the persona YAML:

name: my_persona
temperature: 0.0

Note: even at temperature 0, providers can return slightly different outputs run to run (e.g. due to caching or model updates).

Frontends

webui shows "Cannot connect to OMS"

Check OMS health:

mts mts1b-deploy status mts1b-oms
curl -i http://localhost:8001/healthz

Likely fix:

sudo systemctl restart mts1b-oms
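For the same check as the curl call from a script (a cron job or readiness probe, say), a standard-library sketch; port 8001 and the /healthz path are taken from the curl command above:

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def check_health(url: str = "http://localhost:8001/healthz", timeout: float = 2.0) -> Tuple[bool, Optional[int]]:
    """Return (healthy, status_code); status is None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return False, exc.code
    except (urllib.error.URLError, OSError):  # refused, timed out, DNS failure
        return False, None
```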

TUI shows garbled characters

Your terminal probably doesn't support truecolor. Switch to a 256-color profile:

export TERM=xterm-256color
mts tui

Or set MTS1B_TUI_MONOCHROME=1 for a monochrome fallback.

Audit chain

audit verify returns "chain integrity FAILED at seq 42"

This indicates tampering or storage corruption. Stop trading and investigate:

# Inspect the entry
mts mts1b-operations audit show --sequence 42

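# Diff against backup (restic recreates the original path under --target)
restic restore <snapshot_id> --target /tmp/audit-backup --include /apps/MTS1B/data/audit/main.log
diff /apps/MTS1B/data/audit/main.log /tmp/audit-backup/apps/MTS1B/data/audit/main.log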

If the diff shows malicious changes, restore from backup, rotate Vault secrets, and investigate the root cause.
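A failure "at seq N" is what a hash chain reports when an entry no longer hashes to the value recorded for it. A minimal sketch of such a chain, assuming each entry stores SHA-256(previous hash + payload); the real audit log format isn't shown here, so the tuple layout is hypothetical:

```python
import hashlib
from typing import List, Optional, Tuple

GENESIS = "0" * 64
Entry = Tuple[int, str, str]  # (seq, payload, hash)


def entry_hash(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads: List[str]) -> List[Entry]:
    """Return entries whose hashes each cover the previous entry's hash."""
    chain, prev = [], GENESIS
    for seq, payload in enumerate(payloads):
        prev = entry_hash(prev, payload)
        chain.append((seq, payload, prev))
    return chain


def first_broken_seq(chain: List[Entry]) -> Optional[int]:
    """Return the first seq whose stored hash doesn't match, or None if intact."""
    prev = GENESIS
    for seq, payload, stored in chain:
        if entry_hash(prev, payload) != stored:
            return seq
        prev = stored
    return None
```

Because every hash covers its predecessor, editing one entry undetectably means rewriting every later hash too, which is why the comparison against an independent backup matters.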

Vault

permission denied on vault kv get

Your token doesn't have read access to that path. Check policy:

vault token lookup
vault policy list
vault policy read <policy_name>

Common fix: get a fresh token via AppRole login (the role's policies must grant the path):

vault write auth/approle/login \
role_id=<your_role_id> \
secret_id=<your_secret_id>

Then use the returned token, e.g. by exporting it as VAULT_TOKEN.

Vault sealed

vault status
# Sealed: true

Unseal with 3 of 5 shares:

vault operator unseal <share_1>
vault operator unseal <share_2>
vault operator unseal <share_3>

If you don't have the shares, you've lost the Vault data. Restore from backup.
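The 3-of-5 rule comes from Shamir's Secret Sharing, which Vault uses to split its unseal key: the secret is the constant term of a random degree-2 polynomial, each share is a point on it, and any 3 points determine the polynomial. A toy sketch over a prime field, for intuition only (Vault's own implementation works byte-wise over GF(2^8); never use this in its place):

```python
import random
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime; all arithmetic is mod PRIME


def split(secret: int, n: int = 5, k: int = 3,
          rng: random.Random = random.SystemRandom()) -> List[Tuple[int, int]]:
    """Split `secret` (< PRIME) into n shares, any k of which reconstruct it."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, n + 1)]


def combine(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```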

Performance

Backtest is slow

Symptom | Likely cause | Fix
5+ minutes per run | CPU backend with a large universe | Switch to GPU: --backend cuda
Memory error | Universe × duration too big | Chunk: --chunk-size-mb 512
Lots of disk I/O | Cold parquet cache | The first run warms the cache; subsequent runs are faster
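To estimate whether a run fits in memory and how many chunks a given --chunk-size-mb implies, a back-of-envelope sketch. It assumes one dense float64 array of assets × days, MiB units, and chunking along the time axis, any of which may differ from the backtester's real layout; real runs hold several such arrays at once:

```python
import math


def array_mb(n_assets: int, n_days: int, bytes_per_value: int = 8) -> float:
    """Size in MiB of one dense assets x days array (float64 by default)."""
    return n_assets * n_days * bytes_per_value / 2**20


def n_time_chunks(n_assets: int, n_days: int, chunk_size_mb: float) -> int:
    """Number of time slices so each slice's float64 array stays under chunk_size_mb."""
    days_per_chunk = max(1, int(chunk_size_mb * 2**20 // (n_assets * 8)))
    return math.ceil(n_days / days_per_chunk)


if __name__ == "__main__":
    # e.g. 5,000 assets over ~10 years of trading days
    print(f"{array_mb(5_000, 2_520):.1f} MiB per array")
    print(n_time_chunks(5_000, 2_520, 512), "chunk(s) at 512 MB")
```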

OMS slow on submit

mts mts1b-platform metric latency --service mts1b-oms --window 5m

Look for p99 > 50ms. Likely culprits:

  • Riskengine gRPC slow (mts mts1b-platform metric latency --service mts1b-riskengine)
  • DB pool exhausted (mts mts1b-platform metric db_pool --pool primary)
  • NATS publish slow (mts mts1b-platform metric nats --subject "mts.v1.oms.>")
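For reference, p99 is the latency below which 99% of requests fall. A standard-library sketch for computing it from raw samples; percentile interpolation methods differ, so the platform's metric may not match this to the last decimal:

```python
import statistics
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """99th percentile via statistics.quantiles' default 'exclusive' method (needs >= 2 samples)."""
    return statistics.quantiles(latencies_ms, n=100)[98]


def is_slow(latencies_ms: Sequence[float], threshold_ms: float = 50.0) -> bool:
    """True when p99 exceeds the threshold used above (50 ms by default)."""
    return p99(latencies_ms) > threshold_ms
```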

Where else to look