mts1b-altdata
Alternative data adapters: SEC EDGAR, GDELT, news, congress trades, insiders, sentiment, filings.
Repo: github.com/MTS1B/mts1b-altdata
Layer: 2
Wave: 2 (months 4-7)
Depends on: foundation, platform, httpx, feedparser, beautifulsoup4
Audience: mts1b-datalake, mts1b-research
What it is
Unified adapters for non-price data: company filings, news, sentiment, congressional trades, insider transactions, and government data. Each adapter normalizes its output into mts1b-foundation types.
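To make "normalizes into mts1b-foundation types" concrete, here is a minimal sketch of what normalization could look like. `AltDataRecord` and `normalize` are hypothetical stand-ins, not the actual mts1b-foundation types, and the raw field names (`timestamp`, `symbol`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AltDataRecord:
    """Hypothetical stand-in for a mts1b-foundation record type."""
    source: str
    symbol: Optional[str]
    asof: datetime
    payload: Mapping[str, Any]


def normalize(source: str, raw: Mapping[str, Any]) -> AltDataRecord:
    """Convert a raw adapter dict into a record with a UTC timestamp."""
    ts = datetime.fromisoformat(raw["timestamp"])
    # Treat naive timestamps as UTC; convert aware ones to UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    else:
        ts = ts.astimezone(timezone.utc)
    symbol = raw.get("symbol")
    # Everything that is not a promoted field stays in the payload.
    payload = {k: v for k, v in raw.items() if k not in ("timestamp", "symbol")}
    return AltDataRecord(source, symbol.upper() if symbol else None, ts, payload)
```

The point of the sketch is that every source ends up with the same shape (source, symbol, UTC timestamp, payload), so downstream consumers do not need per-source parsing.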
Supported sources
| Source | Coverage | Update | Auth |
|---|---|---|---|
| sec_edgar | 10-K, 10-Q, 8-K, S-1, 13F, Form 4 | real-time RSS | none |
| gdelt | global event/news graph | 15-min | none |
| congress_trades | House + Senate stock disclosures | daily | none (scrapes house.gov, senate.gov) |
| insiders | Form 4 insider transactions | real-time | EDGAR feed |
| news_rss | 200+ financial news RSS feeds | real-time | none |
| truthsocial_via_gdelt | Truth Social via GDELT mirror | hourly | none |
| government_apis | Census, BLS, BEA, Treasury | varies | none |
⚠️ No scraping behind login walls or paywalls (for example Twitter/X or paywalled Bloomberg content). Stick to public APIs, RSS, and the GDELT mirror.
Module layout
mts1b_altdata/
├── __init__.py
├── sec_edgar/
│   ├── client.py
│   ├── filings_parser.py   # XBRL → structured rows
│   └── form4.py            # insider transactions
├── gdelt/
│   ├── client.py           # GKG + Events
│   └── tone_calculator.py
├── congress/
│   ├── house.py            # house.gov disclosures
│   └── senate.py           # senate.gov disclosures
├── news/
│   ├── rss_aggregator.py
│   ├── sentiment.py        # FinBERT-based
│   └── deduper.py          # cross-feed dedup
├── government/
│   ├── census.py
│   ├── bls.py
│   └── treasury.py
└── messaging/
    └── search.py           # cross-source semantic search
API
SecEdgar
from datetime import date
from mts1b_altdata.sec_edgar import SecEdgar
from mts1b_foundation.symbology import Symbol
# Constructor arguments are omitted here; the async context manager is
# assumed to follow the same pattern as the other adapters.
async with SecEdgar() as edgar:
    # Latest 10-Ks for a symbol
    filings = await edgar.filings(Symbol("AAPL"), form="10-K", limit=5)
    # [Filing(cik=320193, form="10-K", filed=..., accession="...", text_url=...)]
    # Parse XBRL into structured rows
    fundamentals = await edgar.fundamentals(Symbol("AAPL"), period="quarterly")
    # [FundamentalsRow(asof=..., revenue=..., gross_profit=...)]
    # Insider transactions filed since the start date
    form4 = await edgar.form4(Symbol("AAPL"), start=date(2026, 1, 1))
    # [InsiderTransaction(...)]
EDGAR's terms require a User-Agent header on every request; set it to a string that includes real contact information.
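A minimal sketch of building such a header follows. The helper name `edgar_headers` and its validation rules are illustrative assumptions, not part of the mts1b-altdata API; the "name plus email" format reflects the style SEC suggests for a declared user agent.

```python
from typing import Dict


def edgar_headers(app_name: str, contact_email: str) -> Dict[str, str]:
    """Build request headers that identify the caller to EDGAR (hypothetical helper)."""
    app_name = app_name.strip()
    contact_email = contact_email.strip()
    if not app_name:
        raise ValueError("app_name must not be empty")
    # Cheap sanity check only; this is not full email validation.
    local, sep, domain = contact_email.partition("@")
    if not (local and sep and "." in domain):
        raise ValueError(f"contact_email does not look like an address: {contact_email!r}")
    return {
        "User-Agent": f"{app_name} {contact_email}",
        "Accept-Encoding": "gzip, deflate",
    }


headers = edgar_headers("mts1b-altdata", "data-team@example.com")
```

With httpx, a dict like this can be passed once as `httpx.AsyncClient(headers=headers)` so that every request carries it.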
Gdelt
from mts1b_altdata.gdelt import Gdelt
async with Gdelt() as g:
    # Last 24h events mentioning a company
    events = await g.events(themes=["BUSINESS"], persons_or_orgs=["Apple Inc"])
    # [GdeltEvent(event_id=..., tone=-2.3, num_mentions=42, sources=[...])]
    # Tone for a topic
    tone = await g.tone_over_time(topic="federal reserve", days=30)
    # pd.Series indexed by date
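One plausible way to arrive at a date-indexed tone series is to average per-event tone by calendar day. The sketch below assumes that aggregation; the real `tone_calculator.py` may weight by mentions or use a different method. `daily_mean_tone` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean
from typing import DefaultDict, Dict, Iterable, List, Tuple


def daily_mean_tone(events: Iterable[Tuple[datetime, float]]) -> Dict[date, float]:
    """Average (timestamp, tone) pairs per calendar day, sorted by date."""
    buckets: DefaultDict[date, List[float]] = defaultdict(list)
    for ts, tone in events:
        buckets[ts.date()].append(tone)
    return {day: fmean(values) for day, values in sorted(buckets.items())}


samples = [
    (datetime(2026, 1, 2, 10, 0), 0.5),
    (datetime(2026, 1, 1, 9, 0), -2.0),
    (datetime(2026, 1, 1, 15, 0), 1.0),
]
series = daily_mean_tone(samples)
```

The resulting dict can be handed to `pd.Series(series)` to get the date-indexed shape shown above.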
CongressTrades
from mts1b_altdata.congress import CongressTrades
async with CongressTrades() as ct:
    trades = await ct.recent_trades(chamber="house", days=30)
    # [Trade(member="Speaker Smith", symbol="NVDA", side="buy",
    #        amount_range=("$1k", "$15k"), filing_date=..., transaction_date=...)]
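Disclosures report amounts as ranges rather than exact values, so consumers usually need numeric bounds. The following is a sketch of parsing a range string into integers; the input format (`"$1,001 - $15,000"`) and the function names are assumptions, and the adapter's own `amount_range` representation may differ.

```python
import re
from typing import Optional, Tuple

_AMOUNT = re.compile(r"\$\s*([\d,]+)")


def parse_amount_range(text: str) -> Tuple[int, Optional[int]]:
    """Parse '$1,001 - $15,000' into (1001, 15000); open-ended ranges get None."""
    values = [int(match.replace(",", "")) for match in _AMOUNT.findall(text)]
    if not values:
        raise ValueError(f"no dollar amount found in {text!r}")
    low = values[0]
    high = values[1] if len(values) > 1 else None
    if high is not None and high < low:
        raise ValueError(f"upper bound below lower bound in {text!r}")
    return low, high


def midpoint(low: int, high: Optional[int]) -> float:
    """Point estimate for sizing; falls back to the lower bound when open-ended."""
    return float(low) if high is None else (low + high) / 2
```

Using the midpoint is a common simplification for position sizing; it is an estimate only, since the true amount can sit anywhere in the range.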
NewsAggregator
from mts1b_altdata.news import NewsAggregator
async with NewsAggregator(feeds=["reuters", "bloomberg", "wsj", "ft", "cnbc"]) as news:
    articles = await news.search("NVIDIA earnings", days=7)
    # [Article(title=..., url=..., published_at=..., sentiment=..., summary=...)]
    async for article in news.stream():
        # Real-time RSS aggregation
        print(article.title, article.sentiment)
Sentiment is computed with FinBERT (a HuggingFace model that runs locally). Inference costs roughly 5 ms per article on CPU, which is fast enough for real-time use.
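FinBERT classifies text as positive, negative, or neutral. One common way to collapse the three class scores into a single number is positive probability minus negative probability; the sketch below assumes that convention and an assumed label order, and may not match what `sentiment.py` does. It uses only the standard library and takes raw logits as input, so no model download is needed to follow it.

```python
import math
from typing import List, Sequence

# Assumed label order; check the model's id2label mapping before relying on it.
LABELS = ("positive", "negative", "neutral")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_score(logits: Sequence[float], labels: Sequence[str] = LABELS) -> float:
    """Map three class logits to a score in [-1, 1]: P(positive) - P(negative)."""
    probs = dict(zip(labels, softmax(logits)))
    return probs["positive"] - probs["negative"]
```

A score near 0 means either a neutral article or evenly split positive and negative evidence; keeping the neutral probability alongside the score distinguishes the two cases.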
Symbol resolution
EDGAR identifies companies by CIK number, while news feeds refer to them by ticker text. mts1b-altdata converts between the two:
from mts1b_altdata.resolve import to_cik, to_ticker
await to_cik(Symbol("AAPL")) # 320193
await to_ticker(cik=320193) # "AAPL"
The resolution cache lives in mts1b-datalake and uses a 1-week TTL.
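The TTL behavior can be illustrated with a small in-memory cache. This is only a sketch of the expiry logic: the real cache is stored in mts1b-datalake, and `TTLCache` with its injectable clock is a hypothetical class written for this example.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


class TTLCache(Generic[V]):
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(
        self,
        ttl_seconds: float = ONE_WEEK_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def put(self, key: Hashable, value: V) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-resolves it upstream.
            del self._entries[key]
            return None
        return value
```

A `None` result means "miss or expired", at which point a resolver would look up the mapping again and call `put` with the fresh value.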
Rate limits
| Source | Limit | Adapter behavior |
|---|---|---|
| SEC EDGAR | 10 req/sec | per-IP enforcement via platform/ratelimit |
| GDELT | none stated | self-imposed 5 req/sec |
| News RSS | per-publisher | conditional GET (If-Modified-Since) |
| Congress | scraping | 1 req/sec, polite User-Agent |
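The per-source limits in the table amount to spacing requests a minimum interval apart. Below is a sketch of that idea using asyncio; it is not the platform/ratelimit module, and `MinIntervalLimiter` with its injectable `clock` and `sleep` parameters is a hypothetical class for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable


class MinIntervalLimiter:
    """Space acquire() calls at least 1 / rate_per_sec seconds apart."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self._interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0
        # Create the limiter inside a running event loop on older Python versions.
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        # The lock serializes callers so each one claims a distinct time slot.
        async with self._lock:
            now = self._clock()
            start = max(now, self._next_slot)
            if start > now:
                await self._sleep(start - now)
            self._next_slot = start + self._interval
```

An adapter would call `await limiter.acquire()` before each request, for example with `MinIntervalLimiter(5)` for the self-imposed GDELT limit or `MinIntervalLimiter(1)` for the congress scrapers. This spaces requests evenly and does not allow bursts; a token bucket would be the alternative if short bursts are acceptable.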
Build + test
pip install -e ".[dev]"
pytest -m unit # hermetic
pytest -m live # requires network access
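For `-m unit` and `-m live` to select tests without unknown-marker warnings, the markers need to be registered. One way to do that is a `conftest.py` hook, sketched below; the marker descriptions are assumptions, and the repo may register them in its pytest configuration file instead.

```python
# conftest.py (sketch): register the two markers used by the commands above.
def pytest_configure(config):
    config.addinivalue_line("markers", "unit: hermetic tests, no network access")
    config.addinivalue_line("markers", "live: tests that call real upstream services")
```

Individual tests then opt in with `@pytest.mark.unit` or `@pytest.mark.live`.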
Roadmap
| Version | Items |
|---|---|
| 0.1 (Wave 2) | SEC EDGAR, GDELT, congress, insiders, news RSS, FinBERT sentiment |
| 0.2 (Wave 2) | Web Archive crawler for press releases |
| 0.3 (Wave 3) | International filings: HK, UK, EU equivalents |
| 1.0 (LTS) | Stable adapter interface |
See also
- mts1b-datalake: primary consumer (ingests altdata into parquet)
- mts1b-research: sentiment + filings as factor inputs
- mts1b-marketdata: for price data (this repo is non-price only)