
mts1b-altdata

Alternative data adapters: SEC EDGAR, GDELT, news, congress trades, insiders, sentiment, filings.

Repo: github.com/MTS1B/mts1b-altdata
Layer: 2
Wave: 2 (months 4-7)
Depends on: foundation, platform, httpx, feedparser, beautifulsoup4
Audience: mts1b-datalake, mts1b-research

What it is

Unified adapters for non-price data: company filings, news, sentiment, congress trades, insider transactions, government data. Each adapter normalizes into mts1b-foundation types.

Supported sources

| Source | Coverage | Update | Auth |
| --- | --- | --- | --- |
| sec_edgar | 10-K, 10-Q, 8-K, S-1, 13F, Form 4 | real-time RSS | none |
| gdelt | global event/news graph | 15-min | none |
| congress_trades | House + Senate stock disclosures | daily | none (scrapes house.gov, senate.gov) |
| insiders | Form 4 insider transactions | real-time | EDGAR feed |
| news_rss | 200+ financial news RSS feeds | real-time | none |
| truthsocial_via_gdelt | Truth Social via GDELT mirror | hourly | none |
| government_apis | Census, BLS, BEA, Treasury | varies | none |

⚠️ No scraping behind login walls or paywalls (Twitter/X, paywalled Bloomberg content). Stick to public APIs, RSS, and the GDELT mirror.

Module layout

mts1b_altdata/
├── __init__.py
├── sec_edgar/
│   ├── client.py
│   ├── filings_parser.py    # XBRL → structured rows
│   └── form4.py             # insider transactions
├── gdelt/
│   ├── client.py            # GKG + Events
│   └── tone_calculator.py
├── congress/
│   ├── house.py             # house.gov disclosures
│   └── senate.py            # senate.gov disclosures
├── news/
│   ├── rss_aggregator.py
│   ├── sentiment.py         # FinBERT-based
│   └── deduper.py           # cross-feed dedup
├── government/
│   ├── census.py
│   ├── bls.py
│   └── treasury.py
└── messaging/
    └── search.py            # cross-source semantic search

API

SecEdgar

from datetime import date

from mts1b_altdata.sec_edgar import SecEdgar
from mts1b_foundation.symbology import Symbol

async with SecEdgar(user_agent="mts1b/0.1 [email protected]") as edgar:
    # Latest 10-Ks for a symbol
    filings = await edgar.filings(Symbol("AAPL"), form="10-K", limit=5)
    # [Filing(cik=320193, form="10-K", filed=..., accession="...", text_url=...)]

    # Parse XBRL into structured rows
    fundamentals = await edgar.fundamentals(Symbol("AAPL"), period="quarterly")
    # [FundamentalsRow(asof=..., revenue=..., gross_profit=...)]

    # Insider transactions
    form4 = await edgar.form4(Symbol("AAPL"), start=date(2026, 1, 1))
    # [InsiderTransaction(...)]

User-agent is required by EDGAR's terms; include real contact info.
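To make the User-Agent requirement concrete, here is a minimal standard-library sketch of a request that carries a contact address. It is not the SecEdgar client's internals: `edgar_request`, the app name, and the example address are hypothetical, and the only EDGAR-specific detail assumed is the public submissions endpoint with its 10-digit zero-padded CIK.

```python
from urllib.request import Request


def edgar_request(cik: int, app: str, contact_email: str) -> Request:
    """Build (but do not send) a request for a company's EDGAR submissions index.

    Hypothetical helper for illustration; the real adapter may work differently.
    """
    if "@" not in contact_email:
        raise ValueError("User-Agent should include a real contact address")
    # EDGAR's JSON endpoints address companies by 10-digit, zero-padded CIK.
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    return Request(url, headers={"User-Agent": f"{app} {contact_email}"})


# Placeholder identity; substitute your own application name and address.
req = edgar_request(320193, "example-app/0.1", "ops@example.com")
print(req.full_url)
```

Building the request separately from sending it keeps the header check testable without network access.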

Gdelt

from mts1b_altdata.gdelt import Gdelt

async with Gdelt() as g:
    # Last 24h events mentioning a company
    events = await g.events(themes=["BUSINESS"], persons_or_orgs=["Apple Inc"])
    # [GdeltEvent(event_id=..., tone=-2.3, num_mentions=42, sources=[...])]

    # Tone for a topic
    tone = await g.tone_over_time(topic="federal reserve", days=30)
    # pd.Series indexed by date
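One plausible way to collapse per-event tone into a daily series is a mention-weighted mean. The sketch below assumes that approach; the source does not say how tone_calculator.py aggregates, and `ToneSample` and `daily_weighted_tone` are hypothetical names. It returns a plain dict rather than a pd.Series to stay dependency-free.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class ToneSample:
    """Hypothetical stand-in for the tone fields of a GdeltEvent."""
    day: date
    tone: float
    num_mentions: int


def daily_weighted_tone(samples: Iterable[ToneSample]) -> Dict[date, float]:
    """Mean tone per day, weighting each event by its mention count."""
    weighted_sum: Dict[date, float] = defaultdict(float)
    total_mentions: Dict[date, int] = defaultdict(int)
    for s in samples:
        weighted_sum[s.day] += s.tone * s.num_mentions
        total_mentions[s.day] += s.num_mentions
    # Skip days with no mentions to avoid dividing by zero.
    return {
        day: weighted_sum[day] / total_mentions[day]
        for day in sorted(total_mentions)
        if total_mentions[day] > 0
    }


samples = [
    ToneSample(date(2026, 1, 1), tone=-2.0, num_mentions=3),
    ToneSample(date(2026, 1, 1), tone=2.0, num_mentions=1),
    ToneSample(date(2026, 1, 2), tone=1.5, num_mentions=2),
]
series = daily_weighted_tone(samples)
```

Weighting by mentions keeps a single widely covered event from being diluted by many low-coverage ones; an unweighted mean is the simpler alternative.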

CongressTrades

from mts1b_altdata.congress import CongressTrades

async with CongressTrades() as ct:
    trades = await ct.recent_trades(chamber="house", days=30)
    # [Trade(member="Speaker Smith", symbol="NVDA", side="buy",
    #        amount_range=("$1k", "$15k"), filing_date=..., transaction_date=...)]
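Because disclosures report a range rather than an exact amount, downstream code usually needs numeric bounds. The sketch below parses strings shaped like the `amount_range` values in the example above; `parse_amount` and `range_midpoint` are hypothetical helpers, not part of the CongressTrades API, and the midpoint is only a rough proxy for the true trade size.

```python
import re
from typing import Tuple

# Multipliers for the shorthand suffixes seen in the example ("$1k", "$15k").
_SUFFIX = {"": 1, "k": 1_000, "m": 1_000_000}
_AMOUNT = re.compile(r"^\$?([\d,]+(?:\.\d+)?)([kKmM]?)$")


def parse_amount(text: str) -> int:
    """Convert a string such as "$15k" or "$1,001" to whole dollars."""
    match = _AMOUNT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    number, suffix = match.groups()
    return int(float(number.replace(",", "")) * _SUFFIX[suffix.lower()])


def range_midpoint(amount_range: Tuple[str, str]) -> float:
    """Midpoint of a disclosed (low, high) range, in dollars."""
    low, high = (parse_amount(part) for part in amount_range)
    if low > high:
        raise ValueError("range bounds are reversed")
    return (low + high) / 2


midpoint = range_midpoint(("$1k", "$15k"))
```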

NewsAggregator

from mts1b_altdata.news import NewsAggregator

async with NewsAggregator(feeds=["reuters", "bloomberg", "wsj", "ft", "cnbc"]) as news:
    articles = await news.search("NVIDIA earnings", days=7)
    # [Article(title=..., url=..., published_at=..., sentiment=..., summary=...)]

    async for article in news.stream():
        # Real-time RSS aggregation
        print(article.title, article.sentiment)

Sentiment is computed with FinBERT (a HuggingFace model that runs locally). Inference costs ~5 ms per article on CPU, which is fast enough for real-time use.
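FinBERT classifies text as positive, negative, or neutral, so a single `sentiment` value per article implies some reduction from three class probabilities to one number. The source does not specify that reduction; the sketch below assumes the common "positive minus negative" convention, and `sentiment_score` is a hypothetical name.

```python
from typing import Mapping

_LABELS = frozenset({"positive", "negative", "neutral"})


def sentiment_score(probs: Mapping[str, float]) -> float:
    """Collapse three class probabilities into one score in [-1, 1].

    Assumed convention: P(positive) - P(negative). Neutral mass pulls the
    score toward 0 without changing its sign.
    """
    missing = _LABELS - set(probs)
    if missing:
        raise ValueError(f"missing class probabilities: {sorted(missing)}")
    return probs["positive"] - probs["negative"]


score = sentiment_score({"positive": 0.75, "negative": 0.25, "neutral": 0.0})
```

At roughly 5 ms per article, a single CPU worker would handle on the order of 200 articles per second, which is consistent with scoring articles as they arrive from the stream.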

Symbol resolution

EDGAR uses CIK numbers; news feeds use ticker text. mts1b-altdata resolves both via:

from mts1b_altdata.resolve import to_cik, to_ticker

await to_cik(Symbol("AAPL")) # 320193
await to_ticker(cik=320193) # "AAPL"

Resolution cache lives in mts1b-datalake (1-week TTL).
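The TTL behavior can be pictured with a small in-memory cache. This is only an illustration of expiry after one week: the real cache lives in mts1b-datalake and its interface is not shown here, so `TTLCache` and its methods are hypothetical.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


class TTLCache(Generic[K, V]):
    """In-memory cache whose entries expire a fixed time after insertion."""

    def __init__(
        self,
        ttl_seconds: float = ONE_WEEK_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[K, Tuple[V, float]] = {}

    def put(self, key: K, value: V) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: K) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


# Demonstrate expiry with a fake clock instead of waiting a week.
now = [0.0]
cache: TTLCache[str, int] = TTLCache(clock=lambda: now[0])
cache.put("AAPL", 320193)
fresh = cache.get("AAPL")
now[0] = ONE_WEEK_SECONDS
stale = cache.get("AAPL")
```

On a miss, a caller would fall back to the upstream ticker-to-CIK lookup and `put` the result back.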

Rate limits

| Source | Limit | Adapter behavior |
| --- | --- | --- |
| SEC EDGAR | 10 req/sec | per-IP enforcement via platform/ratelimit |
| GDELT | none stated | self-imposed 5 req/sec |
| News RSS | per-publisher | conditional GET (If-Modified-Since) |
| Congress | scraping | 1 req/sec, polite User-Agent |
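A fixed requests-per-second cap like the ones above can be enforced by spacing calls at a minimum interval. The sketch below is a stand-in for illustration only: platform/ratelimit is not shown in this document, and `MinIntervalLimiter` is a hypothetical class, not its API.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Allow at most `rate_per_sec` acquisitions per second by spacing them out."""

    def __init__(self, rate_per_sec: float) -> None:
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self._interval = 1.0 / rate_per_sec
        self._next_slot = 0.0  # monotonic time at which the next call may proceed
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        # The lock serializes waiters so concurrent tasks queue in order.
        async with self._lock:
            while True:
                wait = self._next_slot - time.monotonic()
                if wait <= 0:
                    break
                await asyncio.sleep(wait)
            self._next_slot = max(time.monotonic(), self._next_slot) + self._interval


async def demo() -> float:
    # 50 req/sec keeps the demo short; EDGAR's cap would be rate_per_sec=10.
    limiter = MinIntervalLimiter(rate_per_sec=50)
    start = time.monotonic()
    for _ in range(5):
        await limiter.acquire()  # a real adapter would issue its request here
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

The first call proceeds immediately and the other four each wait one interval, so the demo takes about 0.08 seconds. A token bucket would be the usual alternative when short bursts are acceptable.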

Build + test

pip install -e ".[dev]"
pytest -m unit # hermetic
pytest -m live # requires network access

Roadmap

| Version | Items |
| --- | --- |
| 0.1 (Wave 2) | SEC EDGAR, GDELT, congress, insiders, news RSS, FinBERT sentiment |
| 0.2 (Wave 2) | Web Archive crawler for press releases |
| 0.3 (Wave 3) | International filings: HK, UK, EU equivalents |
| 1.0 (LTS) | Stable adapter interface |

See also