DartLab

1 stock code → full company story

Korean DART + US SEC EDGAR filings, structured. 2,700+ KR / 970+ US companies, one line of Python.

Docs · Blog · Live Demo · Open in Colab · Open in Molab · 한국어 · Sponsor

HuggingFace Data Desktop Download

Every Company Has a Story

Line up numbers and you get a dashboard. Connect their causes and you get a story. DartLab gives you two ways to read that story.

Read it yourself — pull financials, filings, and ratios with a single stock code, then trace "why is this company's margin at this level" through a six-act causal structure. One line of code, and the data tells a story.

Let AI read it for you — the same engines, orchestrated by AI to design an analysis flow tailored to your question, showing every line of code and every result. You don't just get an answer — you learn the method.

Both paths run on the same engines.

The Problem

Have you ever tried to compare Samsung's "Revenue" across five years?

Open a DART annual report and the same number appears as ifrs-full_Revenue, dart_Revenue, 매출액, 영업수익 — four different names. Last year's table of contents doesn't match this year's. Comparing with SK Hynix means starting from scratch.

The real problem isn't missing data. It's the same data existing under too many names.

DartLab is built on one premise: every period must be comparable, and every company must be comparable. It normalizes disclosure sections into a topic-period grid (~95% mapping rate) and standardizes XBRL accounts into canonical names (~97% mapping rate) — so you compare companies, not filing formats.
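
As a rough illustration of what account normalization means, the sketch below maps the four revenue labels mentioned above onto one canonical name and reports a mapping rate. It is not DartLab's implementation: the alias table, the canonical name `revenue`, the helper names, and the sample values are all assumptions made for this example.

```python
# Hypothetical alias table. The four labels come from the text above;
# the canonical name "revenue" is an assumption made for this sketch.
ACCOUNT_ALIASES = {
    "ifrs-full_Revenue": "revenue",
    "dart_Revenue": "revenue",
    "매출액": "revenue",
    "영업수익": "revenue",
}


def normalize_accounts(rows):
    """Map raw (label, value) pairs to canonical names and collect unmapped labels."""
    mapped, unmapped = {}, []
    for label, value in rows:
        canonical = ACCOUNT_ALIASES.get(label.strip())
        if canonical is None:
            unmapped.append(label)
        else:
            mapped[canonical] = value
    return mapped, unmapped


def mapping_rate(rows):
    """Share of raw labels that resolve to a canonical name."""
    if not rows:
        return 0.0
    hits = sum(1 for label, _ in rows if label.strip() in ACCOUNT_ALIASES)
    return hits / len(rows)


# Two filings reporting the same figure under different labels (made-up values).
filing_2023 = [("ifrs-full_Revenue", 100), ("someUnknownTag", 5)]
filing_2024 = [("매출액", 120)]

print(normalize_accounts(filing_2023))
print(normalize_accounts(filing_2024))
print(mapping_rate(filing_2023))
```

Once both filings resolve to the same key, a year-over-year comparison is a plain dictionary lookup rather than a label-matching exercise.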

Quick Start

uv add dartlab

import dartlab

c = dartlab.Company("005930")       # Samsung Electronics

c.sections                          # every topic, every period, side by side
# shape: (41, 12) — 41 topics across 12 periods
#                     2025Q4  2024Q4  2024Q3  2023Q4  ...
# companyOverview       v       v       v       v
# businessOverview      v       v       v       v
# riskManagement        v       v       v       v

Text and numbers on a single timeline — the core of cross-period comparability
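
To make the topic-period grid idea concrete, here is a minimal sketch that pivots (topic, period) pairs into a grid with one shared period axis. The function name `build_grid` and the sample records are hypothetical, and the real `c.sections` object may be built quite differently.

```python
def build_grid(records):
    """Pivot (topic, period) pairs into {topic: {period: bool}} on a shared period axis."""
    # Period strings like "2024Q4" sort correctly as text; newest first.
    periods = sorted({period for _, period in records}, reverse=True)
    topics = sorted({topic for topic, _ in records})
    present = set(records)
    grid = {
        topic: {period: (topic, period) in present for period in periods}
        for topic in topics
    }
    return periods, grid


# Made-up records: one topic is missing in the older period.
records = [
    ("companyOverview", "2024Q4"),
    ("companyOverview", "2023Q4"),
    ("businessOverview", "2024Q4"),
]

periods, grid = build_grid(records)
print(" " * 20 + "  ".join(periods))
for topic, row in grid.items():
    cells = "       ".join("v" if row[p] else "." for p in periods)
    print(f"{topic:<20}{cells}")
```

Because every topic shares the same period columns, a gap shows up as an empty cell instead of a missing row.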

[Figure: c.sections output, Samsung Electronics, 41 topics × 12 periods]

c.show("IS")                        # income statement — quarterly by default

Quarterly financials are the default — snakeId + Korean labels side by side

[Figure: c.show('IS'), Samsung Electronics quarterly income statement]

c.show("IS", freq="Y")             # freq="Y" for annual aggregation

Same data, annualized — automatic 4-quarter summation
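
The 4-quarter summation can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the `annualize` helper, the period format, and the revenue values are made up, and the sketch only applies to flow items such as income statement lines (balance sheet items are point-in-time values and would not be summed).

```python
from collections import defaultdict


def annualize(quarterly):
    """Sum quarterly flow values per year, keeping only years with all four quarters."""
    totals = defaultdict(float)
    seen = defaultdict(set)
    for period, value in quarterly.items():
        year, quarter = period[:4], period[4:]
        totals[year] += value
        seen[year].add(quarter)
    complete = {"Q1", "Q2", "Q3", "Q4"}
    return {year: totals[year] for year in sorted(totals) if seen[year] == complete}


# Made-up quarterly revenue; 2024 is incomplete and is therefore skipped.
revenue = {
    "2023Q1": 10.0,
    "2023Q2": 12.0,
    "2023Q3": 11.0,
    "2023Q4": 13.0,
    "2024Q1": 14.0,
}

print(annualize(revenue))
```

Dropping incomplete years is a design choice of this sketch; it avoids presenting a partial year as if it were a full-year total.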

[Figure: c.show('IS', freq='Y'), Samsung Electronics annual income statement]

c.show("businessOverview")          # what this company actually does
c.diff("businessOverview")          # what changed since last year
c.show("ratios")                    # financial ratios, already calculated

c.filings()                         # all reports — direct links to DART viewer

From annual reports to quarterly filings, dartUrl links straight to the original

[Figure: c.filings(), Samsung Electronics report list with DART viewer links]

# Same interface, different country
us = dartlab.Company("AAPL")
us.show("business")
us.show("ratios")

# Ask in natural language
dartlab.ask("Analyze Samsung Electronics financial health")
# → AI executes code and analyzes: "Operating margin rebounded from 8.6% to 21.4%..."

No API key needed. Data auto-downloads from HuggingFace on first use, then loads instantly from local cache.

Three Layers of Analysis

Company prepares data with one stock code. Three layers analyze it.

  1. Analysis engines — produce numbers. Margin trends, cash flow patterns, default probability, peer comparison, macro cycles. No interpretation — numbers and evidence only.
  2. review — assembles engine data into reports by combining blocks. 11 report types × 7 company templates. No interpretation — systematically arranges evidence from diverse perspectives.
  3. AI — calls engines directly and makes judgments. Questions results, verifies against raw data, recalculates with adjusted assumptions when something looks wrong. dartlab's active analyst.

What DartLab Is

One calling convention. Each engine: dartlab.engine() for the guide, dartlab.engine("axis") to run.

New here? Start with Company → Review → Ask. Load data, generate a report, then ask AI.

| Layer | Engine | What it does | Entry point | Notebook |
| --- | --- | --- | --- | --- |
| Data | Data | Pre-built HuggingFace datasets, auto-download | Company("005930") | |
| L0/L1 | Company | Filings + financials + structured data unified by ticker | c.show(), c.select() | Colab, marimo |
| L1 | Gather | External market data (price, flow, macro, news) | dartlab.gather() | Colab, marimo |
| L1 | Scan | Cross-company comparison (governance, ratios, cashflow, ...) | dartlab.scan() | Colab, marimo |
| L1 | Quant | Technical & quantitative analysis (momentum/factor/pattern) | c.quant() | Colab, marimo |
| L2 | Analysis | Profitability/stability/cashflow causal analysis + valuation + forecast | c.analysis("financial", "수익성") | Colab, marimo |
| L2 | Macro | Market-level macro (cycle/rates/liquidity/sentiment/assets) | dartlab.macro("사이클") | Colab, marimo |
| L2 | Credit | Independent credit rating (dCR grade, default probability, health) | c.credit("등급") | Colab, marimo |
| L2 | Industry | Industry mapper: 2,664 listed companies × 34 industries × stage/role/stream + supply-chain edges (atlas at /map) | c.industry(), dartlab.industry("semiconductor") | |
| L2 | Review | Report builder: 6-engine block composition (analysis/quant/credit/macro/scan/industry), 11 types × 7 templates (no interpretation) | c.review("수익성") | Colab, marimo |
| L3 | AI | Active analyst: calls engines directly, judges, verifies against raw data | dartlab.ask() | Colab, marimo |
| L4 | Channel | External sharing: dartlab channel brings PC dartlab to your phone | dartlab channel | |
| core | Search | Semantic filing search (alpha) | dartlab.search() | Colab, marimo |
| facade | Listing | Catalog API (companies, filings, topics) | dartlab.listing() | Colab, marimo |
| viz | Viz | Charts and diagrams (emit_chart) | emit_chart({...}) | |
| guide | Guide | Concierge: readiness, error handling, education | dartlab.guide.checkReady() | |

All notebooks: marimo · colab · Open in marimo

Company

Design: src/dartlab/README.md

Three data sources — docs (full-text disclosures), finance (XBRL statements), report (DART API) — merged into one object. Data auto-downloads from HuggingFace, no setup needed.

c = dartlab.Company("005930")

c.index                         # what's available -- topic list + periods
c.show("BS")                    # view data -- DataFrame per topic
c.select("IS", ["매출액"])       # extract data -- finance or docs, same pattern
c.trace("BS")                   # where it came from -- source provenance
c.diff()                        # what changed -- text changes across periods
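
For intuition about what a cross-period text diff surfaces, the following sketch compares two versions of a section with Python's standard `difflib`. It is only an analogy for `c.diff()`: the `changed_lines` helper and the sample sentences are hypothetical, and DartLab may compute its diffs in a different way.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines between two versions of a section."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are intraline hints.
    return removed, added


# Made-up business overview text for two consecutive annual reports.
overview_2023 = "We make memory chips.\nWe sell phones."
overview_2024 = "We make memory chips.\nWe sell phones and AI servers."

removed, added = changed_lines(overview_2023, overview_2024)
print("removed:", removed)
print("added:  ", added)
```

Only the sentence that changed is reported, which is the property that makes period-over-period reading of long disclosures practical.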

Notes — line items behind BS/IS totals. Access via c.show("topic"), same pattern as finance topics. Works for both DART (K-IFRS HTML parsing) and EDGAR (US-GAAP XBRL tags).

c.show(...) What it shows DART EDGAR
"inventory" Raw materials / work-in-progress / finished goods
"borrowings" Short-term / long-term debt breakdown
"tangibleAsset" PPE gross / net / depreciation
"intangibleAsset" Goodwill / development costs
"receivables" Trade receivables + allowance
"provisions" Warranty / litigation / restructuring
"eps" Basic / diluted EPS
"segments" Revenue / profit by segment
"costByNature" Raw materials / wages / depreciation
"lease" Right-of-use assets / lease liabilities
"affiliates" Equity method investments
"investmentProperty" Fair value / carrying amount

marimo Colab

Scan — Cross-Company Comparison

Design: src/dartlab/scan/README.md

Cross-company analysis across all listed firms. Governance, workforce, capital, debt, cashflow, audit, insider, quality, liquidity, network, account/ratio comparison, and more.

dartlab.scan("governance")            # governance across all firms
dartlab.scan("ratio", "roe")          # ROE across all firms
dartlab.scan("account", "매출액")      # revenue time-series across all firms

2,500+ companies at a glance — quarterly revenue side by side

dartlab.scan('account', '매출액') — cross-company revenue comparison
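
The kind of result a ratio scan produces can be pictured with a small ranking sketch. Everything here is illustrative: the company codes and figures are placeholders, `rank_by_roe` is a hypothetical helper, and the sketch assumes the simplest definition of ROE (net income divided by equity), which may differ from the definition DartLab uses.

```python
def rank_by_roe(companies):
    """Rank companies by ROE (net income / equity), highest first."""
    rows = []
    for code, figures in companies.items():
        equity = figures["equity"]
        if equity <= 0:
            continue  # ROE is not meaningful with zero or negative equity
        rows.append((code, figures["net_income"] / equity))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder figures, not real company data.
companies = {
    "AAA": {"net_income": 15.0, "equity": 100.0},
    "BBB": {"net_income": 30.0, "equity": 120.0},
    "CCC": {"net_income": 5.0, "equity": -10.0},
}

for code, roe in rank_by_roe(companies):
    print(f"{code}  {roe:.1%}")
```

The comparison only works because every company exposes the same account names, which is what the normalization layer provides.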

Gather — External Market Data

Design: src/dartlab/gather/README.md

Price, flow, macro, news — all as Polars DataFrames.

dartlab.gather("price", "005930")             # KR OHLCV
dartlab.gather("price", "AAPL", market="US")  # US stock
dartlab.gather("macro", "FEDFUNDS")           # auto-detects US
dartlab.gather("news", "삼성전자")             # Google News RSS

Analysis — 14-Axis Financial Analysis

Design: src/dartlab/analysis/README.md

Revenue structure → profitability → growth → stability → cash flow → capital allocation → valuation → forecast. Turns raw statements into a causal narrative that feeds Review, AI, and direct human reading.

c.analysis("financial", "수익성")       # profitability analysis
c.analysis("financial", "현금흐름")    # cash flow analysis

print(c.credit())                           # available-axes guide DataFrame (self-discovery)
c.credit("등급")                            # main grade + healthScore (0-100)
c.credit("등급", detail=True)               # grade + narrative + metrics

Credit — Independent Credit Rating

Design: src/dartlab/analysis/CREDIT.md | Reports: dartlab.pages.dev/blog/credit-reports

Independent credit analysis with 3-Track model (general/financial/holding), Notch Adjustment, CHS market correction, and separate financial statement blending.

79-company validation: large-cap 87% (26/30), mid-cap 82% (41/50), full sample 70% (55/79, re-measurement pending after v5.0 overvaluation fix). Samsung AA+ exact match. See methodology for validation details.
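
The percentages above follow directly from the counts. The short check below assumes each rate is simply matched / total, rounded to the nearest whole percent; the `match_rate` helper is introduced here for illustration only.

```python
def match_rate(matched, total):
    """Matched share as a whole-number percentage."""
    return round(100 * matched / total)


# Counts taken from the validation summary above.
samples = {
    "large-cap": (26, 30),
    "mid-cap": (41, 50),
    "full sample": (55, 79),
}

for name, (matched, total) in samples.items():
    print(f"{name}: {match_rate(matched, total)}% ({matched}/{total})")
```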

print(c.credit())           # self-discovery — available axes + grade

cr = c.credit("등급")        # main grade
print(cr["grade"])          # dCR-AA+
print(cr["healthScore"])    # 96 (0-100, higher is better)
print(cr["pdEstimate"])     # 0.01% default probability

cr = c.credit("등급", detail=True)  # grade + narrative + metrics + divergence explanation
print(cr["divergenceExplanation"])  # why it differs from agencies

Publish reports (the credit narrative and audit are included automatically in Act 5 of the review):

from dartlab.review.publisher import publishReport
publishReport("005930")               # 6-act report including credit narrative + audit

Macro — Economy Without a Ticker

Design: src/dartlab/macro/README.md

Analyze the economic environment without a Company. Just import dartlab.

dartlab.macro("사이클")          # business cycle — 4 phases
dartlab.macro("금리")            # rates + Nelson-Siegel yield curve
dartlab.macro("예측")            # LEI + recession prob + Hamilton RS + GDP Nowcast
dartlab.macro("종합")            # macro synthesis + strategy + portfolio mapping

Market cycle, rates, liquidity, sentiment, and asset signals with global macro methodologies (Hamilton EM, Kalman DFM, Nelson-Siegel, Cleveland Fed probit, Sahm Rule, BIS Credit-to-GDP) — pure numpy, zero statsmodels/scipy.
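
Of the methodologies listed, the Sahm Rule is the simplest to show in code. The sketch below is a plain-Python illustration, not DartLab's numpy implementation. It assumes the common real-time definition (the current 3-month average unemployment rate minus the lowest 3-month average of the preceding 12 months, with a signal at 0.5 percentage points or more) and uses a made-up unemployment series.

```python
def sahm_indicator(unemployment):
    """Sahm indicator per month; None until 12 prior 3-month averages exist."""
    # 3-month moving averages, aligned to the month that closes each window.
    ma3 = [None, None] + [
        sum(unemployment[i - 2 : i + 1]) / 3 for i in range(2, len(unemployment))
    ]
    indicator = []
    for i, current in enumerate(ma3):
        prior = [v for v in ma3[max(0, i - 12) : i] if v is not None]
        if current is None or len(prior) < 12:
            indicator.append(None)
        else:
            indicator.append(current - min(prior))
    return indicator


def sahm_signal(unemployment, threshold=0.5):
    """True for months where the indicator reaches the threshold."""
    return [v is not None and v >= threshold for v in sahm_indicator(unemployment)]


# Made-up monthly unemployment rates (%): flat, then a sharp rise.
rates = [4.0] * 14 + [4.3, 4.6, 4.9]

print(sahm_indicator(rates)[-3:])
print(sahm_signal(rates)[-3:])
```

The series stays silent through the first two months of the rise and fires only once the smoothed rate has moved far enough from its recent low, which is the behaviour the rule is designed for.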

Backtest (2000-2024, FRED): Cleveland Fed probit detected all 3/3 US recessions 2-16 months ahead, recall 90%.

Review — Analysis to Report

Design: src/dartlab/review/README.md

Assembles analysis into a structured report. 4 output formats: rich (terminal), html, markdown, json.

c.review()              # full report
c.reviewer()            # report + AI interpretation

Samsung report preview: "Revenue +23.8%, operating margin 8.6%→21.4%. FCF turned positive, ROIC > WACC — reinvestment is creating value."

Sample reports: Samsung Electronics · SK Hynix · Kia · HD Hyundai Heavy Industries · SK Telecom · LG Chem · NCSoft · Amorepacific

Storyteller — Numbers Tell Stories

Design: src/dartlab/review/README.md · Series: Company Stories

Financial analysis isn't ratio tables. DartLab combines 5 engines (analysis, credit, scan, quant, macro) into a 6-act storytelling structure that auto-generates publishable company stories.

from dartlab.review.publisher import publishReport
publishReport("068270")    # Celltrion — auto-publish 6-act company story

Published stories:

| Company | Story |
| --- | --- |
| SK Hynix | 30-year Korean semiconductor mystery, 58% operating margin |
| Samyang Foods | From last place in Korea's ramen Big 3 to a ₩2.3T global food giant |
| Doosan Enerbility | Debt ratio from 305% to 129%: the real story of a 9-year diet |
| Alteogen | 9 years of losses, then one license deal turned ₩106.9B operating profit |
| HMM | The company where cycles, not markets, decide the stock price |
| Celltrion | Laid off at 41 during IMF crisis, started with $50K: 25 years later, ₩13.78T in intangibles |
| Hanwha Aerospace | Samsung dumped it for ₩840B; now it has ₩37T in order backlog |
| HD Hyundai Electric | ₩100.6B loss 7 years ago became ₩1T this year, with one product: transformers |
| Korea Zinc | First net loss in 50 years at ₩245.7B, yet operating profit hit all-time high |
| APR | A cosmetics company sold ₩407B in home appliances, and that was just the start |

Search — Find Filings by Meaning (alpha)

Design: src/dartlab/core/search/README.md

No model, no GPU, no cold start. 95% precision on 4M documents — better than neural embeddings at 1/100th the cost. See methodology for benchmark details.

dartlab.search("유상증자 결정")                     # find capital raise filings
dartlab.search("대표이사 변경", corp="005930")       # filter by company
dartlab.search("회사가 돈을 빌렸다")                 # natural language works too

AI — Active Analyst

Design: src/dartlab/ai/README.md

The AI writes and executes Python code using dartlab's full API. You see every line of code it runs. 60+ questions validated, 95%+ first-try success. See methodology for validation scope and limits.

dartlab.ask("Analyze Samsung Electronics financial health")
dartlab.ask("Samsung analysis", provider="gemini")  # free providers available

Providers: gemini (free), groq (free), cerebras (free), oauth-codex (ChatGPT subscription), openai, ollama (local), and more. Auto-fallback across providers when rate-limited.
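
Auto-fallback can be pictured as trying providers in order and moving on when one is rate-limited. This is a minimal sketch of that pattern, not the library's code: `RateLimited`, `ask_with_fallback`, and the two stand-in provider functions are hypothetical names introduced for the example.

```python
class RateLimited(Exception):
    """Raised by a provider callable when its quota is exhausted (hypothetical)."""


def ask_with_fallback(question, providers):
    """Try each (name, callable) in order; return the first successful answer."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(question)
        except RateLimited as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    raise RuntimeError(f"all providers rate-limited: {errors}")


def limited_provider(question):
    """Stand-in for a provider that has hit its rate limit."""
    raise RateLimited("quota exceeded")


def working_provider(question):
    """Stand-in for a provider that answers normally."""
    return f"answer to: {question}"


used, answer = ask_with_fallback(
    "Samsung analysis",
    [("gemini", limited_provider), ("groq", working_provider)],
)
print(used, "->", answer)
```

Only the rate-limit error triggers a fallback in this sketch; any other exception propagates, so genuine failures are not silently hidden.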

Channel — Use your PC dartlab from anywhere

Design: ops/channel.md

One command on your PC and dartlab UI works on your phone. Microsoft DevTunnels auto-setup.

dartlab channel

Flow:

  1. winget auto-installs the devtunnel CLI (one-time)
  2. GitHub OAuth (one-time, browser opens automatically)
  3. Permanent URL + QR code (https://<id>-8400.<region>.devtunnels.ms)
  4. Open the URL or scan the QR code in Chrome on your phone → dartlab UI just works

Zero domains, zero token tricks. Same infrastructure as VS Code Remote Tunnels — verified mobile compatibility. Optional messaging bots: --telegram/slack/discord.

Architecture

L0  core/        Protocols, finance utils, docs utils, registry
L1  providers/   Country-specific data (DART, EDGAR, EDINET)
    gather/      External market data (Naver, Yahoo, FRED)
    scan/        Market-wide analysis — scan("group", "axis")
    quant/       Technical analysis — c.quant()
L2  analysis/    Financial + forecast + valuation — analysis("group", "axis")
    credit/      Independent credit rating — c.credit()
    macro/       Market-level macro — dartlab.macro()
    review/      5-engine composition (analysis + credit + scan + quant + macro)
L3  ai/          Active analyst — dartlab.ask()
L4  vscode/      VSCode extension (dartlab chat --stdio)
    ui/web/      Svelte SPA web interface

Import direction enforced by CI. Adding a new country means one provider package — zero core changes.

Layer consumption flow

Who consumes whom across the stack:

flowchart TB
    subgraph L4["L4 · User interface"]
        UI["vscode / CLI / web"]
    end
    subgraph L3["L3 · LLM analyst"]
        AI["ai<br/>dartlab.ask()"]
    end
    subgraph L2["L2 · Analysis"]
        ANA["analysis<br/>causal financial + forecast + valuation"]
        CRD["credit<br/>independent rating"]
        MAC["macro<br/>market reading"]
        REV["review<br/>block-composed report"]
    end
    subgraph L1["L1 · Data ingestion"]
        PRV["providers<br/>DART / EDGAR / EDINET"]
        GAT["gather<br/>FRED / ECOS / Naver / Yahoo"]
        SCN["scan<br/>cross-market"]
        QNT["quant<br/>25 technical indicators"]
    end
    subgraph L0["L0 · Infrastructure"]
        CORE["core<br/>protocols + finance + docs + search"]
    end

    UI --> AI
    AI --> REV
    AI --> ANA
    AI --> MAC
    AI --> SCN
    REV --> ANA
    REV --> CRD
    REV --> SCN
    REV --> QNT
    REV --> MAC
    ANA --> PRV
    ANA --> GAT
    CRD --> PRV
    MAC --> GAT
    SCN --> PRV
    QNT --> GAT
    PRV --> CORE
    GAT --> CORE
    SCN --> CORE
    QNT --> CORE

    classDef l0 fill:#f5f5f5,stroke:#999
    classDef l1 fill:#e8f4ff,stroke:#4a90e2
    classDef l2 fill:#fff4e6,stroke:#e67e22
    classDef l3 fill:#f0e6ff,stroke:#8e44ad
    classDef l4 fill:#e6ffe6,stroke:#27ae60
    class CORE l0
    class PRV,GAT,SCN,QNT l1
    class ANA,CRD,MAC,REV l2
    class AI l3
    class UI l4
Core rules:

  • Arrows always flow top → bottom (L4→L3→L2→L1→L0). Reverse imports forbidden (CI-enforced)
  • L2 engines never import each other — analysis ↛ credit, macro ↛ analysis. Composition is review's or ai's job
  • When adding a feature, pick the right layer first and let data flow in one direction only
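
A CI check for the first rule could look roughly like the sketch below, which parses a module with the standard `ast` library and flags imports that point to a higher layer. The layer table mirrors the architecture listing above, but the `dartlab.<package>` module layout and the helper names are assumptions. The sketch does not cover the second rule (L2 engines not importing each other) and is not the project's actual CI script.

```python
import ast

# Layer numbers taken from the architecture listing above.
LAYER = {
    "core": 0,
    "providers": 1, "gather": 1, "scan": 1, "quant": 1,
    "analysis": 2, "credit": 2, "macro": 2, "review": 2,
    "ai": 3,
}


def imported_packages(source):
    """Yield dartlab sub-packages imported by a module's source code."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == "dartlab" and parts[1] in LAYER:
                yield parts[1]


def violations(package, source):
    """Packages imported from a higher layer than the importing package."""
    own = LAYER[package]
    return sorted({p for p in imported_packages(source) if LAYER[p] > own})


# A core module reaching up into ai is flagged; review importing analysis is not.
bad = "from dartlab.ai import ask\nimport dartlab.core.utils\n"
good = "from dartlab.analysis import x\nfrom dartlab.core import y\n"

print(violations("core", bad))
print(violations("review", good))
```

Parsing the source instead of importing it keeps the check fast and free of side effects, which suits a CI step.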

EDGAR (US)

Same interface, different data source. Auto-fetched from SEC API — no pre-download needed.

# Korea (DART)                          # US (EDGAR)
c = dartlab.Company("005930")           c = dartlab.Company("AAPL")
c.sections                              c.sections
c.show("businessOverview")              c.show("business")
c.show("BS")                            c.show("BS")
c.show("ratios")                        c.show("ratios")
c.diff("businessOverview")              c.diff("10-K::item7Mdna")

MCP — AI Assistant Integration

Built-in MCP server with 25 tools covering all dartlab engines.

No Install Required (Remote MCP)

No need to install dartlab. Add to Claude Desktop claude_desktop_config.json:

{
  "mcpServers": {
    "dartlab": {
      "url": "https://eddmpython-dartlab.hf.space/mcp/sse"
    }
  }
}

Hosted on HuggingFace Spaces. No DART API key needed. → Details

Local Install (stdio MCP)

# Claude Code — one line setup
claude mcp add dartlab -- uv run dartlab mcp

# Codex CLI
codex mcp add dartlab -- uv run dartlab mcp

Claude Desktop / Cursor config

Add to claude_desktop_config.json or .cursor/mcp.json:

{
  "mcpServers": {
    "dartlab": {
      "command": "uv",
      "args": ["run", "dartlab", "mcp"]
    }
  }
}

Or auto-generate: dartlab mcp --config claude-desktop
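
If you already have other MCP servers configured, the dartlab entry has to be merged into the existing file rather than pasted over it. The sketch below shows one way to do that with the standard `json` module. The `add_dartlab_server` helper and the sample `other` server are hypothetical, and the sketch works on config text rather than assuming where your config file lives.

```python
import json

# The stdio server entry shown in the config above.
DARTLAB_SERVER = {"command": "uv", "args": ["run", "dartlab", "mcp"]}


def add_dartlab_server(config_text):
    """Return config JSON with a dartlab entry added, keeping existing servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dartlab"] = dict(DARTLAB_SERVER)
    return json.dumps(config, indent=2)


# An existing config with one unrelated (made-up) server.
existing = '{"mcpServers": {"other": {"command": "other-server"}}}'

merged = add_dartlab_server(existing)
print(merged)
```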

25 Tools

| Category | Tools |
| --- | --- |
| Analysis | companyInsights, companyAnalysis, companyReview, companyValuation, companyForecast, companyCredit |
| Data | companyFinancials, companyRatios, companyShow, companyTopics, companyDiff, companyFilings |
| Company | companyGovernance, companyAudit, companyProfile, companySections, companyGather, companyQuant |
| Market | macroAnalysis, marketScan, gatherData, quantAnalysis, topdownScreen |
| Search | searchCompany, dartlabSearch, dartlabListing |

REST API — No API Key Required

DART API proxy on HuggingFace Spaces. Access real-time disclosure data without an API key:

# Filing list
curl "https://eddmpython-dartlab.hf.space/api/dart/filings?corp=005930&start=20260101"

# Company info
curl "https://eddmpython-dartlab.hf.space/api/dart/company/005930"

# Financial statements
curl "https://eddmpython-dartlab.hf.space/api/dart/finance/005930?year=2024"

# Reports (dividend, employee, executive — 56 categories)
curl "https://eddmpython-dartlab.hf.space/api/dart/report/005930/배당?year=2023"
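
The same endpoints can be called from Python. One detail worth handling explicitly is the Korean path segment in the report URL, which must be percent-encoded. The sketch below builds the URLs with the standard library. The helper names are hypothetical, and `fetch_json` assumes the endpoints return JSON, which the text above does not state.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "https://eddmpython-dartlab.hf.space/api/dart"


def build_url(*segments, **params):
    """Join path segments (percent-encoded) and append query parameters."""
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE}/{path}{query}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


filings_url = build_url("filings", corp="005930", start="20260101")
report_url = build_url("report", "005930", "배당", year=2023)

print(filings_url)
print(report_url)
# data = fetch_json(report_url)  # uncomment to perform the request
```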

OpenAPI — Raw Public APIs

from dartlab import OpenDart, OpenEdgar

# Korea (requires free API key from opendart.fss.or.kr)
d = OpenDart()
d.filings("삼성전자", "2024")
d.finstate("삼성전자", 2024)

# US (no API key needed)
e = OpenEdgar()
e.filings("AAPL", forms=["10-K", "10-Q"])

Data

All data is pre-built on HuggingFace — auto-downloads on first use. EDGAR data comes directly from the SEC API.

| Dataset | Coverage | Size |
| --- | --- | --- |
| DART docs | 2,500+ companies | ~8 GB |
| DART finance | 2,700+ companies | ~600 MB |
| DART report | 2,700+ companies | ~320 MB |
| EDGAR | On-demand | SEC API |

Pipeline: local cache (instant) → HuggingFace (auto-download) → DART API (with your key). Most users never leave the first two.
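
The tiered lookup can be sketched as "serve from the local cache if present, otherwise try each remote source in order and cache the result". This illustrates the pattern only: `load_with_cache`, the file name, and the stand-in download function are hypothetical, and the real loader will differ.

```python
import tempfile
from pathlib import Path


def load_with_cache(name, cache_dir, sources):
    """Return (origin, data): local cache first, then each remote source in order."""
    cached = Path(cache_dir) / name
    if cached.exists():
        return "cache", cached.read_bytes()
    for origin, fetch in sources:
        data = fetch(name)
        if data is not None:
            cached.parent.mkdir(parents=True, exist_ok=True)
            cached.write_bytes(data)  # the next call is served locally
            return origin, data
    raise FileNotFoundError(name)


def fake_hub_download(name):
    """Stand-in for a HuggingFace download; returns placeholder bytes."""
    return b"placeholder-bytes"


with tempfile.TemporaryDirectory() as tmp:
    sources = [("huggingface", fake_hub_download)]
    first = load_with_cache("005930.parquet", tmp, sources)
    second = load_with_cache("005930.parquet", tmp, sources)

print(first[0], "then", second[0])
```

The first call pays the download cost once; every later call resolves at the first tier, which matches the "most users never leave the first two" behaviour described above.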

Try It Now

Live Demo — no install, no Python

Notebooks: Company · Scan · Review · Gather · Analysis · Ask (AI)

Documentation

Docs · Quick Start · API Overview

Blog (120+ articles): All · Company Stories · Credit Reports · Macro Reports

Stability

| Tier | Scope |
| --- | --- |
| Stable | DART Company (sections, show, trace, diff, BS/IS/CF, CIS, index, filings, profile), EDGAR Company core, valuation, forecast, simulation |
| Beta | EDGAR power-user (SCE, notes, freq, coverage), credit, insights, distress, ratios, timeseries, network, governance, workforce, capital, debt, chart/table/text tools, ask/chat, OpenDart, OpenEdgar, Server API, MCP |
| Experimental | AI tool calling, export, viz (charts) |

See docs/stability.md.

Contributing

Contributors are very welcome. Whether it's a bug report, a new analysis axis, a mapping fix, or a documentation improvement — every contribution makes dartlab better for everyone.

The one rule: experiment first, engine second. Validate your idea in experiments/ before changing the engine. This keeps the core stable while making it easy to try bold ideas.

  • Experiment folder: experiments/XXX_name/ — each file must be independently runnable with actual results in its docstring
  • Data contributions (e.g. accountMappings.json, sectionMappings.json): accepted when backed by experiment evidence
  • Issues and PRs in Korean or English are both welcome
  • Not sure where to start? Open an issue — we'll help you find the right place

License

Apache License 2.0 — free to use, just include the NOTICE attribution.