How to extend ingestion, extraction, or publication without forking the monolith.
Stage execution for SPEC 8.x flows is centralized in pipeline/src/sm_pipeline/pipeline_orchestrator.py.
- Default behavior: `run_pipeline_for_paper` runs selected `PipelineStage` values in order using built-in handlers.
- Overrides: Call `register_pipeline_stage_handler(stage, handler)` to substitute a stage implementation (for example in tests or a downstream plugin package that vendors `sm_pipeline`). Use `reset_pipeline_stage_handlers()` in test teardown to restore defaults.
Handlers must have the signature:
`def handler(repo_root: Path, paper_id: str) -> StageOutcome: ...`

Stages that remain manual by design (`formalization`, `kernel_linkage`) still emit skipped outcomes from the orchestrator; they are not registered handlers.
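The override mechanism described above can be illustrated with a small self-contained sketch. It does not import `sm_pipeline`; the registry, the `StageOutcome` fields, and the stage names below are local stand-ins chosen for illustration, and the real orchestrator may differ in every detail except the handler signature and the register/reset function names quoted above.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, Dict, Iterable, List


class PipelineStage(str, Enum):
    # Hypothetical stage names; the real enum lives in sm_pipeline.
    INGESTION = "ingestion"
    EXTRACTION = "extraction"
    FORMALIZATION = "formalization"  # manual by design, never dispatched


@dataclass(frozen=True)
class StageOutcome:
    # Stand-in for the real StageOutcome; these fields are assumed.
    stage: str
    status: str
    detail: str = ""


StageHandler = Callable[[Path, str], StageOutcome]
MANUAL_STAGES = {PipelineStage.FORMALIZATION}


def _builtin(stage: PipelineStage) -> StageHandler:
    # Placeholder for a built-in handler; it only reports success.
    def handler(repo_root: Path, paper_id: str) -> StageOutcome:
        return StageOutcome(stage.value, "ok", "built-in")
    return handler


def _defaults() -> Dict[PipelineStage, StageHandler]:
    return {s: _builtin(s) for s in PipelineStage if s not in MANUAL_STAGES}


_handlers: Dict[PipelineStage, StageHandler] = _defaults()


def register_pipeline_stage_handler(stage: PipelineStage, handler: StageHandler) -> None:
    _handlers[stage] = handler


def reset_pipeline_stage_handlers() -> None:
    _handlers.clear()
    _handlers.update(_defaults())


def run_pipeline_for_paper(
    repo_root: Path, paper_id: str, stages: Iterable[PipelineStage]
) -> List[StageOutcome]:
    outcomes = []
    for stage in stages:
        if stage in MANUAL_STAGES:
            # Manual stages are reported as skipped rather than dispatched.
            outcomes.append(StageOutcome(stage.value, "skipped", "manual stage"))
        else:
            outcomes.append(_handlers[stage](repo_root, paper_id))
    return outcomes


def fake_extraction(repo_root: Path, paper_id: str) -> StageOutcome:
    # Test double with the required (repo_root, paper_id) signature.
    return StageOutcome("extraction", "ok", f"stubbed for {paper_id}")


register_pipeline_stage_handler(PipelineStage.EXTRACTION, fake_extraction)
try:
    overridden = run_pipeline_for_paper(Path("."), "paper-001", list(PipelineStage))
finally:
    # Mirrors the teardown advice: always restore the defaults.
    reset_pipeline_stage_handlers()

restored = run_pipeline_for_paper(Path("."), "paper-001", list(PipelineStage))
```

Wrapping the run in `try`/`finally` (or a test fixture with teardown) keeps a substituted handler from leaking into later tests, which is the reason the reset function exists.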
- Single path: Use `sm_pipeline.publish.canonical.publish_paper_artifacts` whenever you regenerate one paper’s published JSON so `portal/.generated/corpus-export.json` is refreshed consistently.
- Portal bundle shape: Only `build_portal_bundle` defines the export structure; the CLI writes it via `export_portal_data`.
- Add new invariant checks by extending the gate engine in `validate/gate_engine.py` rather than ad hoc scripts, so `validate-all` and `--report-json` stay authoritative.
- Provider code: `pipeline/src/sm_pipeline/llm/` (`LLMProvider` protocol, Prime Intellect HTTP adapter).
- CLI: `sm-pipeline llm-claim-proposals`, `llm-mapping-proposals`, `llm-lean-proposals`, `llm-lean-proposals-to-apply-bundle`, `llm-apply-*` (see prime-intellect-llm.md).
- Extension pattern: wrap `run_extraction_stage` or add a local script that calls proposal generators; do not auto-apply in CI. Prefer `register_pipeline_stage_handler` only if the substituted handler remains deterministic or is explicitly opt-in via environment flags.
- Sidecar validation: `validate/llm_proposals.py` is warn-only when suggestion sidecars (`llm_claim_proposals.json`, `llm_mapping_proposals.json`, `llm_lean_proposals.json`, `suggested_*.json`) exist under a paper directory.
- Eval / regression: Prompt literals and template digests live in `llm/prompt_templates.py`. Reviewed reference bundles under `benchmarks/llm_eval/` are scored by benchmark task `llm_eval`; `just benchmark` also emits top-level `llm_prompt_templates`. See ADR 0013.
- Publish escape hatch: set `SM_PUBLISH_REUSE_MANIFEST_GRAPHS=1` only if you intentionally need to preserve prior manifest `dependency_graph` / `kernel_index` (default is fresh recompute each publish).
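To make the warn-only sidecar behavior concrete, here is a minimal sketch of a check that reports suggestion sidecars without failing. It is not the code in `validate/llm_proposals.py`: the function name `sidecar_warnings`, the message wording, and the choice to scan only the top level of the paper directory are assumptions. Only the file name patterns come from the list above.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List

# File name patterns taken from the sidecar list above.
SIDECAR_PATTERNS = (
    "llm_claim_proposals.json",
    "llm_mapping_proposals.json",
    "llm_lean_proposals.json",
    "suggested_*.json",
)


def sidecar_warnings(paper_dir: Path) -> List[str]:
    """Return one warning per suggestion sidecar found in paper_dir.

    Hypothetical helper: it never raises for a sidecar, so a caller
    can print the warnings and still exit successfully (warn-only).
    """
    warnings = []
    for path in sorted(paper_dir.iterdir()):
        if not path.is_file():
            continue
        if any(fnmatchcase(path.name, pattern) for pattern in SIDECAR_PATTERNS):
            warnings.append(f"warning: suggestion sidecar present: {path.name}")
    return warnings
```

A caller would print the returned strings and leave the exit status unchanged, for example `for line in sidecar_warnings(Path("corpus/papers") / paper_id): print(line)`.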
Schema changes require updates in lockstep per project rules: JSON schema under schemas/, Pydantic models under pipeline/src/sm_pipeline/models/, fixtures under schemas/examples/, and notes in Schema versioning and migration notes.
blueprint/ and blueprints/ are narrative and structural docs today. Integration with the leanblueprint ecosystem (auto-generated dependency graphs from Lean) is deferred: not required for merge gates.
Until then:
- Authoritative mapping: `corpus/papers/<paper_id>/mapping.json` and Lean sources under `formal/`.
- Check: `sm-pipeline check-paper-blueprint <paper_id>` compares blueprint markdown to mapping when present.
When leanblueprint is adopted, update this section and an ADR as needed.