Production-oriented narrative intelligence system for extracting structured story knowledge from EPUB and PDF books, preserving canon, and preparing grounded inputs for future pre-canon, mid-canon, post-canon, and fanfiction authoring workflows.
The project ingests one or more books, splits them into scenes, analyzes each scene with LLM-backed extractors, and builds reusable narrative outputs such as:
- chapter rows
- scene analyses
- entity registry
- state transitions
- canon snapshots
- timeline events
- character timelines
- alias and identity decisions
- causal graph and metrics
- searchable story index
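The ingest → split → analyze → build flow above can be sketched as a minimal dataflow. All names here are illustrative, not the project's API; the real implementations live in `services`, `analysis`, `entities`, `state`, `timeline`, and `rag`:

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    chapter: str
    text: str

@dataclass
class PipelineResult:
    chapters: list = field(default_factory=list)
    scene_analyses: list = field(default_factory=list)

def analyze_scene(scene: Scene) -> dict:
    # Stand-in for the LLM-backed extractor; here we only count words.
    return {"chapter": scene.chapter, "word_count": len(scene.text.split())}

def run_pipeline(chapters: dict) -> PipelineResult:
    """Toy orchestration: one scene per chapter, analyzed in order."""
    result = PipelineResult(chapters=list(chapters))
    for title, text in chapters.items():
        result.scene_analyses.append(analyze_scene(Scene(title, text)))
    return result
```

The real pipeline additionally rebuilds downstream outputs (registry, timeline, snapshots) after each scene; this sketch only shows the per-scene shape of the loop.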
The main product surface is the Streamlit dashboard in `story_dashboard.py`.
- Unified series ingestion through EPUB and PDF processors
- Continuous target-word scene sizing, with `0` = one full chapter per scene
- Cross-chapter chunk merging for nonzero target sizes
- Parallel scene analysis and identity analysis
- Incremental alias-map updates during processing
- Deterministic downstream rebuilding after each scene
- JSON contract export from the dashboard
- Search across scenes, timeline, state, identities, and causal graph outputs
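The scene-sizing behavior listed above (a target word count, where `0` yields one full chapter per scene and larger targets may merge text across chapter boundaries) can be illustrated with a toy splitter. The function name and greedy packing strategy here are assumptions for illustration, not the project's actual algorithm:

```python
def split_scenes(chapters: list[str], target_words: int) -> list[str]:
    """Toy splitter: 0 -> one scene per chapter; otherwise greedily pack
    words into ~target_words chunks, merging across chapter boundaries."""
    if target_words == 0:
        return list(chapters)
    scenes, buffer = [], []
    for chapter in chapters:
        for word in chapter.split():
            buffer.append(word)
            if len(buffer) >= target_words:
                scenes.append(" ".join(buffer))
                buffer = []
    if buffer:  # trailing words form a final, smaller scene
        scenes.append(" ".join(buffer))
    return scenes
```

Note how a chunk that starts near the end of one chapter keeps filling from the next chapter, which is the "cross-chapter chunk merging" behavior.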
- `story_dashboard.py`: Main Streamlit application.
- `services`: Book ingestion and chapter extraction.
- `analysis`: Scene splitting and per-scene LLM analysis.
- `entities`: Entity registry building.
- `state`: State transitions and canon snapshots.
- `timeline`: Timeline, character timeline, normalization, and causal graph services.
- `rag`: Searchable indexing services.
- `query`: Story search/query services.
- `docs`: Project documentation.
From the project root:
```bash
streamlit run story_dashboard.py
```

Create a virtual environment, activate it, and install the project in editable mode:

```bash
python -m venv venv
venv\Scripts\activate
pip install -e .[dev]
```

- Upload one or more EPUB or PDF books.
- Choose:
  - scene analysis model
  - identity model
  - target scene size in words (`0` means one full chapter per scene; values above `0` can merge across chapter boundaries when needed)
- Click `Run Pipeline`.
- Review outputs in the dashboard tabs.
- Export the pipeline result using `Export JSON Contract` from the sidebar after the run completes.
The dashboard can export a full JSON contract containing:
- run metadata
- inputs
- chapters
- scene analyses
- resolved scene analyses
- entity registry
- state result
- canon snapshot
- timeline
- character timelines
- identity result
- causal graph result
- story index summary
See docs/JSON_CONTRACT.md for the contract description.
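Because the export is plain JSON, it can be inspected with the standard library. The key names in this sketch are illustrative assumptions; see docs/JSON_CONTRACT.md for the actual field names:

```python
import json

# Minimal stand-in for an exported contract's content (keys hypothetical).
contract = {
    "run_metadata": {"models": {"scene": "example-model"}},
    "chapters": [],
    "scene_analyses": [],
    "entity_registry": {},
}

# Round-trip through JSON, as a downstream consumer of the export would.
loaded = json.loads(json.dumps(contract))

# Sanity-check that the sections we expect are present.
missing = {"run_metadata", "chapters"} - loaded.keys()
assert not missing, f"contract missing keys: {missing}"
```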
The maintained regression coverage lives in the `tests` directory.
Run the full suite:
```bash
pytest tests
```

Run a single test module:

```bash
pytest tests/test_scene_analyzer.py
```
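A new regression test follows the usual pytest pattern: a plain `test_*` function with bare assertions that pytest discovers automatically. A self-contained sketch, with a toy helper defined inline for illustration (the real helpers live in the project's identity/alias services):

```python
def normalize_alias(name: str) -> str:
    # Toy stand-in for an alias-normalization helper: lowercase and
    # collapse runs of whitespace.
    return " ".join(name.strip().lower().split())

def test_normalize_alias_collapses_whitespace_and_case():
    assert normalize_alias("  Jon   SNOW ") == "jon snow"
```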