Production-oriented narrative intelligence system for extracting structured story knowledge from EPUB and PDF books, preserving canon, and preparing grounded inputs for future pre-canon, mid-canon, post-canon, and fanfiction authoring workflows.
The project ingests one or more books, splits them into scenes, analyzes each scene with LLM-backed extractors, and builds reusable narrative outputs such as:
- chapter rows
- scene analyses
- entity registry
- state transitions
- canon snapshots
- timeline events
- character timelines
- alias and identity decisions
- causal graph and metrics
- searchable story index
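The ingest → split → analyze → build flow above can be sketched as a minimal dataflow. All names here are illustrative, not the project's API; the real implementations live in `services`, `analysis`, `entities`, `state`, `timeline`, and `rag`:

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    chapter: str
    text: str

@dataclass
class PipelineResult:
    chapters: list = field(default_factory=list)
    scene_analyses: list = field(default_factory=list)

def analyze_scene(scene: Scene) -> dict:
    # Stand-in for the LLM-backed extractor; here we only count words.
    return {"chapter": scene.chapter, "word_count": len(scene.text.split())}

def run_pipeline(chapters: dict) -> PipelineResult:
    """Toy orchestration: one scene per chapter, analyzed in order."""
    result = PipelineResult(chapters=list(chapters))
    for title, text in chapters.items():
        result.scene_analyses.append(analyze_scene(Scene(title, text)))
    return result
```

The real pipeline additionally rebuilds downstream outputs (registry, timeline, snapshots) after each scene; this sketch only shows the per-scene shape of the loop.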
The main product surface is the Streamlit dashboard in `story_dashboard.py`.
- Unified series ingestion through EPUB and PDF processors
- Continuous target-word scene sizing, with `0` = one full chapter per scene
- Cross-chapter chunk merging for nonzero target sizes
- Parallel scene analysis and identity analysis
- Incremental alias-map updates during processing
- Deterministic downstream rebuilding after each scene
- JSON contract export from the dashboard
- Search across scenes, timeline, state, identities, and causal graph outputs
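The scene-sizing behavior listed above (a target word count, where `0` yields one full chapter per scene and larger targets may merge text across chapter boundaries) can be illustrated with a toy splitter. The function name and greedy packing strategy here are assumptions for illustration, not the project's actual algorithm:

```python
def split_scenes(chapters: list[str], target_words: int) -> list[str]:
    """Toy splitter: 0 -> one scene per chapter; otherwise greedily pack
    words into ~target_words chunks, merging across chapter boundaries."""
    if target_words == 0:
        return list(chapters)
    scenes, buffer = [], []
    for chapter in chapters:
        for word in chapter.split():
            buffer.append(word)
            if len(buffer) >= target_words:
                scenes.append(" ".join(buffer))
                buffer = []
    if buffer:  # trailing words form a final, smaller scene
        scenes.append(" ".join(buffer))
    return scenes
```

Note how a chunk that starts near the end of one chapter keeps filling from the next chapter, which is the "cross-chapter chunk merging" behavior.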
- `story_dashboard.py`: Main Streamlit application.
- `services`: Book ingestion and chapter extraction.
- `analysis`: Scene splitting and per-scene LLM analysis.
- `entities`: Entity registry building.
- `state`: State transitions and canon snapshots.
- `timeline`: Timeline, character timeline, normalization, and causal graph services.
- `rag`: Searchable indexing services.
- `query`: Story search/query services.
- `docs`: Project documentation.
From the project root:
```bash
streamlit run story_dashboard.py
```

Create a virtual environment, activate it, and install the project in editable mode:

```bash
python -m venv venv
venv\Scripts\activate
pip install -e .[dev]
```

- Upload one or more EPUB or PDF books.
- Choose:
  - scene analysis model
  - identity model
  - target scene size in words (`0` means one full chapter per scene; values above `0` can merge across chapter boundaries when needed)
- Click `Run Pipeline`.
- Review outputs in the dashboard tabs.
- Export the pipeline result using `Export JSON Contract` from the sidebar after the run completes.
The dashboard can export a full JSON contract containing:
- run metadata
- inputs
- chapters
- scene analyses
- resolved scene analyses
- entity registry
- state result
- canon snapshot
- timeline
- character timelines
- identity result
- causal graph result
- story index summary
See docs/JSON_CONTRACT.md for the contract description.
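Because the export is plain JSON, it can be inspected with the standard library. The key names in this sketch are illustrative assumptions; see docs/JSON_CONTRACT.md for the actual field names:

```python
import json

# Minimal stand-in for an exported contract's content (keys hypothetical).
contract = {
    "run_metadata": {"models": {"scene": "example-model"}},
    "chapters": [],
    "scene_analyses": [],
    "entity_registry": {},
}

# Round-trip through JSON, as a downstream consumer of the export would.
loaded = json.loads(json.dumps(contract))

# Sanity-check that the sections we expect are present.
missing = {"run_metadata", "chapters"} - loaded.keys()
assert not missing, f"contract missing keys: {missing}"
```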
The maintained regression coverage lives in the `tests` directory.
Run the full suite:
```bash
pytest tests
```

Run a single test module:

```bash
pytest tests/test_scene_analyzer.py
```
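A new regression test follows the usual pytest pattern: a plain `test_*` function with bare assertions that pytest discovers automatically. A self-contained sketch, with a toy helper defined inline for illustration (the real helpers live in the project's identity/alias services):

```python
def normalize_alias(name: str) -> str:
    # Toy stand-in for an alias-normalization helper: lowercase and
    # collapse runs of whitespace.
    return " ".join(name.strip().lower().split())

def test_normalize_alias_collapses_whitespace_and_case():
    assert normalize_alias("  Jon   SNOW ") == "jon snow"
```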