faresmohamed260/saga

S.A.G.A. is a production-oriented narrative intelligence system that extracts structured story knowledge from EPUB and PDF books, preserves canon, and prepares grounded inputs for future pre-canon, mid-canon, post-canon, and fanfiction authoring workflows.

The project ingests one or more books, splits them into scenes, analyzes each scene with LLM-backed extractors, and builds reusable narrative outputs such as:

  • chapter rows
  • scene analyses
  • entity registry
  • state transitions
  • canon snapshots
  • timeline events
  • character timelines
  • alias and identity decisions
  • causal graph and metrics
  • searchable story index

The main product surface is the Streamlit dashboard in story_dashboard.py.

Main Features

  • Unified series ingestion through EPUB and PDF processors
  • Continuous target-word scene sizing, where 0 means one full chapter per scene
  • Cross-chapter chunk merging for nonzero target sizes
  • Parallel scene analysis and identity analysis
  • Incremental alias-map updates during processing
  • Deterministic downstream rebuilding after each scene
  • JSON contract export from the dashboard
  • Search across scenes, timeline, state, identities, and causal graph outputs
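
As a rough illustration of the parallel scene analysis listed above, the sketch below fans scenes out to a thread pool and collects results in input order, so that downstream rebuilding sees scenes in a stable sequence. The names analyze_scenes and fake_analyzer are hypothetical and are not taken from the project; the real analyzers call an LLM, which is replaced here by a stub.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def analyze_scenes(
    scenes: List[str],
    analyze: Callable[[str], Dict],
    max_workers: int = 4,
) -> List[Dict]:
    """Run `analyze` on every scene concurrently (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order,
        # so the output list lines up with `scenes` index by index.
        return list(pool.map(analyze, scenes))


def fake_analyzer(scene: str) -> Dict:
    """Stand-in for an LLM-backed extractor (assumption, for illustration)."""
    return {"word_count": len(scene.split())}


results = analyze_scenes(["a quiet morning", "the storm"], fake_analyzer)
```

Threads suit this kind of work because LLM calls are typically network-bound; whether the project uses threads, processes, or async I/O is not stated here.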

Project Structure

  • story_dashboard.py: Main Streamlit application.
  • services: Book ingestion and chapter extraction.
  • analysis: Scene splitting and per-scene LLM analysis.
  • entities: Entity registry building.
  • state: State transitions and canon snapshots.
  • timeline: Timeline, character timeline, normalization, and causal graph services.
  • rag: Searchable indexing services.
  • query: Story search/query services.
  • docs: Project documentation.

Running The Dashboard

From the project root:

streamlit run story_dashboard.py

Installation

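Create a virtual environment, activate it, and install the project in editable mode. The activation command shown is for Windows; on macOS or Linux, use source venv/bin/activate instead: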

python -m venv venv
venv\Scripts\activate
pip install -e .[dev]

Dashboard Workflow

  1. Upload one or more EPUB or PDF books.
  2. Choose:
    • scene analysis model
    • identity model
    • target scene size in words
      • 0 means one full chapter per scene
      • values above 0 can merge across chapter boundaries when needed
  3. Click Run Pipeline.
  4. Review outputs in the dashboard tabs.
  5. Export the pipeline result using Export JSON Contract from the sidebar after the run completes.
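
The target scene size setting in step 2 can be pictured with the minimal sketch below. It assumes chapters arrive as plain strings and that scenes are cut purely by word count; the function name split_into_scenes is hypothetical, and the project's actual splitter may well respect paragraph or sentence boundaries instead.

```python
from typing import List


def split_into_scenes(chapters: List[str], target_words: int) -> List[str]:
    """Hypothetical word-count splitter, not the project's implementation."""
    if target_words < 0:
        raise ValueError("target_words must be >= 0")
    if target_words == 0:
        # 0 means one full chapter per scene.
        return list(chapters)
    # Nonzero target: treat all chapters as one continuous word stream,
    # so a scene may span a chapter boundary when needed.
    words = [word for chapter in chapters for word in chapter.split()]
    return [
        " ".join(words[i:i + target_words])
        for i in range(0, len(words), target_words)
    ]


chapters = ["a b c", "d e"]
per_chapter = split_into_scenes(chapters, 0)
merged = split_into_scenes(chapters, 2)
```

With a target of 2, the second scene in this example contains the last word of the first chapter and the first word of the second, which is the cross-chapter merging behavior described above.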

JSON Export

The dashboard can export a full JSON contract containing:

  • run metadata
  • inputs
  • chapters
  • scene analyses
  • resolved scene analyses
  • entity registry
  • state result
  • canon snapshot
  • timeline
  • character timelines
  • identity result
  • causal graph result
  • story index summary

See docs/JSON_CONTRACT.md for the contract description.
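
For a quick look at an exported file outside the dashboard, a sketch along these lines may help. It makes no assumption about the contract's key names (those are defined in docs/JSON_CONTRACT.md); it only reports the type and size of each top-level section. The helper names load_contract and summarize_contract, and the sample keys in the usage lines, are hypothetical.

```python
import json
from typing import Any, Dict


def load_contract(path: str) -> Dict[str, Any]:
    """Read an exported JSON contract from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_contract(contract: Dict[str, Any]) -> Dict[str, str]:
    """Describe each top-level section by type and, where it applies, length."""
    summary = {}
    for key, value in contract.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} (len={len(value)})"
        else:
            summary[key] = type(value).__name__
    return summary


# Illustrative in-memory example; the key names here are made up.
sample = {"chapters": [{}, {}], "run_metadata": {"model": "x"}, "version": "1"}
overview = summarize_contract(sample)
```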

Testing

The maintained regression tests live in the tests directory.

Run the full suite:

pytest tests

Run a single test module:

pytest tests/test_scene_analyzer.py

Documentation

About

S.A.G.A. — Story Analysis, Generation, and Archives. Canon-aware narrative intelligence for analysis, retrieval, timelines, and story generation.
