Skip to content

arriqaaq/ilm

Repository files navigation

Ilm — Islamic Knowledge Platform

Search the Quran & Sunnah. Deeply.
A semantic search platform for Islamic scholarship — Quran with tafsir, 34K+ hadiths with narrator chains, and interactive isnad graphs.


Methodology & Algorithms — Mustalah al-hadith isnad analysis  |  Data Sources — Dataset documentation

See also Barmaver's Dismantling Orientalist Narratives (2025, free on Academia.edu).

Architecture

Architecture overview

Rust backend serving a SvelteKit SPA, with SurrealDB as a unified graph + vector + full-text database. Embeddings via FastEmbed, LLM via local Ollama.

Features

  • Quran Reader — 114 surahs with Tajweed Arabic, Sahih International translation, expandable Tafsir Ibn Kathir per ayah
  • Hadith Explorer — 34K+ hadiths from 926 books across the 6 canonical collections
  • Narrator Networks — 18K+ narrators with interactive Cytoscape.js graph visualization, Ibn Hajar reliability grades
  • Hybrid Search — BM25 full-text + 1024-dim semantic vectors fused with Reciprocal Rank Fusion
  • Ask AI (GraphRAG) — Natural language Q&A grounded in Quran and Hadith via local Ollama, with isnad-aware context and narrator chain citations
  • Early Manuscripts — Per-ayah high-resolution manuscript images from Corpus Coranicum (Berlin-Brandenburg Academy), viewable with zoom
  • Isnad Analysis — Hadith family clustering, mustalah-based chain grading (sahih/hasan/da'eef), transmission breadth (mutawatir/mashhur/aziz/gharib), corroboration detection (mutaba'at/shawahid), word-level matn diffing
  • Personal Study Notes — Annotate any ayah or hadith, collect evidence by topic with @mentions that embed Quran verses and hadiths inline, tag-based organization, color-coded highlights, and full-text search across your notes

Quick Start

Prerequisites

  • Rust (latest stable)
  • Node.js (v20+)
  • Ollamaollama pull command-r7b-arabic && ollama serve

Build & Run

git clone https://github.com/arriqaaq/ilm.git && cd ilm
make download-data            # download pre-built data (no ingestion needed)
make dev                      # build & start server at localhost:3000

Note: SurrealDB's HNSW vector index requires extra stack space. When running cargo run directly (outside of make), set RUST_MIN_STACK=8388608. The Makefile handles this automatically.

Data Sources

Dataset Records Content
SemanticHadith KG V2 34K hadiths Knowledge graph with narrator chains across 6 canonical collections
Sunnah.com 33K translations Human English for 6 canonical collections
QUL (Tarteel) 6,236 ayahs QPC Hafs Arabic + Sahih International English
Tafsir Ibn Kathir 6,236 ayahs Classical exegesis in English (HTML)
AR-Sanad 18K narrators Ibn Hajar reliability classifications (Taqrib al-Tahdhib)

All datasets are auto-downloaded on first run. See DATA_SOURCES.md for details.

Ingest Pipeline

Ingest pipeline

Parses the SemanticHadith KG, builds the narrator graph, generates embeddings, and merges human English translations from sunnah.com. Use --translate to fill gaps with Ollama.

Search

Search flow

Three modes: Hybrid (default — BM25 + vector via Reciprocal Rank Fusion), Text (substring match), and Semantic (pure vector similarity). Works across both Arabic and English text.

Ask (GraphRAG)

GraphRAG flow

Ask questions in natural language. The system classifies the question, retrieves relevant Quran ayahs and hadiths via vector search, traverses the narrator graph to reconstruct each isnad (chain of narration), and passes this as context to a local LLM that streams a grounded answer with citations.

Graph Model

Database graph model

SurrealDB stores narrators, hadiths, books, and ayahs as documents connected by heard_from, narrates, belongs_to, and references_hadith graph edges — enabling isnad reconstruction, Quran-Hadith cross-referencing, and network analysis.

Early Manuscripts

Early Quranic manuscript — Berlin, Wetzstein II 1913
Berlin, Staatsbibliothek: Wetzstein II 1913 — Surah 2:238

Per-ayah manuscript images from Corpus Coranicum (Berlin-Brandenburg Academy of Sciences). Click "Manuscripts" on any ayah to view high-resolution scans of early Quranic manuscripts — fetched live from the Corpus Coranicum API.

Personal Study Notes

Personal Study Notes

Annotate any ayah or hadith with personal notes. Collect evidence by topic using @mentions that embed Quran verses and hadiths inline as rich cards. Organize with tags and color-coded highlights. Notes are stored in a separate user_note table — safely deletable without impacting ingested data.

  • @Mentions — type @2:255 to embed a Quran ayah, @im_1 for a hadith, or search narrators by name
  • Topic Collections — save ayahs and hadiths from anywhere into named study notes via the "Save" button
  • Tags & Search — tag notes for organization, search across all notes by content or tag
  • Color Highlights — 5 color options (yellow, green, blue, pink, purple) for visual categorization
  • Rich Embeds — embedded references show the actual Arabic text and translation inline

Training Pipeline

Training pipeline

Fine-tune a domain-specific LLM on hadith and Quran data, then deploy it through the existing Ollama-based ask loop with zero backend changes. The pipeline generates ~1,400 ChatML Q&A pairs matching the exact RAG prompt pattern from rag.rs, fine-tunes via LoRA (MLX locally or Unsloth on Colab), and exports to GGUF for Ollama. See TRAINING.md for the full guide.

Tech Stack

Layer Technology Purpose
Backend Rust, Axum HTTP server, JSON API
Database SurrealDB (SurrealKV) Graph + HNSW vectors + BM25 full-text
Embeddings FastEmbed (bge-m3) 1024-dim semantic vectors
Frontend SvelteKit 2, Svelte 5 SPA served as static files
Graph Viz Cytoscape.js Narrator network visualization
LLM Ollama (local) Translation fallback + GraphRAG Q&A

Contributing

git clone https://github.com/arriqaaq/ilm.git && cd ilm
make build
cargo run -- ingest --limit 5 --translate   # quick test data
cd frontend && npm run dev                   # hot reload at :5173

See METHODOLOGY.md for the scholarly framework and DATA_SOURCES.md for dataset documentation.

About

A semantic search platform for Islamic scholarship — Quran with tafsir, 34K+ hadiths with narrator chains, and interactive isnad graphs.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors