
Commit 4ddef69

refactor: rename CLI to openkb and simplify config
Parent: 40ea8da

21 files changed: 174 additions & 180 deletions

.env.example

Lines changed: 0 additions & 4 deletions

@@ -3,7 +3,3 @@
 # Anthropic: LLM_API_KEY=sk-ant-...
 # Gemini: LLM_API_KEY=AIza...
 LLM_API_KEY=your-key-here
-
-# PageIndex Cloud API key (optional, leave empty for local PageIndex)
-# Get your key at https://pageindex.dev
-# PAGEINDEX_API_KEY=your-key-here

.gitignore

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ venv/
 # Knowledge base test artifacts
 raw/
 wiki/
-.okb/
+.openkb/
 
 # Local only
 docs/

README.md

Lines changed: 57 additions & 39 deletions
@@ -4,19 +4,19 @@
 <img src="https://docs.pageindex.ai/images/openkb.png" alt="OpenKB (by PageIndex)" />
 </a>
 
-# OpenKB: Open LLM Knowledge Base
+# OpenKB Open LLM Knowledge Base
 
 <p align="center"><i>Scale to long documents&nbsp;&nbsp;Reasoning-based retrieval&nbsp;&nbsp;Native multi-modality&nbsp;&nbsp;No Vector DB</i></p>
 
 </div>
 
 ---
 
-# 📑 Introduction to OpenKB
+# 📑 What is OpenKB
 
-Andrej Karpathy [described](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) a workflow where LLMs compile raw documents into a structured, interlinked markdown wiki; summaries, concept pages, cross-references, all maintained automatically. Knowledge compounds over time instead of being re-derived on every query.
+**OpenKB (Open Knowledge Base)** is an open-source system (in CLI) that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs, powered by [**PageIndex**](https://github.com/VectifyAI/PageIndex) for vectorless long document retrieval.
 
-**OpenKB (Open Knowledge Base)** is an open-source CLI that implements this workflow, powered by [**PageIndex**](https://github.com/VectifyAI/PageIndex) for vectorless long document retrieval.
+The idea is based on a [concept](https://x.com/karpathy/status/2039805659525644595) described by Andrej Karpathy: LLMs generate summaries, concept pages, and cross-references, all maintained automatically. Knowledge compounds over time instead of being re-derived on every query.
 
 ### Why not just traditional RAG?
 

@@ -25,7 +25,7 @@ Traditional RAG rediscovers knowledge from scratch on every query. Nothing accum
 ### Features
 
 - **Any format** — PDF, Word, PowerPoint, Excel, HTML, Markdown, text, CSV, and more via markitdown
-- **Scale to long documents** — Long and complex documents are handled via [PageIndex](https://github.com/VectifyAI/PageIndex) tree indexing, enabling better long-context retrieval
+- **Scale to long documents** — Long and complex documents are handled via [PageIndex](https://github.com/VectifyAI/PageIndex) tree indexing, enabling accurate, vectorless long-context retrieval
 - **Native multi-modality** — Retrieves and understands figures, tables, and images, not just text
 - **Auto wiki** — LLM generates summaries, concept pages, and cross-links. You curate sources; the LLM does the rest
 - **Query** — Ask questions against your wiki. The LLM navigates your compiled knowledge to answer
@@ -44,36 +44,40 @@ pip install openkb
 ### Quick start
 
 ```bash
-# 1. Create a knowledge base
+# 1. Create a directory for your knowledge base
 mkdir my-kb && cd my-kb
 
-# 2. Initialize
-okb init
+# 2. Initialize the knowledge base
+openkb init
 
 # 3. Add documents
-okb add paper.pdf
-okb add ~/papers/       # Add a whole directory
-okb add article.html
+openkb add paper.pdf
+openkb add ~/papers/    # Add a whole directory
+openkb add article.html
 
 # 4. Ask questions
-okb query "What are the main findings?"
+openkb query "What are the main findings?"
 
 # 5. Check wiki health
-okb lint
+openkb lint
 ```
 
 ### Set up your LLM
 
-OpenKB comes with [multi-LLM support](https://docs.litellm.ai/docs/providers) (e.g., OpenAI, Claude, Gemini) via [LiteLLM](https://github.com/BerriAI/litellm) (pinned to a [safe version](https://docs.litellm.ai/blog/security-update-march-2026)).
+OpenKB comes with [multi-LLM support](https://docs.litellm.ai/docs/providers) (e.g., OpenAI, Claude, Gemini) via [LiteLLM](https://github.com/BerriAI/litellm) (pinned to a [safe version](https://docs.litellm.ai/blog/security-update-march-2026)).
 
-Create a `.env` file with your LLM API key. Choose your LLM during `okb init` or edit [`.okb/config.yaml`](#configuration).
+Set your model during `openkb init`, or in [`.openkb/config.yaml`](#configuration), using `provider/model` LiteLLM format (like `anthropic/claude-sonnet-4-6`). OpenAI models can omit the prefix (like `gpt-5.4`).
+
+Create a `.env` file with your LLM API key:
 
 ```bash
 LLM_API_KEY=your_llm_api_key
 ```
 
 # 🧩 How It Works
 
+### Architecture
+
 ```
 raw/                    You drop files here
 
@@ -82,8 +86,7 @@ raw/                    You drop files here
 ├─ Long PDFs ──→ PageIndex ────→ LLM reads document trees
 │                                        │
 │                                        ▼
-│                               Wiki Compilation
-│                            (single LLM session)
+│                      Wiki Compilation (using LLM)
 │                                        │
 ▼                                        ▼
 wiki/
@@ -97,16 +100,16 @@
 └── reports/            Lint reports
 ```
 
-### Two paths, one wiki
+### Short vs. long document handling
 
 | | Short documents | Long documents (PDF ≥ 20 pages) |
 |---|---|---|
 | **Convert** | markitdown → Markdown | PageIndex → tree index + summaries |
 | **Images** | Extracted inline (pymupdf) | Extracted by PageIndex |
-| **LLM reads** | Full text | Tree summaries only |
+| **LLM reads** | Full text | Document trees |
 | **Result** | summary + concepts | summary + concepts |
 
-Short docs are read in full by the LLM. Long PDFs are indexed by PageIndex into a hierarchical tree with summaries. The LLM reads the tree instead of the full text, avoiding context window limits while retaining structural understanding.
+Short docs are read in full by the LLM. Long PDFs are indexed by PageIndex into a hierarchical tree with summaries. The LLM reads the tree instead of the full text, enabling better retrieval from long documents.
 
 ### The wiki compiles knowledge
 
@@ -125,34 +128,41 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates: each docume
 
 | Command | Description |
 |---|---|
-| `okb init` | Initialize a new knowledge base (interactive) |
-| `okb add <file_or_dir>` | Add documents and compile to wiki |
-| `okb query "question"` | Ask a question against the knowledge base |
-| `okb query "question" --save` | Ask and save the answer to `wiki/explorations/` |
-| `okb watch` | Watch `raw/` and auto-compile new files |
-| `okb lint` | Run structural + knowledge health checks |
-<!-- | `okb lint --fix` | Auto-fix what it can | -->
-| `okb list` | List indexed documents and concepts |
-| `okb status` | Show knowledge base stats |
+| `openkb init` | Initialize a new knowledge base (interactive) |
+| `openkb add <file_or_dir>` | Add documents and compile to wiki |
+| `openkb query "question"` | Ask a question against the knowledge base |
+| `openkb query "question" --save` | Ask and save the answer to `wiki/explorations/` |
+| `openkb watch` | Watch `raw/` and auto-compile new files |
+| `openkb lint` | Run structural + knowledge health checks |
+| `openkb list` | List indexed documents and concepts |
+| `openkb status` | Show knowledge base stats |
+
+<!-- | `openkb lint --fix` | Auto-fix what it can | -->
 
 ### Configuration
 
-Generated by `okb init`, stored in `.okb/config.yaml`:
+Settings are initialized by `openkb init`, and stored in `.openkb/config.yaml`:
 
 ```yaml
 model: gpt-5.4              # LLM model (any LiteLLM-supported provider)
-api_key_env: LLM_API_KEY    # Environment variable for LLM API key
 language: en                # Wiki output language
 pageindex_threshold: 20     # PDF pages threshold for PageIndex
-pageindex_api_key_env: ""   # (Optional) Environment variable for PageIndex Cloud API key
 ```
 
+Model names use `provider/model` LiteLLM [format](https://docs.litellm.ai/docs/providers) (OpenAI models can omit the prefix):
+
+| Provider | Model example |
+|---|---|
+| OpenAI | `gpt-5.4` |
+| Anthropic | `anthropic/claude-sonnet-4-6` |
+| Gemini | `gemini/gemini-3.1-pro-preview` |
+
 ### PageIndex integration
 
-For long documents, relying solely on summaries often leads to information loss.
-We integrate [PageIndex](https://github.com/VectifyAI/PageIndex) into the knowledge base to provide structured, context-aware retrieval for long documents, avoiding the information loss common in summary-based approaches.
+Long documents are challenging for LLMs due to context limits, context rot, and summarization loss.
+[PageIndex](https://github.com/VectifyAI/PageIndex) solves this with vectorless, reasoning-based retrieval — building a hierarchical tree index that lets LLMs reason over the index for context-aware retrieval.
 
-By default, PageIndex runs locally using the open-source version, with no external dependencies required.
+PageIndex runs locally by default using the [open-source version](https://github.com/VectifyAI/PageIndex), with no external dependencies required.
 
 #### Optional: Cloud Support
 
@@ -183,27 +193,35 @@ OpenKB's wiki is a directory of Markdown files with `[[wikilinks]]`. Obsidian re
 3. Use graph view to see knowledge connections
 4. Use Obsidian Web Clipper to add web articles to `raw/`
 
-# 🔗 Learn More
+# 🧭 Learn More
 
 ### Compared to Karpathy's Approach
 
 | | Karpathy's workflow | OpenKB |
 |---|---|---|
 | Short documents | LLM reads directly | markitdown → LLM reads |
-| Long documents | Doesn't fit in context | PageIndex tree index |
+| Long documents | Context limits, context rot | PageIndex tree index |
 | Supported formats | Web clipper → .md | PDF, Word, PPT, Excel, HTML, text, CSV, .md |
 | Wiki compilation | LLM agent | LLM agent (same) |
 | Q&A | Query over wiki | Wiki + PageIndex retrieval |
 
 ### Tech Stack
 
-- [PageIndex](https://github.com/VectifyAI/PageIndex) — Vectorless, reasoning-based document indexing
+- [PageIndex](https://github.com/VectifyAI/PageIndex) — Vectorless, reasoning-based document indexing and retrieval
 - [markitdown](https://github.com/microsoft/markitdown) — Universal file-to-markdown conversion
 - [OpenAI Agents SDK](https://github.com/openai/openai-agents-python) — Agent framework (supports non-OpenAI models via LiteLLM)
 - [LiteLLM](https://github.com/BerriAI/litellm) — Multi-provider LLM gateway
 - [Click](https://click.palletsprojects.com/) — CLI framework
 - [watchdog](https://github.com/gorakhargosh/watchdog) — Filesystem monitoring
 
+### Roadmap
+
+- [ ] Extend long document handling to non-PDF formats
+- [ ] Scale to large document collections with nested folder support
+- [ ] Hierarchical concept (topic) indexing for massive knowledge bases
+- [ ] Database-backed storage engine
+- [ ] Web UI for browsing and managing wikis
+
 ### Contributing
 
 Contributions are welcome! Please submit a pull request, or open an [issue](https://github.com/VectifyAI/OpenKB/issues) for bugs or feature requests. For larger changes, consider opening an issue first to discuss the approach.
@@ -214,7 +232,7 @@ Apache 2.0. See [LICENSE](LICENSE).
 
 ### Support Us
 
-Leave us a star 🌟 if you like our project. Thank you!
+If you find OpenKB useful, give us a star 🌟 — and check out [PageIndex](https://github.com/VectifyAI/PageIndex) too!
 
 <div>
 
config.yaml.example

Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,3 @@
 model: gpt-5.4              # LLM model (any LiteLLM-supported provider)
-api_key_env: LLM_API_KEY    # Environment variable for API key
-language: en                # Wiki output language
-pageindex_threshold: 20     # PDF pages threshold for PageIndex
-pageindex_api_key_env: ""   # Env var name for PageIndex Cloud API key
+language: en                # Wiki output language
+pageindex_threshold: 20     # PDF pages threshold for PageIndex
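
The net effect: only three keys remain in the example config. A minimal Python sketch of how the surviving keys are read (the dict literal stands in for the parsed YAML; the `.get(...)` defaults mirror the calls visible in `openkb/agent/compiler.py` in this commit — assumption: `load_config` returns a plain dict):

```python
# Stand-in for the parsed .openkb/config.yaml after this commit.
config = {
    "model": "gpt-5.4",          # LLM model (any LiteLLM-supported provider)
    "language": "en",            # Wiki output language
    "pageindex_threshold": 20,   # PDF pages threshold for PageIndex
}

# The removed keys (api_key_env, pageindex_api_key_env) are simply absent;
# callers fall back to defaults via dict.get, as in compiler.py/linter.py.
model = config.get("model", "gpt-5.4")
language = config.get("language", "en")
threshold = config.get("pageindex_threshold", 20)
```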

openkb/agent/compiler.py

Lines changed: 13 additions & 14 deletions
@@ -108,7 +108,7 @@ def write_file(path: str, content: str) -> str:
         name="wiki-compiler",
         instructions=instructions,
         tools=[list_files, read_file, write_file],
-        model=model,
+        model=f"litellm/{model}",
         model_settings=ModelSettings(parallel_tool_calls=False),
     )
 
@@ -118,7 +118,7 @@ def build_long_doc_compiler_agent(wiki_root: str, kb_dir: str, model: str, langu
 
     Args:
         wiki_root: Absolute path to the wiki directory.
-        kb_dir: Absolute path to the knowledge base root (contains .okb/).
+        kb_dir: Absolute path to the knowledge base root (contains .openkb/).
         model: LLM model name to use for the agent.
         language: Language code for wiki content (e.g. 'en', 'fr').
 
@@ -127,15 +127,14 @@
     """
     from openkb.config import load_config
 
-    okb_dir = Path(kb_dir) / ".okb"
-    config = load_config(okb_dir / "config.yaml")
+    openkb_dir = Path(kb_dir) / ".openkb"
+    config = load_config(openkb_dir / "config.yaml")
     _model = config.get("model", model)
-    pi_key_env = config.get("pageindex_api_key_env", "") or "PAGEINDEX_API_KEY"
-    pi_api_key = os.environ.get(pi_key_env, "")
+    pageindex_api_key = os.environ.get("PAGEINDEX_API_KEY", "")
     client = PageIndexClient(
-        api_key=pi_api_key or None,
+        api_key=pageindex_api_key or None,
         model=_model,
-        storage_path=str(okb_dir),
+        storage_path=str(openkb_dir),
     )
     col = client.collection()
 
@@ -195,7 +194,7 @@ def get_page_content(doc_id: str, pages: str) -> str:
         name="wiki-compiler",
         instructions=instructions,
         tools=[list_files, read_file, write_file, get_page_content],
-        model=_model,
+        model=f"litellm/{_model}",
         model_settings=ModelSettings(parallel_tool_calls=False),
     )
 
@@ -214,13 +213,13 @@ async def compile_short_doc(
     Args:
         doc_name: Document stem name (no extension).
         source_path: Path to the converted Markdown in wiki/sources/.
-        kb_dir: Root of the knowledge base (contains wiki/ and .okb/).
+        kb_dir: Root of the knowledge base (contains wiki/ and .openkb/).
         model: LLM model name.
     """
     from openkb.config import load_config
 
-    okb_dir = kb_dir / ".okb"
-    config = load_config(okb_dir / "config.yaml")
+    openkb_dir = kb_dir / ".openkb"
+    config = load_config(openkb_dir / "config.yaml")
     language: str = config.get("language", "en")
 
     wiki_root = str(kb_dir / "wiki")
@@ -257,8 +256,8 @@
     """
    from openkb.config import load_config
 
-    okb_dir = kb_dir / ".okb"
-    config = load_config(okb_dir / "config.yaml")
+    openkb_dir = kb_dir / ".openkb"
+    config = load_config(openkb_dir / "config.yaml")
     language: str = config.get("language", "en")
 
     wiki_root = str(kb_dir / "wiki")

openkb/agent/linter.py

Lines changed: 3 additions & 3 deletions
@@ -72,7 +72,7 @@ def read_file(path: str) -> str:
         name="wiki-linter",
         instructions=instructions,
         tools=[list_files, read_file],
-        model=model,
+        model=f"litellm/{model}",
     )
 
 
@@ -88,8 +88,8 @@ async def run_knowledge_lint(kb_dir: Path, model: str) -> str:
     """
     from openkb.config import load_config
 
-    okb_dir = kb_dir / ".okb"
-    config = load_config(okb_dir / "config.yaml")
+    openkb_dir = kb_dir / ".openkb"
+    config = load_config(openkb_dir / "config.yaml")
     language: str = config.get("language", "en")
 
     wiki_root = str(kb_dir / "wiki")
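
Both agents in this commit switch from `model=model` to `model=f"litellm/{model}"`. The `litellm/` prefix is how the OpenAI Agents SDK routes a model string through its LiteLLM integration, which is what lets non-OpenAI providers back the agent. A one-line sketch (hypothetical helper — the commit inlines the f-string at each call site):

```python
def to_agents_model(model: str) -> str:
    # A "litellm/" prefix tells the OpenAI Agents SDK to dispatch this
    # model name via LiteLLM, so any provider/model pair LiteLLM supports
    # (e.g. "anthropic/claude-sonnet-4-6") can be used.
    return f"litellm/{model}"
```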
