Short docs are read in full by the LLM. Long PDFs are indexed by PageIndex into a hierarchical tree with summaries; the LLM reads the tree instead of the full text, avoiding context-window limits while retaining the document's structure.
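The length-based routing described above can be sketched in Python. This is a minimal illustration, not PageIndex's API: `TreeNode`, `render_outline`, `build_context`, and the `MAX_FULL_TEXT_CHARS` cutoff are hypothetical names chosen for this example, and the tree PageIndex actually produces may carry different fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cutoff: documents at or below this length are passed to the LLM verbatim.
MAX_FULL_TEXT_CHARS = 20_000


@dataclass
class TreeNode:
    """One node of a hypothetical hierarchical index: a section title plus its summary."""
    title: str
    summary: str
    children: list[TreeNode] = field(default_factory=list)


def render_outline(node: TreeNode, depth: int = 0) -> list[str]:
    """Flatten the tree depth-first into indented outline lines."""
    lines = ["  " * depth + f"- {node.title}: {node.summary}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


def build_context(text: str, tree: TreeNode | None) -> str:
    """Return what the LLM reads: the full text for short docs, else the tree outline."""
    # Short document, or no tree available: fall back to the full text.
    if len(text) <= MAX_FULL_TEXT_CHARS or tree is None:
        return text
    return "\n".join(render_outline(tree))


if __name__ == "__main__":
    report = TreeNode("Annual report", "Company results for the year", [
        TreeNode("Revenue", "Revenue by segment"),
        TreeNode("Risks", "Key risk factors", [
            TreeNode("Market risk", "Exposure to rates"),
        ]),
    ])
    # A long document is replaced by its outline.
    print(build_context("x" * 50_000, report))
```

Passing the outline rather than the raw text keeps the prompt size proportional to the number of sections instead of the length of the document, while the nesting preserves the document's structure.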
## PageIndex integration
For long documents, relying solely on summaries often loses information. We integrate [PageIndex](https://github.com/VectifyAI/PageIndex) into the knowledge base to provide structured, context-aware retrieval over long documents instead.

By default, PageIndex runs locally using the open-source version, with no external dependencies required.

### Optional: Cloud Support
For large or complex PDFs, [PageIndex Cloud](https://docs.pageindex.ai/) can be used to access additional capabilities, including:
- OCR support for scanned PDFs (via hosted vision-language models)
- Faster structure generation
- Scalable indexing for large documents

Set `PAGEINDEX_API_KEY` in your `.env` file to enable cloud features.
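A minimal sketch of how an application might choose between local and cloud indexing based on this variable, assuming the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). The `pageindex_mode` helper and its `"local"`/`"cloud"` return values are hypothetical; the project's actual switching logic is not shown here.

```python
import os
from collections.abc import Mapping
from typing import Optional


def pageindex_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "cloud" if PAGEINDEX_API_KEY is set to a non-blank value, else "local".

    Hypothetical helper for illustration; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("PAGEINDEX_API_KEY", "").strip()
    # No key (or an empty/whitespace-only key) keeps the default local mode.
    return "cloud" if api_key else "local"


if __name__ == "__main__":
    # A .env file enabling cloud mode would contain a line such as:
    #   PAGEINDEX_API_KEY=<your key>
    print(pageindex_mode())
```

Treating an empty or whitespace-only value as unset means a blank `PAGEINDEX_API_KEY=` line left in `.env` keeps the default local behavior.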