CHANGELOG.md
Lines changed: 24 additions & 0 deletions
@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.10.1] - 2026-03-13
+
+### Added
+
+- **Entity extraction** (`src/ingest/entity-extractor.ts`): Deterministic regex-based extraction of people (@mentions, emails, "X said" patterns), channels (#channels), meetings (standup, retro, 1:1), and URLs from chunk content. Skips code blocks and `[Thinking]` blocks to reduce noise.
+- **Entity store** (`src/storage/entity-store.ts`): CRUD layer for entities, aliases, and chunk mentions. Supports alias resolution, re-ingestion safety (INSERT OR IGNORE), and per-entity chunk lookup capped at 100 most recent.
+- **Entity-aware retrieval**: Entity mentions in queries are matched against stored entities and injected as an RRF source (weight 1.5) in both keyword and hybrid search paths. Project-scoped: gracefully skips when no project filter is provided.
+- **Entity tables** (migration v16): Three new tables (`entities`, `entity_aliases`, `entity_mentions`) with cascade deletes and appropriate indexes.
+- **Entity count in stats**: The `stats` MCP tool now reports entity count.
+
+### Changed
+
+- **Hybrid retrieval default**: `retrieval.primary` changed from `'keyword'` to `'hybrid'`. Vector search is now always active at ~14ms cost (local jina-small), which covers narrative/thematic projects without per-project configuration. Backward compatible: `retrieval.primary: 'keyword'` in config still works.
+- **Temporal misrouting fix**: Updated `search` and `recall` tool descriptions to redirect recent/latest session queries to `reconstruct`. Updated `reconstruct` description to explicitly claim temporal queries.
+- **MCP tool descriptions**: `search` now mentions hybrid retrieval and entity boosting; `recall` and `reconstruct` include temporal routing guidance.
+
+### Fixed
+
+- **MCP integration test timeout**: Increased `beforeAll` hook timeout from 10s to 30s to accommodate heavy module imports (ONNX runtime, tree-sitter, LanceDB).
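Reading aid for the entity store entry in this changelog: the sketch below shows, in memory, the three behaviours it names (alias resolution, idempotent re-ingestion in the spirit of `INSERT OR IGNORE`, and a lookup capped at the 100 most recent chunks). It is not the code in `src/storage/entity-store.ts`. The class name, method names, entity id format, and alias normalisation rule are all assumptions made for illustration, and a `Map` stands in for the SQLite tables.

```typescript
// Hypothetical in-memory stand-in for the entity store; every name here is illustrative.
class InMemoryEntityStore {
  // normalised alias -> canonical entity id
  private aliases = new Map<string, string>();
  // entity id -> (chunk id -> timestamp of the chunk)
  private mentions = new Map<string, Map<string, number>>();

  // Assumed normalisation: drop a leading "@" or "#", trim, lowercase.
  private static normalise(alias: string): string {
    return alias.replace(/^[@#]/, "").trim().toLowerCase();
  }

  // First writer wins, mirroring INSERT OR IGNORE: an existing alias is never overwritten.
  addAlias(entityId: string, alias: string): void {
    const key = InMemoryEntityStore.normalise(alias);
    if (!this.aliases.has(key)) {
      this.aliases.set(key, entityId);
    }
  }

  // Returns the canonical entity id for an alias, or undefined if the alias is unknown.
  resolve(alias: string): string | undefined {
    return this.aliases.get(InMemoryEntityStore.normalise(alias));
  }

  // Re-ingesting the same (entity, chunk) pair is a no-op.
  addMention(entityId: string, chunkId: string, timestamp: number): void {
    let byChunk = this.mentions.get(entityId);
    if (!byChunk) {
      byChunk = new Map<string, number>();
      this.mentions.set(entityId, byChunk);
    }
    if (!byChunk.has(chunkId)) {
      byChunk.set(chunkId, timestamp);
    }
  }

  // Chunk ids mentioning the entity, most recent first, capped at `limit`.
  chunksFor(entityId: string, limit = 100): string[] {
    const byChunk = this.mentions.get(entityId);
    if (!byChunk) {
      return [];
    }
    return Array.from(byChunk.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map((entry) => entry[0]);
  }
}
```

With this shape, `@joel` and `Joel` resolve to the same entity id, and ingesting a session twice leaves the mention set unchanged.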
+- **Meetings**: Keywords like standup, retro, 1:1, sync
+- **URLs**: Full URL patterns
+
+Entities are resolved to canonical forms with alias tracking (e.g., `@joel` and `Joel` map to the same entity). At query time, if the search query contains recognisable entity references, matching chunks are injected as an additional RRF source with a 1.5x boost weight. This means searching for "@joel" surfaces all chunks mentioning Joel alongside semantically relevant results, without requiring exact keyword matches in every chunk.
+
+Entity extraction skips code blocks and `[Thinking]` blocks to avoid false positives from speculative content.
+
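The extraction rules described above can be sketched as a small deterministic pass. This is an illustration only: the function names, the exact regular expressions, the lowercase normalisation, and the assumption that a `[Thinking]` block is a paragraph starting with that tag are all hypothetical, not taken from `src/ingest/entity-extractor.ts`.

```typescript
// Hypothetical sketch of regex-based entity extraction; patterns and names are assumptions.
type EntityKind = "person" | "channel" | "meeting" | "url";

interface ExtractedEntity {
  kind: EntityKind;
  text: string;
}

// Drop fenced code blocks, then drop paragraphs that start with "[Thinking]".
function stripNoise(content: string): string {
  return content
    .replace(/`{3}[\s\S]*?`{3}/g, " ")
    .split(/\n\s*\n/)
    .filter((para) => !para.trim().startsWith("[Thinking]"))
    .join("\n\n");
}

// Collect one capture group (0 = whole match) from every match of a global regex.
function collect(text: string, re: RegExp, group: number): string[] {
  const out: string[] = [];
  let m: RegExpExecArray | null;
  re.lastIndex = 0;
  while ((m = re.exec(text)) !== null) {
    const value = m[group];
    if (value) {
      out.push(value);
    }
  }
  return out;
}

function extractEntities(content: string): ExtractedEntity[] {
  const found: ExtractedEntity[] = [];
  const seen = new Set<string>();
  // Deduplicate on kind + text; everything except URLs is lowercased.
  const add = (kind: EntityKind, raw: string): void => {
    const text = kind === "url" ? raw : raw.toLowerCase();
    const key = kind + ":" + text;
    if (!seen.has(key)) {
      seen.add(key);
      found.push({ kind, text });
    }
  };

  const urlRe = /https?:\/\/[^\s<>()]+/g;
  const emailRe = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
  let text = stripNoise(content);

  // URLs and emails are matched first and blanked out, so a "#fragment" or
  // "@domain" inside them is not re-read as a channel or a mention.
  for (const url of collect(text, urlRe, 0)) add("url", url.replace(/[.,;:!?]+$/, ""));
  text = text.replace(urlRe, " ");
  for (const email of collect(text, emailRe, 0)) add("person", email);
  text = text.replace(emailRe, " ");

  for (const name of collect(text, /(?:^|[^\w.])@([A-Za-z][\w-]*)/g, 1)) add("person", "@" + name);
  for (const name of collect(text, /\b([A-Z][a-z]+) said\b/g, 1)) add("person", name);
  for (const name of collect(text, /(?:^|[^\w])#([A-Za-z][\w-]*)/g, 1)) add("channel", "#" + name);
  for (const word of collect(text, /\b(?:standup|retro|1:1)\b/gi, 0)) add("meeting", word);
  return found;
}
```

Because every rule is a fixed pattern, the same chunk always yields the same entities, which is what makes re-ingestion safe to repeat.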
 ### Recall/Predict (episodic)

 The `recall` and `predict` tools reconstruct narrative chains:
docs/reference/configuration.md
Lines changed: 12 additions & 4 deletions
@@ -103,6 +103,12 @@ Controls the semantic index layer, which generates normalised index entries for
 When enabled, each chunk gets an LLM-generated description (~130 tokens) at ingestion time. These descriptions are embedded and searched instead of raw chunks, providing uniform information density. See [How It Works](../guides/how-it-works.md#semantic-index) for details.

+## Entity Extraction
+
+Entity extraction runs automatically during ingestion with no configuration required. It uses deterministic regex patterns to identify people (`@mentions`, emails, "X said"), channels (`#channel`), meetings (standup, retro, 1:1), and URLs. Extracted entities are stored with alias resolution and used as an RRF boost source (weight 1.5) during search.
+
+Entity extraction skips code blocks and `[Thinking]` blocks to reduce false positives. The feature is always on, with no configuration knobs, and adds no latency to queries that contain no entity references.
+
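To make "RRF boost source (weight 1.5)" concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. Only the 1.5 entity weight comes from the text above. The function name, the `RankedSource` shape, the weight of 1.0 for the keyword and vector sources, and the smoothing constant `k = 60` (a common default for RRF) are assumptions, and the project's actual fusion code may differ.

```typescript
// Hypothetical sketch of weighted Reciprocal Rank Fusion (RRF).
// Each source contributes weight / (k + rank) to every chunk id it returns.
interface RankedSource {
  weight: number;
  ids: string[]; // chunk ids, best match first
}

function weightedRrf(sources: RankedSource[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const source of sources) {
    source.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + source.weight / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative inputs: chunk "c" ranks last for keyword search and is absent
// from vector search, but it mentions the entity named in the query.
const keyword: RankedSource = { weight: 1.0, ids: ["a", "b", "c"] };
const vector: RankedSource = { weight: 1.0, ids: ["b", "a", "d"] };
const entity: RankedSource = { weight: 1.5, ids: ["c"] };

const withEntity = weightedRrf([keyword, vector, entity]);
const withoutEntity = weightedRrf([keyword, vector, { weight: 1.5, ids: [] }]);
```

In this example the entity source lifts `c` to the top of `withEntity`, while an empty entity source (a query with no entity references) leaves the keyword and vector ranking untouched.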
 ## Length Penalty Settings

 ### `lengthPenalty`
@@ -138,10 +144,12 @@ Controls time-decay scoring for search results.
 | `feedbackWeight` | `number` | `0.1` | Weight applied to implicit relevance feedback signals (0-1) |

 MMR reranks search results to balance relevance with diversity. After RRF fusion and cluster expansion, candidates are reordered so that semantically redundant chunks yield to novel ones.
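The reranking step described above can be illustrated with a textbook Maximal Marginal Relevance loop. This is a generic sketch, not the project's implementation: the `Candidate` shape, the `lambda` trade-off parameter and its default of 0.7, and the use of cosine similarity over embeddings are all assumptions.

```typescript
// Hypothetical sketch of Maximal Marginal Relevance (MMR) reranking.
interface Candidate {
  id: string;
  relevance: number;   // fused relevance score, higher is better
  embedding: number[]; // vector used to measure redundancy
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedily pick the candidate maximising
//   lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
// lambda = 1 keeps the pure relevance order; lower values favour diversity.
function mmrRerank(candidates: Candidate[], lambda = 0.7, limit = candidates.length): Candidate[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (remaining.length > 0 && selected.length < limit) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const candidate = remaining[i];
      // Starts at 0, so negative similarities are treated as "not redundant".
      let maxSimilarity = 0;
      for (const chosen of selected) {
        maxSimilarity = Math.max(maxSimilarity, cosine(candidate.embedding, chosen.embedding));
      }
      const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

Given two near-duplicate chunks and one unrelated chunk of lower relevance, a `lambda` of 0.5 places the unrelated chunk second, ahead of the duplicate, which is the "redundant chunks yield to novel ones" behaviour.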
docs/reference/mcp-tools.md
Lines changed: 5 additions & 4 deletions
@@ -16,7 +16,7 @@ All tools return plain text responses via the MCP `content` array with `type: "t
 ### search

-Search memory semantically to discover relevant past context. Returns ranked results using hybrid BM25 + vector search with RRF fusion, cluster expansion, and MMR diversity reranking.
+Search memory to discover relevant past context. Uses hybrid (BM25 + vector) retrieval with entity boosting. Returns results ranked by relevance. For recent/latest session queries, use `reconstruct` instead.
-Recall episodic memory by walking backward through causal chains to reconstruct narrative context. Seeds are found by semantic search; the causal graph unfolds them into ordered chains; chains are ranked by aggregate semantic relevance per token. Falls back to search results when no viable chain is found.
+Recall episodic memory by walking backward through causal chains to reconstruct narrative context. Seeds are found by semantic search; the causal graph unfolds them into ordered chains; chains are ranked by aggregate semantic relevance per token. Falls back to search results when no viable chain is found. For recent/latest session queries, use `reconstruct` instead.

 **Parameters**:
@@ -158,7 +158,7 @@ Returns `"No sessions found for project "[name]"."` if none match.
 ### reconstruct

-Rebuild session context for a project. Call with just `project` to get the most recent history up to the token budget (timeline mode). Optionally specify a time range with `from`/`to`, `days_back`, `session_id`, or `previous_session`.
+Use this for all recent/latest/last session queries. Rebuild session context for a project. Call with just `project` to get the most recent history up to the token budget (timeline mode). Optionally specify a time range with `from`/`to`, `days_back`, `session_id`, or `previous_session`.

 **Parameters**:
@@ -200,12 +200,13 @@ Show memory statistics including version, chunk/edge/cluster counts, and per-pro
src/cli/skill-templates.ts
Lines changed: 2 additions & 2 deletions
@@ -101,8 +101,8 @@ Pass these to the \`search\` MCP tool:
 ## Guidelines

 - **Always pass the \`project\` parameter** scoped to the current project (derive from the working directory) unless the user explicitly asks for cross-project results
-- By default, search uses **keyword-first (BM25)** retrieval, great for exact matches on function names, error codes, and specific terms
-- Optional vector enrichment can be enabled in config for semantic similarity matching
+- By default, search uses **hybrid (BM25 + vector)** retrieval with entity boosting, combining exact keyword matching with semantic similarity
+- For recent/latest session queries, use \`reconstruct\` instead
 - Use \`search\` for discovery, \`recall\` for narrative reconstruction
 - Combine with \`/causantic-recall\` when you need causal chain context (how things led to outcomes)