Complete reference for all Causantic configuration options.
Causantic can be configured through multiple sources, applied in this priority order:

- CLI flags (highest priority)
- Environment variables (`CAUSANTIC_*`)
- Project config (`./causantic.config.json`)
- User config (`~/.causantic/config.json`)
- Built-in defaults (lowest priority)
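One way to picture this precedence is as a deep merge in which each higher-priority source overrides only the keys it sets. The TypeScript below is a minimal sketch under that assumption, not Causantic's actual loader. `deepMerge`, `mergeLayers` and `ConfigLayer` are hypothetical names, and parsing of CLI flags and `CAUSANTIC_*` variables is left out.

```typescript
// Hypothetical sketch of layered config precedence; not Causantic's actual loader.
type ConfigLayer = { [key: string]: unknown };

function isPlainObject(value: unknown): value is ConfigLayer {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `override` into a copy of `base`; scalars and arrays in `override` win.
function deepMerge(base: ConfigLayer, override: ConfigLayer): ConfigLayer {
  const result: ConfigLayer = { ...base };
  for (const key of Object.keys(override)) {
    const existing = result[key];
    const value = override[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value) ? deepMerge(existing, value) : value;
  }
  return result;
}

// Layers are passed lowest priority first, so later layers override earlier ones.
function mergeLayers(...layers: ConfigLayer[]): ConfigLayer {
  return layers.reduce<ConfigLayer>((acc, layer) => deepMerge(acc, layer), {});
}

const defaults: ConfigLayer = { clustering: { threshold: 0.1, minClusterSize: 4 } };
const userConfig: ConfigLayer = { clustering: { minClusterSize: 6 } }; // ~/.causantic/config.json
const projectConfig: ConfigLayer = { clustering: { threshold: 0.15 } }; // ./causantic.config.json
const envOverrides: ConfigLayer = { clustering: { threshold: 0.2 } }; // e.g. CAUSANTIC_CLUSTERING_THRESHOLD=0.2
const cliFlags: ConfigLayer = {};

const effective = mergeLayers(defaults, userConfig, projectConfig, envOverrides, cliFlags);
// effective.clustering is { threshold: 0.2, minClusterSize: 6 }
```

In this sketch the merge is per key, so a project config that sets only `clustering.threshold` leaves the user config's `clustering.minClusterSize` intact.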
Causantic uses JSON configuration files. Create `causantic.config.json`:

```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json"
}
```

The `clustering` section controls HDBSCAN clustering behavior.

| Property | Type | Default | Description |
|---|---|---|---|
| `threshold` | number | `0.10` | Angular distance for cluster assignment (0.01-0.5) |
| `minClusterSize` | integer | `4` | Minimum points to form a cluster (2-100) |
| `incrementalThreshold` | number | `0.3` | Ratio of new chunks (vs. the total at the last full recluster) that triggers a full recluster (0.01-1) |

Research finding: a threshold of 0.10 achieves F1 = 0.940 (100% precision, 88.7% recall) on same-cluster pair prediction.
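The `incrementalThreshold` rule can be read as a simple ratio test. A minimal sketch, assuming the comparison is `>=` and that having no previous full recluster forces one; `shouldFullRecluster` is a hypothetical name.

```typescript
// Hypothetical helper illustrating the incrementalThreshold rule; not Causantic's actual code.
// Assumption: the ratio is compared with >=, and a missing baseline forces a full recluster.
function shouldFullRecluster(
  newChunksSinceLastFull: number,
  totalAtLastFull: number,
  incrementalThreshold = 0.3,
): boolean {
  if (totalAtLastFull <= 0) return true; // no baseline yet: cluster from scratch
  return newChunksSinceLastFull / totalAtLastFull >= incrementalThreshold;
}

// With 1,000 chunks at the last full recluster, the 300th new chunk tips the ratio to 0.3.
console.log(shouldFullRecluster(299, 1000)); // false
console.log(shouldFullRecluster(300, 1000)); // true
```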
The `traversal` section controls chain walking behavior.

| Property | Type | Default | Description |
|---|---|---|---|
| `maxDepth` | integer | `50` | Safety cap on chain walking depth (1-100) |

`maxDepth` caps chain depth during episodic recall/predict. The token budget is the primary stopping criterion; `maxDepth` is a safety net.
The chain walker also uses three internal limits (not currently exposed in config):

| Internal Option | Default | Description |
|---|---|---|
| `maxCandidatesPerSeed` | `10` | Cap on emitted candidate chains per seed |
| `maxExpansionsPerSeed` | `200` | Cap on DFS recursive calls per seed (bounds wall time) |
| `maxSkippedConsecutive` | `5` | Abandon a branch after N consecutive agent-filtered skips |

With typical out-degree 1 (linear chains), a seed's DFS visits ~50 nodes (bounded by the default `maxDepth` of 50). At branching points (out-degree 2-3), total expansions are ~60-100. The 200-expansion budget is generous for typical graphs and protective against rare dense subgraphs.
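The interplay of these limits can be sketched as a depth-first walk that checks each budget before expanding a node. This is a hypothetical illustration, not Causantic's chain walker: the `ChunkNode` and `WalkLimits` shapes, the `walkChains` name, the cycle check, and the choice to emit the partial chain when a limit is hit are all assumptions.

```typescript
// Hypothetical sketch of a budgeted DFS chain walk; names and data shapes are assumptions.
interface ChunkNode {
  id: string;
  tokens: number;
  agent: string;
  next: string[]; // outgoing edges to later chunks
}

interface WalkLimits {
  tokenBudget: number; // primary stopping criterion
  maxDepth: number; // traversal.maxDepth (default 50)
  maxCandidatesPerSeed: number; // default 10
  maxExpansionsPerSeed: number; // default 200
  maxSkippedConsecutive: number; // default 5
}

function walkChains(
  graph: Map<string, ChunkNode>,
  seedId: string,
  limits: WalkLimits,
  agentFilter?: string,
): string[][] {
  const candidates: string[][] = [];
  const onPath = new Set<string>();
  let expansions = 0;

  const emit = (path: string[]): void => {
    if (path.length > 0 && candidates.length < limits.maxCandidatesPerSeed) candidates.push(path);
  };

  function dfs(id: string, path: string[], tokens: number, depth: number, skipped: number): void {
    if (candidates.length >= limits.maxCandidatesPerSeed) return;
    const node = graph.get(id);
    // Safety nets: missing nodes, cycles, the depth cap and the expansion cap all end the branch.
    if (!node || onPath.has(id) || depth >= limits.maxDepth || expansions >= limits.maxExpansionsPerSeed) {
      emit(path);
      return;
    }
    expansions++;

    // Nodes from other agents are walked through but not collected.
    const filtered = agentFilter !== undefined && node.agent !== agentFilter;
    const skippedRun = filtered ? skipped + 1 : 0;
    if (skippedRun >= limits.maxSkippedConsecutive) return; // abandon this branch

    const cost = filtered ? 0 : node.tokens;
    if (tokens + cost > limits.tokenBudget) {
      emit(path); // token budget reached: the chain ends before this node
      return;
    }
    const nextPath = filtered ? path : path.concat(id);
    if (node.next.length === 0) {
      emit(nextPath);
      return;
    }
    onPath.add(id);
    for (const child of node.next) dfs(child, nextPath, tokens + cost, depth + 1, skippedRun);
    onPath.delete(id);
  }

  dfs(seedId, [], 0, 0, 0);
  return candidates;
}
```

In this sketch, agent-filtered nodes cost no tokens and are not added to the chain, so a run of skips only ends a branch once it reaches `maxSkippedConsecutive`.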
The `tokens` section controls output token budgets.

| Property | Type | Default | Description |
|---|---|---|---|
| `claudeMdBudget` | integer | `500` | Tokens for the CLAUDE.md memory section (100-10000) |
| `mcpMaxResponse` | integer | `20000` | Maximum tokens in MCP responses (500-50000) |
The following settings control the hybrid BM25 + vector search pipeline. They are internal defaults, not currently exposed in `causantic.config.json`; they are configured programmatically via `MemoryConfig`.

| Property | Type | Default | Description |
|---|---|---|---|
| `rrfK` | integer | `60` | RRF constant. Higher values shrink the advantage of top-ranked items over lower-ranked ones |
| `vectorWeight` | number | `1.0` | Weight for vector search results in RRF fusion |
| `keywordWeight` | number | `1.0` | Weight for keyword (BM25) search results in RRF fusion |
| `keywordSearchLimit` | integer | `20` | Maximum keyword results retrieved before fusion |
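Weighted Reciprocal Rank Fusion scores each chunk as the sum, over result lists, of `weight / (rrfK + rank)`. The sketch below assumes 1-based ranks (as in the original RRF formulation) and that `keywordSearchLimit` truncates the keyword list before fusion; `fuseRrf` and `RrfOptions` are hypothetical names.

```typescript
// Hypothetical sketch of weighted Reciprocal Rank Fusion; not Causantic's actual code.
interface RrfOptions {
  rrfK: number;
  vectorWeight: number;
  keywordWeight: number;
  keywordSearchLimit: number;
}

const DEFAULT_RRF: RrfOptions = { rrfK: 60, vectorWeight: 1.0, keywordWeight: 1.0, keywordSearchLimit: 20 };

// Both inputs are chunk IDs ordered best-first. Each list contributes weight / (k + rank).
function fuseRrf(
  vectorHits: string[],
  keywordHits: string[],
  opts: RrfOptions = DEFAULT_RRF,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  const addList = (hits: string[], weight: number): void => {
    hits.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + weight / (opts.rrfK + rank));
    });
  };
  addList(vectorHits, opts.vectorWeight);
  addList(keywordHits.slice(0, opts.keywordSearchLimit), opts.keywordWeight);
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fused = fuseRrf(["a", "b", "c"], ["b", "d"]);
console.log(fused.map((r) => r.id).join(",")); // b,a,d,c
```

With `rrfK = 60`, a rank-1 hit scores 1/61 and a rank-10 hit 1/70, a ratio of only about 1.15; with k = 0 the same ratio would be 10. This is why larger k values dampen the dominance of top-ranked items, and why `b`, found by both searches, wins above.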
The following settings control cluster-guided expansion during retrieval. They are internal defaults, not currently exposed in `causantic.config.json`; they are configured programmatically via `MemoryConfig`.

| Property | Type | Default | Description |
|---|---|---|---|
| `maxClusters` | integer | `3` | Maximum clusters to expand from per query |
| `maxSiblings` | integer | `5` | Maximum sibling chunks added per cluster |
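Cluster-guided expansion can be sketched as: take the clusters of the top-ranked hits, then append a bounded number of unseen sibling chunks from each. The data shapes, the `expandWithClusters` name, and the choice to order clusters by the first hit that belongs to them are assumptions for illustration.

```typescript
// Hypothetical sketch of cluster-guided expansion; data shapes are assumptions.
interface Hit {
  id: string;
  clusterId?: string;
}

function expandWithClusters(
  rankedHits: Hit[],
  clusterMembers: Map<string, string[]>, // clusterId -> member chunk IDs
  maxClusters = 3,
  maxSiblings = 5,
): string[] {
  const result = rankedHits.map((h) => h.id);
  const seen = new Set<string>(result);

  // Collect up to maxClusters distinct clusters, in order of their best-ranked hit.
  const clusters: string[] = [];
  for (const hit of rankedHits) {
    if (hit.clusterId !== undefined && clusters.indexOf(hit.clusterId) === -1) clusters.push(hit.clusterId);
    if (clusters.length >= maxClusters) break;
  }

  // Append at most maxSiblings unseen members from each selected cluster.
  for (const clusterId of clusters) {
    let added = 0;
    for (const sibling of clusterMembers.get(clusterId) ?? []) {
      if (added >= maxSiblings) break;
      if (seen.has(sibling)) continue;
      seen.add(sibling);
      result.push(sibling);
      added++;
    }
  }
  return result;
}
```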
The `semanticIndex` section controls the semantic index layer, which generates normalised index entries for improved search quality.

| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable semantic index generation during ingestion |
| `targetDescriptionTokens` | integer | `130` | Target token count for generated descriptions (50-500) |
| `batchRefreshLimit` | integer | `500` | Maximum entries to backfill per maintenance run |
| `useForSearch` | boolean | `true` | Use index entries for search when available (falls back to chunk search if no entries exist) |
Environment variables:

| Setting | Environment Variable |
|---|---|
| `semanticIndex.enabled` | `CAUSANTIC_SEMANTIC_INDEX_ENABLED` |
| `semanticIndex.useForSearch` | `CAUSANTIC_SEMANTIC_INDEX_USE_FOR_SEARCH` |

When enabled, each chunk gets an LLM-generated description (~130 tokens by default, set by `targetDescriptionTokens`) at ingestion time. These descriptions are embedded and searched instead of raw chunks, providing uniform information density. See How It Works for details.
Entity extraction runs automatically during ingestion with no configuration required. It uses deterministic regex patterns to identify people (@mentions, emails, "X said"), channels (#channel), meetings (standup, retro, 1:1), and URLs. Extracted entities are stored with alias resolution and used as an RRF boost source (weight 1.5) during search.
Entity extraction skips code blocks and [Thinking] blocks to reduce false positives. The feature is always on, with no configuration knobs, and it adds zero latency to queries that don't contain entity references.
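A deterministic extractor of this kind can be approximated with a few ordered regular expressions. The sketch below covers URLs, emails, @mentions, #channels and a few meeting keywords; the patterns, the `extractEntities` name, and the assumed shape of a `[Thinking]` block (running to the next blank line) are hypothetical. "X said" detection, alias resolution and the RRF boost are not shown.

```typescript
// Hypothetical sketch of deterministic, regex-based entity extraction.
// The patterns below are illustrative assumptions, not Causantic's actual rules.
type EntityKind = "url" | "person" | "channel" | "meeting";

interface Entity {
  kind: EntityKind;
  value: string;
}

// Each pattern has exactly one capture group holding the entity text.
// Order matters: URLs and emails are blanked out before @mentions and #channels are scanned.
const PATTERNS: Array<[EntityKind, RegExp]> = [
  ["url", /(https?:\/\/[^\s)>\]]+)/g],
  ["person", /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/g], // email addresses
  ["person", /(?:^|[^\w@])@([A-Za-z][\w-]*)/g], // @mentions
  ["channel", /(?:^|[^\w#])#([A-Za-z0-9][\w-]*)/g], // #channel
  ["meeting", /\b(standup|retro|1:1)\b/gi],
];

function stripIgnoredRegions(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks
    .replace(/\[Thinking\][\s\S]*?(?=\n\s*\n|$)/g, " "); // assumed: [Thinking] runs to the next blank line
}

function extractEntities(text: string): Entity[] {
  const found: Entity[] = [];
  let remaining = stripIgnoredRegions(text);
  for (const [kind, pattern] of PATTERNS) {
    // Record each match, then blank it out so later patterns cannot re-match it.
    remaining = remaining.replace(pattern, (_match: string, captured: string) => {
      found.push({ kind, value: captured });
      return " ";
    });
  }
  return found;
}
```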
These settings control a logarithmic length penalty for large chunks in search results, which prevents long, keyword-rich chunks from dominating.

| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable the length penalty in search scoring |
| `referenceTokens` | integer | `500` | Reference token count for the logarithmic penalty. Chunks above this size receive diminishing scores. |
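The exact penalty formula is not given here; the sketch below assumes one plausible logarithmic form, `1 / (1 + ln(tokens / referenceTokens))`, applied only to chunks above the reference size. `lengthPenalty` and `penalisedScore` are hypothetical names.

```typescript
// Hypothetical logarithmic length penalty; the exact formula is an assumption, not Causantic's.
function lengthPenalty(chunkTokens: number, referenceTokens = 500): number {
  if (chunkTokens <= referenceTokens) return 1; // chunks up to the reference size keep their full score
  return 1 / (1 + Math.log(chunkTokens / referenceTokens)); // natural log: slow, diminishing decay
}

function penalisedScore(score: number, chunkTokens: number, enabled = true, referenceTokens = 500): number {
  return enabled ? score * lengthPenalty(chunkTokens, referenceTokens) : score;
}
```

Under this assumed form, a 1,000-token chunk keeps about 59% of its score and a 4,000-token chunk about 32%, so very long chunks are demoted without being excluded.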
The `recency` section controls time-decay scoring for search results.

| Property | Type | Default | Description |
|---|---|---|---|
| `decayFactor` | number | `0.3` | Amplitude of the time-decay boost (0-1) |
| `halfLifeHours` | number | `48` | Half-life of the decay function, in hours |
Environment variables:

| Setting | Environment Variable |
|---|---|
| `recency.decayFactor` | `CAUSANTIC_RECENCY_DECAY_FACTOR` |
| `recency.halfLifeHours` | `CAUSANTIC_RECENCY_HALF_LIFE_HOURS` |
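The sketch below assumes a multiplicative boost of `1 + decayFactor * 0.5^(ageHours / halfLifeHours)`. This matches the description of `decayFactor` as an amplitude and `halfLifeHours` as a half-life, but it is not confirmed to be Causantic's formula; `recencyBoost` is a hypothetical name.

```typescript
// Hypothetical recency boost; the multiplicative form is an assumption, not Causantic's formula.
function recencyBoost(ageHours: number, decayFactor = 0.3, halfLifeHours = 48): number {
  // Exponential decay: 1 for a brand-new chunk, 0.5 after one half-life, approaching 0 with age.
  // Negative ages (e.g. clock skew) are clamped to 0.
  const decay = Math.pow(0.5, Math.max(0, ageHours) / halfLifeHours);
  return 1 + decayFactor * decay; // multiplier in [1, 1 + decayFactor]
}
```

With the defaults, this sketch boosts a brand-new chunk by 1.3x, a 48-hour-old chunk by 1.15x, and a month-old chunk by effectively nothing.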
The `retrieval` section controls the search retrieval pipeline.

| Property | Type | Default | Description |
|---|---|---|---|
| `primary` | string | `"hybrid"` | Primary retrieval method: `"keyword"`, `"vector"`, or `"hybrid"` (BM25 + vector + RRF) |
| `vectorEnrichment` | boolean | `false` | Use vector search to enrich keyword results when `primary` is `"keyword"`. No effect in hybrid mode. |
| `mmrLambda` | number | `0.7` | MMR (Maximal Marginal Relevance) lambda parameter (0-1) |
| `feedbackWeight` | number | `0.1` | Weight applied to implicit relevance feedback signals (0-1) |
MMR reranks search results to balance relevance with diversity. After RRF fusion and cluster expansion, candidates are reordered so that semantically redundant chunks yield to novel ones.
`mmrLambda` sets the trade-off:

- `1.0` = pure relevance (no diversity; same as pre-MMR behaviour)
- `0.7` = default balance (the first pick is always the most relevant result; subsequent picks trade diminishing relevance for novelty)
- `0.0` = pure diversity (results spread as widely as possible across topics)
MMR applies to both the search tool and the seed-finding stage of recall/predict. It only activates when there are 10+ candidates (below that, diversity is moot).
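Greedy MMR repeatedly picks the candidate that maximises `lambda * relevance - (1 - lambda) * maxSimilarityToAlreadyPicked`. The sketch below assumes relevance scores normalised to [0, 1] and cosine similarity over embeddings; the `Candidate` shape and the `mmrRerank` name are hypothetical, and the 10-candidate activation threshold follows the description above.

```typescript
// Hypothetical greedy MMR reranker; candidate shape and cosine similarity are assumptions.
interface Candidate {
  id: string;
  relevance: number; // assumed normalised to [0, 1]
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function mmrRerank(candidates: Candidate[], lambda = 0.7, minCandidates = 10): Candidate[] {
  const remaining = candidates.slice().sort((a, b) => b.relevance - a.relevance);
  // Below the activation threshold, keep plain relevance order.
  if (candidates.length < minCandidates) return remaining;

  const selected: Candidate[] = [];
  while (remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((cand, i) => {
      // Redundancy is the highest similarity to anything already picked (0 for the first pick).
      const redundancy =
        selected.length === 0 ? 0 : Math.max(...selected.map((s) => cosine(cand.embedding, s.embedding)));
      const score = lambda * cand.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With no picks yet the redundancy term is zero, so the first pick is always the most relevant candidate; after that, a near-duplicate of an earlier pick loses ground to a slightly less relevant but novel chunk.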
The `storage` section controls data storage locations.

| Property | Type | Default | Description |
|---|---|---|---|
| `dbPath` | string | `"~/.causantic/memory.db"` | SQLite database path |
| `vectorPath` | string | `"~/.causantic/vectors"` | LanceDB vector store directory |

Paths starting with `~` expand to the user's home directory.
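The tilde rule amounts to a prefix substitution. A minimal, dependency-free sketch: `expandHome` is a hypothetical name, it only handles `~` and `~/...` on POSIX-style paths, and the caller supplies the home directory (in Node, `os.homedir()` from `node:os` returns it).

```typescript
// Hypothetical helper mirroring the documented rule that a leading "~" means the home directory.
// The home directory is passed in so the sketch stays dependency-free.
function expandHome(path: string, homeDir: string): string {
  const home = homeDir.replace(/\/+$/, ""); // drop any trailing slash
  if (path === "~") return home;
  if (path.startsWith("~/")) return home + path.slice(1); // "~/x" becomes "<home>/x"
  return path; // absolute, relative and "~user" paths are left untouched in this sketch
}
```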
The `vectors` section controls the vector storage lifecycle.

| Property | Type | Default | Description |
|---|---|---|---|
| `ttlDays` | integer | `90` | Days since last access before a vector expires (1-3650) |
| `maxCount` | integer | `0` | Maximum vectors to keep (`0` = unlimited). The oldest are evicted first. |
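The two lifecycle rules can be sketched as a TTL filter followed by a count cap. This is hypothetical: `selectVectorsToEvict` and `VectorRecord` are illustrative names, and the sketch assumes that "oldest" means least recently accessed, which the table does not specify.

```typescript
// Hypothetical sketch of the documented vector lifecycle rules; not Causantic's actual code.
interface VectorRecord {
  id: string;
  lastAccessedMs: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function selectVectorsToEvict(records: VectorRecord[], nowMs: number, ttlDays = 90, maxCount = 0): string[] {
  const cutoff = nowMs - ttlDays * DAY_MS;
  // 1. TTL: anything not accessed within ttlDays expires.
  const expired = records.filter((r) => r.lastAccessedMs < cutoff);
  // 2. Count cap: 0 means unlimited; otherwise drop the oldest survivors beyond maxCount.
  const survivors = records
    .filter((r) => r.lastAccessedMs >= cutoff)
    .sort((a, b) => a.lastAccessedMs - b.lastAccessedMs); // oldest first (assumed: by last access)
  const overflow = maxCount > 0 ? survivors.slice(0, Math.max(0, survivors.length - maxCount)) : [];
  return expired.concat(overflow).map((r) => r.id);
}
```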
The `encryption` section controls database encryption at rest.

| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable database encryption |
| `cipher` | `"chacha20"` \| `"sqlcipher"` | `"chacha20"` | Encryption cipher. ChaCha20-Poly1305 is 2-3x faster than `sqlcipher` on ARM. |
| `keySource` | `"keychain"` \| `"env"` \| `"prompt"` | `"keychain"` | Where to get the encryption key: the OS secret store (`keychain`), the `CAUSANTIC_DB_KEY` environment variable (`env`), or an interactive prompt (`prompt`) |
| `auditLog` | boolean | `false` | Log database access attempts to `~/.causantic/audit.log` |
See Security Guide for encryption setup instructions.
The `embedding` section controls embedding model inference.

| Property | Type | Default | Description |
|---|---|---|---|
| `device` | `"auto"` \| `"coreml"` \| `"cuda"` \| `"cpu"` \| `"wasm"` | `"auto"` | Device for embedding inference. `auto` detects hardware capabilities (CoreML on Apple Silicon, CUDA on NVIDIA GPUs). |
| `model` | `"jina-small"` \| `"nomic-v1.5"` \| `"jina-code"` \| `"bge-small"` | `"jina-small"` | Embedding model. Changing the model requires running `npx causantic reindex` to re-embed all chunks. |
The `maintenance` section controls the maintenance schedule.

| Property | Type | Default | Description |
|---|---|---|---|
| `clusterHour` | integer | `2` | Hour of day (0-23) at which reclustering runs. Cleanup tasks run 1 to 1.5 hours later. |
The `repomap` section controls the structural codebase map.

| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable repo map generation |
| `maxTokens` | integer | `1024` | Maximum tokens for the repo map output (256-8192) |
| `languages` | string[] | 22 languages (see below) | Supported language identifiers for parsing |

Default languages: typescript, javascript, python, java, c, cpp, rust, go, ruby, c-sharp, php, bash, scala, kotlin, swift, haskell, lua, dart, zig, elixir, perl, r.

The first 12 languages (typescript through bash) use tree-sitter AST parsing for accurate definition/reference extraction. The remaining 10 (scala through r) use regex-based line matching as a fallback, which is less precise but still captures most definitions.
The `llm` section controls optional LLM features.

| Property | Type | Default | Description |
|---|---|---|---|
| `clusterRefreshModel` | string | `"claude-3-haiku-20240307"` | Model used to generate cluster descriptions |
| `refreshRateLimitPerMin` | integer | `30` | Rate limit for LLM calls, in calls per minute (1-1000) |
| `enableLabelling` | boolean | `true` | Enable LLM-based cluster labelling. Requires an Anthropic API key. Set to `false` for fully local usage. |
Note: LLM features are optional. Causantic works without an Anthropic API key.
The following settings can be overridden via environment variables:

| Setting | Environment Variable |
|---|---|
| `clustering.threshold` | `CAUSANTIC_CLUSTERING_THRESHOLD` |
| `clustering.minClusterSize` | `CAUSANTIC_CLUSTERING_MIN_CLUSTER_SIZE` |
| `clustering.incrementalThreshold` | `CAUSANTIC_CLUSTERING_INCREMENTAL_THRESHOLD` |
| `traversal.maxDepth` | `CAUSANTIC_TRAVERSAL_MAX_DEPTH` |
| `tokens.claudeMdBudget` | `CAUSANTIC_TOKENS_CLAUDE_MD_BUDGET` |
| `tokens.mcpMaxResponse` | `CAUSANTIC_TOKENS_MCP_MAX_RESPONSE` |
| `storage.dbPath` | `CAUSANTIC_STORAGE_DB_PATH` |
| `storage.vectorPath` | `CAUSANTIC_STORAGE_VECTOR_PATH` |
| `vectors.ttlDays` | `CAUSANTIC_VECTORS_TTL_DAYS` |
| `vectors.maxCount` | `CAUSANTIC_VECTORS_MAX_COUNT` |
| `llm.clusterRefreshModel` | `CAUSANTIC_LLM_CLUSTER_REFRESH_MODEL` |
| `llm.refreshRateLimitPerMin` | `CAUSANTIC_LLM_REFRESH_RATE_LIMIT` |
| `llm.enableLabelling` | `CAUSANTIC_LLM_ENABLE_LABELLING` |
| `encryption.enabled` | `CAUSANTIC_ENCRYPTION_ENABLED` |
| `encryption.cipher` | `CAUSANTIC_ENCRYPTION_CIPHER` |
| `encryption.keySource` | `CAUSANTIC_ENCRYPTION_KEY_SOURCE` |
| `encryption.auditLog` | `CAUSANTIC_ENCRYPTION_AUDIT_LOG` |
| `embedding.device` | `CAUSANTIC_EMBEDDING_DEVICE` |
| `embedding.model` | `CAUSANTIC_EMBEDDING_MODEL` |
| `maintenance.clusterHour` | `CAUSANTIC_MAINTENANCE_CLUSTER_HOUR` |
| `retrieval.mmrLambda` | `CAUSANTIC_RETRIEVAL_MMR_LAMBDA` |
| `retrieval.feedbackWeight` | `CAUSANTIC_RETRIEVAL_FEEDBACK_WEIGHT` |
| `retrieval.primary` | `CAUSANTIC_RETRIEVAL_PRIMARY` |
| `retrieval.vectorEnrichment` | `CAUSANTIC_RETRIEVAL_VECTOR_ENRICHMENT` |
| `recency.decayFactor` | `CAUSANTIC_RECENCY_DECAY_FACTOR` |
| `recency.halfLifeHours` | `CAUSANTIC_RECENCY_HALF_LIFE_HOURS` |
| `embedding.eager` | `CAUSANTIC_EMBEDDING_EAGER` |
| `semanticIndex.enabled` | `CAUSANTIC_SEMANTIC_INDEX_ENABLED` |
| `semanticIndex.useForSearch` | `CAUSANTIC_SEMANTIC_INDEX_USE_FOR_SEARCH` |
| `repomap.enabled` | `CAUSANTIC_REPOMAP_ENABLED` |
| `repomap.maxTokens` | `CAUSANTIC_REPOMAP_MAX_TOKENS` |
Example configurations:

```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json"
}
```

Uses all defaults.
```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "traversal": {
    "maxDepth": 100
  }
}
```

Increases the chain walking depth limit for collections with very long session histories.
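```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "tokens": {
    "claudeMdBudget": 1000,
    "mcpMaxResponse": 5000
  }
}
```

Doubles the CLAUDE.md memory budget (from 500 to 1000 tokens) for richer context. Note that `mcpMaxResponse: 5000` is below the 20000 default, so this example also caps MCP responses more tightly; to allow larger responses instead, use a value above 20000 (up to 50000).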
```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "retrieval": {
    "mmrLambda": 0.5
  }
}
```

Lowers the MMR lambda to favour diversity over relevance. Useful when search results are dominated by near-duplicate hits from the same session.
```json
{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "encryption": {
    "enabled": true,
    "cipher": "chacha20",
    "keySource": "keychain"
  }
}
```

Enables ChaCha20-Poly1305 encryption with the key stored in the OS keychain.