Configuration Reference

Complete reference for all Causantic configuration options.

Configuration Priority

Causantic can be configured through multiple sources, applied in this priority order:

1. CLI flags (highest priority)
2. Environment variables (CAUSANTIC_*)
3. Project config (./causantic.config.json)
4. User config (~/.causantic/config.json)
5. Built-in defaults (lowest priority)
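
The precedence order above can be pictured as a layered merge in which each higher-priority source overrides only the keys it sets. The following is a minimal sketch, not Causantic's actual loader: `deep_merge`, `resolve_config`, and the sample layer contents are hypothetical names and values chosen for illustration.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied key by key (nested dicts merge)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers passed from lowest to highest priority."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, lowest priority first.
defaults = {
    "traversal": {"maxDepth": 50},
    "tokens": {"claudeMdBudget": 500, "mcpMaxResponse": 20000},
}
user_config = {"tokens": {"claudeMdBudget": 800}}
project_config = {"traversal": {"maxDepth": 80}}
env_overrides = {"tokens": {"claudeMdBudget": 1000}}  # e.g. CAUSANTIC_TOKENS_CLAUDE_MD_BUDGET
cli_flags = {"traversal": {"maxDepth": 100}}

config = resolve_config(defaults, user_config, project_config, env_overrides, cli_flags)
```

Here the CLI value wins for `traversal.maxDepth`, the environment value wins for `tokens.claudeMdBudget`, and `tokens.mcpMaxResponse` falls through to the built-in default.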

Configuration File

Causantic uses JSON configuration files. Create causantic.config.json:

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json"
}

Clustering Settings

`clustering`

Controls HDBSCAN clustering behavior.

Property	Type	Default	Description
`threshold`	`number`	`0.10`	Angular distance for cluster assignment (0.01-0.5)
`minClusterSize`	`integer`	`4`	Minimum points to form a cluster (2-100)
`incrementalThreshold`	`number`	`0.3`	Ratio of new chunks (vs total at last full recluster) that triggers full recluster (0.01-1)
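
As an illustration of how `incrementalThreshold` might be applied, the sketch below compares the number of chunks added since the last full recluster with the total at that time. The function name, the `>=` comparison, and the handling of an empty baseline are assumptions, not taken from the Causantic source.

```python
def needs_full_recluster(
    new_chunks: int,
    total_at_last_full: int,
    incremental_threshold: float = 0.3,
) -> bool:
    """True when chunks added since the last full recluster reach the threshold ratio."""
    if total_at_last_full <= 0:
        # Nothing has been clustered yet: assume any new chunk warrants a full pass.
        return new_chunks > 0
    return new_chunks / total_at_last_full >= incremental_threshold
```

With the default of 0.3 and 1,000 chunks at the last full recluster, 299 new chunks would be assigned incrementally, while 400 would trigger a full recluster.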

Research finding: Threshold 0.10 achieves F1=0.940 (100% precision, 88.7% recall) on same-cluster pair prediction.
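
The reported F1 follows from the standard harmonic-mean definition, which can be checked directly from the precision and recall quoted above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 100% and recall 88.7%, as quoted in the research finding.
f1 = f1_score(1.0, 0.887)
```

Rounded to three decimal places, `f1` is 0.940.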

Traversal Settings

`traversal`

Controls chain walking behavior.

Property	Type	Default	Description
`maxDepth`	`integer`	`50`	Safety cap on chain walking depth (1-100)

maxDepth limits the maximum chain depth during episodic recall/predict. The token budget is the primary stopping criterion; maxDepth is a safety net.

The chain walker also uses three internal limits (not currently exposed in config):

Internal Option	Default	Description
`maxCandidatesPerSeed`	`10`	Cap on emitted candidate chains per seed
`maxExpansionsPerSeed`	`200`	Cap on DFS recursive calls per seed (bounds wall time)
`maxSkippedConsecutive`	`5`	Abandon branch after N consecutive agent-filtered skips

With typical out-degree 1 (linear chains), a seed's DFS visits ~50 nodes. At branching points (out-degree 2-3), total expansions are ~60-100. The 200-expansion budget is generous for typical graphs and protective against rare dense subgraphs.
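
To show how these caps interact, here is a minimal depth-first sketch. It is not Causantic's chain walker: the graph representation, `walk_chains`, `WalkLimits`, and the choice to pass through filtered nodes without adding them to the chain are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class WalkLimits:
    max_depth: int = 50                  # traversal.maxDepth
    max_candidates_per_seed: int = 10    # maxCandidatesPerSeed
    max_expansions_per_seed: int = 200   # maxExpansionsPerSeed
    max_skipped_consecutive: int = 5     # maxSkippedConsecutive


def walk_chains(
    graph: dict[int, list[int]],
    seed: int,
    limits: WalkLimits = WalkLimits(),
    is_filtered: Callable[[int], bool] = lambda node: False,
) -> tuple[list[list[int]], int]:
    """Depth-first chain enumeration from one seed; returns (chains, expansions used)."""
    chains: list[list[int]] = []
    expansions = 0

    def emit(path: list[int]) -> None:
        # Keep non-empty, not-yet-seen chains while under the candidate cap.
        if path and path not in chains and len(chains) < limits.max_candidates_per_seed:
            chains.append(path)

    def dfs(node: int, path: list[int], skipped: int) -> None:
        nonlocal expansions
        if len(chains) >= limits.max_candidates_per_seed:
            return
        if expansions >= limits.max_expansions_per_seed:
            emit(path)  # expansion budget exhausted: keep what was collected
            return
        expansions += 1
        if is_filtered(node):
            skipped += 1
            if skipped >= limits.max_skipped_consecutive:
                emit(path)  # abandon the branch after too many consecutive skips
                return
        else:
            skipped = 0
            path = path + [node]
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors or len(path) >= limits.max_depth:
            emit(path)
            return
        for successor in successors:
            dfs(successor, path, skipped)

    dfs(seed, [], 0)
    return chains, expansions


# A linear chain of 61 nodes: out-degree 1 everywhere.
linear = {i: [i + 1] for i in range(60)}
chains, used = walk_chains(linear, 0)
```

On this linear graph the walk emits a single chain of 50 nodes after 50 expansions, matching the "~50 nodes" figure above. The depth cap, not the expansion budget, is what stops it.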

Token Settings

`tokens`

Controls output token budgets.

Property	Type	Default	Description
`claudeMdBudget`	`integer`	`500`	Tokens for CLAUDE.md memory section (100-10000)
`mcpMaxResponse`	`integer`	`20000`	Maximum tokens in MCP responses (500-50000)

Hybrid Search Settings

`hybridSearch`

Controls the hybrid BM25 + vector search pipeline. These settings are internal defaults and not currently exposed in causantic.config.json — they are configured programmatically via MemoryConfig.

Property	Type	Default	Description
`rrfK`	`integer`	`60`	RRF constant. Higher values reduce the impact of high-ranked items
`vectorWeight`	`number`	`1.0`	Weight for vector search results in RRF fusion
`keywordWeight`	`number`	`1.0`	Weight for keyword search results in RRF fusion
`keywordSearchLimit`	`integer`	`20`	Maximum keyword results before fusion
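
The way these four values combine can be sketched with the usual weighted Reciprocal Rank Fusion formula, where each list contributes `weight / (rrfK + rank)` for a 1-based rank. This is an illustrative sketch under that assumption; `rrf_fuse` and its argument names are hypothetical and do not reflect the `MemoryConfig` API.

```python
from collections import defaultdict


def rrf_fuse(
    vector_ranked: list[str],
    keyword_ranked: list[str],
    rrf_k: int = 60,
    vector_weight: float = 1.0,
    keyword_weight: float = 1.0,
    keyword_search_limit: int = 20,
) -> list[tuple[str, float]]:
    """Fuse two ranked ID lists: score = sum over lists of weight / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for rank, chunk_id in enumerate(vector_ranked, start=1):
        scores[chunk_id] += vector_weight / (rrf_k + rank)
    # Only the top keyword_search_limit keyword hits take part in fusion.
    for rank, chunk_id in enumerate(keyword_ranked[:keyword_search_limit], start=1):
        scores[chunk_id] += keyword_weight / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
```

Chunk `b` appears in both lists and therefore ranks first. A larger `rrf_k` flattens the difference between rank 1 and rank 2, which is why higher values reduce the impact of high-ranked items.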

`clusterExpansion`

Controls cluster-guided expansion during retrieval. These settings are internal defaults and not currently exposed in causantic.config.json — they are configured programmatically via MemoryConfig.

Property	Type	Default	Description
`maxClusters`	`integer`	`3`	Maximum clusters to expand from per query
`maxSiblings`	`integer`	`5`	Maximum sibling chunks added per cluster

Semantic Index Settings

`semanticIndex`

Controls the semantic index layer, which generates normalised index entries for improved search quality.

Property	Type	Default	Description
`enabled`	`boolean`	`false`	Enable semantic index generation during ingestion
`targetDescriptionTokens`	`integer`	`130`	Target token count for generated descriptions (50-500)
`batchRefreshLimit`	`integer`	`500`	Maximum entries to backfill per maintenance run
`useForSearch`	`boolean`	`true`	Use index entries for search when available (falls back to chunk search if no entries exist)

Environment variables:

Setting	Environment Variable
`semanticIndex.enabled`	`CAUSANTIC_SEMANTIC_INDEX_ENABLED`
`semanticIndex.useForSearch`	`CAUSANTIC_SEMANTIC_INDEX_USE_FOR_SEARCH`

When enabled, each chunk gets an LLM-generated description (~130 tokens) at ingestion time. These descriptions are embedded and searched instead of raw chunks, providing uniform information density. See How It Works for details.

Entity Extraction

Entity extraction runs automatically during ingestion with no configuration required. It uses deterministic regex patterns to identify people (@mentions, emails, "X said"), channels (#channel), meetings (standup, retro, 1:1), and URLs. Extracted entities are stored with alias resolution and used as an RRF boost source (weight 1.5) during search.

Entity extraction skips code blocks and [Thinking] blocks to reduce false positives. It adds zero latency to queries that don't contain entity references.
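
The sketch below illustrates the kind of deterministic pattern matching described above for a subset of entity types (@mentions, emails, channels, URLs) with fenced code blocks removed first. The regular expressions and the `extract_entities` function are hypothetical; Causantic's actual patterns, its "X said" and meeting detection, and its alias resolution are not shown.

```python
import re

# Three backticks, built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# The lookbehind stops the "@domain" part of an email from counting as a mention.
MENTION = re.compile(r"(?<![\w.+-])@([A-Za-z0-9_][\w-]*)")
# The lookbehind stops URL fragments such as "page#section" from counting as channels.
CHANNEL = re.compile(r"(?<![\w/])#([A-Za-z0-9][\w-]*)")
URL = re.compile(r"https?://[^\s)>\]]+")


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return sorted, de-duplicated entities found outside fenced code blocks."""
    prose = CODE_BLOCK.sub(" ", text)
    return {
        "people": sorted(set(MENTION.findall(prose)) | set(EMAIL.findall(prose))),
        "channels": sorted(set(CHANNEL.findall(prose))),
        "urls": sorted(set(URL.findall(prose))),
    }


sample = (
    "@bob said the fix shipped; ping alice@example.com in #deploys.\n"
    + FENCE + "\nnotify('@ignored')\n" + FENCE + "\n"
    + "Runbook: https://example.com/runbook"
)
entities = extract_entities(sample)
```

The `@ignored` mention inside the code block is dropped, while `bob`, `alice@example.com`, `deploys`, and the runbook URL are kept.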

Length Penalty Settings

`lengthPenalty`

Controls logarithmic length penalty for large chunks in search results, preventing keyword-rich chunks from dominating.

Property	Type	Default	Description
`enabled`	`boolean`	`true`	Enable length penalty in search scoring
`referenceTokens`	`integer`	`500`	Reference token count for the logarithmic penalty. Chunks above this size receive diminishing scores.
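
One plausible shape for such a penalty is a multiplier that stays at 1.0 up to `referenceTokens` and then shrinks with the logarithm of the size ratio. The exact formula Causantic uses is not stated here, so the function below is an assumption for illustration only.

```python
import math


def length_penalty(chunk_tokens: int, reference_tokens: int = 500, enabled: bool = True) -> float:
    """Score multiplier in (0, 1]: 1.0 up to the reference size, shrinking logarithmically beyond it."""
    if not enabled or chunk_tokens <= reference_tokens:
        return 1.0
    # Assumed form: 1 / (1 + ln(size ratio)); doubling the size costs the same step each time.
    return 1.0 / (1.0 + math.log(chunk_tokens / reference_tokens))
```

Under this assumed form, a 1,000-token chunk keeps about 59% of its score with the default reference of 500, and chunks at or below 500 tokens are unaffected.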

Recency Settings

`recency`

Controls time-decay scoring for search results.

Property	Type	Default	Description
`decayFactor`	`number`	`0.3`	Amplitude of the time-decay boost (0-1)
`halfLifeHours`	`number`	`48`	Half-life in hours for the decay function

Environment variables:

Setting	Environment Variable
`recency.decayFactor`	`CAUSANTIC_RECENCY_DECAY_FACTOR`
`recency.halfLifeHours`	`CAUSANTIC_RECENCY_HALF_LIFE_HOURS`
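
A common way to combine an amplitude with a half-life is an exponential boost that starts at `1 + decayFactor` for a brand-new chunk and halves its excess every `halfLifeHours`. The sketch below assumes that form; the actual scoring function may differ.

```python
def recency_boost(
    age_hours: float,
    decay_factor: float = 0.3,
    half_life_hours: float = 48.0,
) -> float:
    """Multiplier between 1 and 1 + decay_factor; the boost halves every half-life."""
    age = max(age_hours, 0.0)  # treat clock skew (negative ages) as "just now"
    return 1.0 + decay_factor * 0.5 ** (age / half_life_hours)
```

With the defaults, a chunk from the current hour is boosted by 30%, one from 48 hours ago by 15%, and one from 96 hours ago by 7.5%.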

Retrieval Settings

`retrieval`

Controls the search retrieval pipeline.

Property	Type	Default	Description
`primary`	`string`	`"hybrid"`	Primary retrieval method: `"keyword"`, `"vector"`, or `"hybrid"` (BM25 + vector + RRF)
`vectorEnrichment`	`boolean`	`false`	Use vector search to enrich keyword results when primary is `"keyword"`. No effect in hybrid mode.
`mmrLambda`	`number`	`0.7`	MMR (Maximal Marginal Relevance) lambda parameter (0-1)
`feedbackWeight`	`number`	`0.1`	Weight applied to implicit relevance feedback signals (0-1)

MMR reranks search results to balance relevance with diversity. After RRF fusion and cluster expansion, candidates are reordered so that semantically redundant chunks yield to novel ones. The mmrLambda value sets the balance:

1.0 = pure relevance (no diversity, same as pre-MMR behaviour)
0.7 = default balance (first pick is always top relevance; subsequent picks trade off diminishing relevance for novelty)
0.0 = pure diversity (maximally spread results across topics)

MMR applies to both the search tool and the seed-finding stage of recall/predict. It only activates when there are 10+ candidates (below that, diversity is moot).
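
The behaviour described above matches the standard greedy MMR procedure, sketched below with cosine similarity as the redundancy measure. The candidate tuple layout, the `mmr_rerank` name, and the `min_candidates` parameter are assumptions made for the example, not Causantic's internal API.

```python
import math


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, mmr_lambda=0.7, min_candidates=10):
    """candidates: list of (chunk_id, relevance, embedding). Returns chunk IDs in MMR order."""
    by_relevance = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(by_relevance) < min_candidates:
        return [c[0] for c in by_relevance]  # too few candidates: keep plain relevance order
    selected = []
    remaining = list(by_relevance)
    while remaining:
        def mmr_score(candidate):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(candidate[2], s[2]) for s in selected), default=0.0)
            return mmr_lambda * candidate[1] - (1.0 - mmr_lambda) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]


# "b" is a near-duplicate of "a"; "c" is less relevant but covers a different topic.
sample = [("a", 0.9, (1.0, 0.0)), ("b", 0.85, (1.0, 0.0)), ("c", 0.6, (0.0, 1.0))]
diverse = mmr_rerank(sample, mmr_lambda=0.7, min_candidates=1)
relevance_only = mmr_rerank(sample, mmr_lambda=1.0, min_candidates=1)
```

At lambda 0.7 the first pick is still the most relevant chunk `a`, but the novel chunk `c` is promoted ahead of the duplicate `b`. At lambda 1.0 the order is plain relevance.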

Storage Settings

`storage`

Controls data storage locations.

Property	Type	Default	Description
`dbPath`	`string`	`"~/.causantic/memory.db"`	SQLite database path
`vectorPath`	`string`	`"~/.causantic/vectors"`	LanceDB vector store directory

Paths starting with ~ expand to the user's home directory.

`vectors`

Controls vector storage lifecycle.

Property	Type	Default	Description
`ttlDays`	`integer`	`90`	Days since last access before vector expiry (1-3650)
`maxCount`	`integer`	`0`	Maximum vectors to keep. 0 = unlimited. Oldest evicted first.
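
The two lifecycle rules can be combined as follows: expire anything not accessed within `ttlDays`, then, if `maxCount` is non-zero, evict the oldest survivors until the count fits. This is a sketch only. It assumes "oldest" means least recently accessed, and `vectors_to_evict` is a hypothetical helper rather than part of Causantic.

```python
from datetime import datetime, timedelta


def vectors_to_evict(
    last_access: dict[str, datetime],
    now: datetime,
    ttl_days: int = 90,
    max_count: int = 0,
) -> set[str]:
    """Return IDs to drop: TTL-expired first, then least recently accessed beyond max_count."""
    cutoff = now - timedelta(days=ttl_days)
    evict = {vid for vid, seen in last_access.items() if seen < cutoff}
    if max_count > 0:  # 0 means unlimited
        survivors = sorted(
            (vid for vid in last_access if vid not in evict),
            key=lambda vid: last_access[vid],
        )
        excess = len(survivors) - max_count
        if excess > 0:
            evict.update(survivors[:excess])
    return evict


now = datetime(2025, 1, 1)
accesses = {
    "old": now - timedelta(days=100),
    "mid": now - timedelta(days=30),
    "new": now - timedelta(days=1),
}
expired_only = vectors_to_evict(accesses, now)
capped = vectors_to_evict(accesses, now, max_count=1)
```

With the defaults only `old` is evicted. Setting `max_count` to 1 additionally evicts `mid`, leaving the most recently accessed vector.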

Encryption Settings

`encryption`

Controls database encryption at rest.

Property	Type	Default	Description
`enabled`	`boolean`	`false`	Enable database encryption
`cipher`	`"chacha20"` | `"sqlcipher"`	`"chacha20"`	Encryption cipher. ChaCha20-Poly1305 is 2-3x faster on ARM.
`keySource`	`"keychain"` | `"env"` | `"prompt"`	`"keychain"`	Where to get encryption key: OS secret store, `CAUSANTIC_DB_KEY` env var, or interactive prompt
`auditLog`	`boolean`	`false`	Log database access attempts to `~/.causantic/audit.log`

See Security Guide for encryption setup instructions.

Embedding Settings

`embedding`

Controls embedding model inference.

Property	Type	Default	Description
`device`	`"auto"` | `"coreml"` | `"cuda"` | `"cpu"` | `"wasm"`	`"auto"`	Device for embedding inference. `auto` detects hardware capabilities (CoreML on Apple Silicon, CUDA on NVIDIA GPUs).
`model`	`"jina-small"` | `"nomic-v1.5"` | `"jina-code"` | `"bge-small"`	`"jina-small"`	Embedding model. Changing model requires running `npx causantic reindex` to re-embed all chunks.

Maintenance Settings

`maintenance`

Controls the maintenance schedule.

Property	Type	Default	Description
`clusterHour`	`integer`	`2`	Hour of day (0-23) to run reclustering. Cleanup tasks run 1-1.5h after.

Repo Map Settings

`repomap`

Controls the structural codebase map.

Property	Type	Default	Description
`enabled`	`boolean`	`true`	Enable repo map generation
`maxTokens`	`integer`	`1024`	Maximum tokens for the repo map output (256-8192)
`languages`	`string[]`	22 languages (see below)	Supported language identifiers for parsing

Default languages: typescript, javascript, python, java, c, cpp, rust, go, ruby, c-sharp, php, bash, scala, kotlin, swift, haskell, lua, dart, zig, elixir, perl, r.

The first 12 languages use tree-sitter AST parsing for accurate definition/reference extraction. The remaining 10 use regex-based line matching as a fallback — less precise but covers the majority of definitions.

LLM Settings

`llm`

Controls optional LLM features.

Property	Type	Default	Description
`clusterRefreshModel`	`string`	`"claude-3-haiku-20240307"`	Model for cluster descriptions
`refreshRateLimitPerMin`	`integer`	`30`	Rate limit for LLM calls (1-1000)
`enableLabelling`	`boolean`	`true`	Enable LLM-based cluster labelling. Requires Anthropic API key. Set to `false` for fully local usage.

Note: LLM features are optional. Causantic works without an Anthropic API key.

Environment Variables

The following settings can be overridden via environment variables:

Setting	Environment Variable
`clustering.threshold`	`CAUSANTIC_CLUSTERING_THRESHOLD`
`clustering.minClusterSize`	`CAUSANTIC_CLUSTERING_MIN_CLUSTER_SIZE`
`clustering.incrementalThreshold`	`CAUSANTIC_CLUSTERING_INCREMENTAL_THRESHOLD`
`traversal.maxDepth`	`CAUSANTIC_TRAVERSAL_MAX_DEPTH`
`tokens.claudeMdBudget`	`CAUSANTIC_TOKENS_CLAUDE_MD_BUDGET`
`tokens.mcpMaxResponse`	`CAUSANTIC_TOKENS_MCP_MAX_RESPONSE`
`storage.dbPath`	`CAUSANTIC_STORAGE_DB_PATH`
`storage.vectorPath`	`CAUSANTIC_STORAGE_VECTOR_PATH`
`vectors.ttlDays`	`CAUSANTIC_VECTORS_TTL_DAYS`
`vectors.maxCount`	`CAUSANTIC_VECTORS_MAX_COUNT`
`llm.clusterRefreshModel`	`CAUSANTIC_LLM_CLUSTER_REFRESH_MODEL`
`llm.refreshRateLimitPerMin`	`CAUSANTIC_LLM_REFRESH_RATE_LIMIT`
`llm.enableLabelling`	`CAUSANTIC_LLM_ENABLE_LABELLING`
`encryption.enabled`	`CAUSANTIC_ENCRYPTION_ENABLED`
`encryption.cipher`	`CAUSANTIC_ENCRYPTION_CIPHER`
`encryption.keySource`	`CAUSANTIC_ENCRYPTION_KEY_SOURCE`
`encryption.auditLog`	`CAUSANTIC_ENCRYPTION_AUDIT_LOG`
`embedding.device`	`CAUSANTIC_EMBEDDING_DEVICE`
`embedding.model`	`CAUSANTIC_EMBEDDING_MODEL`
`maintenance.clusterHour`	`CAUSANTIC_MAINTENANCE_CLUSTER_HOUR`
`retrieval.mmrLambda`	`CAUSANTIC_RETRIEVAL_MMR_LAMBDA`
`retrieval.feedbackWeight`	`CAUSANTIC_RETRIEVAL_FEEDBACK_WEIGHT`
`retrieval.primary`	`CAUSANTIC_RETRIEVAL_PRIMARY`
`retrieval.vectorEnrichment`	`CAUSANTIC_RETRIEVAL_VECTOR_ENRICHMENT`
`recency.decayFactor`	`CAUSANTIC_RECENCY_DECAY_FACTOR`
`recency.halfLifeHours`	`CAUSANTIC_RECENCY_HALF_LIFE_HOURS`
`embedding.eager`	`CAUSANTIC_EMBEDDING_EAGER`
`semanticIndex.enabled`	`CAUSANTIC_SEMANTIC_INDEX_ENABLED`
`semanticIndex.useForSearch`	`CAUSANTIC_SEMANTIC_INDEX_USE_FOR_SEARCH`
`repomap.enabled`	`CAUSANTIC_REPOMAP_ENABLED`
`repomap.maxTokens`	`CAUSANTIC_REPOMAP_MAX_TOKENS`
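
The variable names in this table follow a regular pattern: the `CAUSANTIC_` prefix, then each segment of the dotted setting path converted from camelCase to upper snake case. The helper below is an illustrative sketch of that mapping, not a Causantic function. The one name in the table that departs from the pattern is listed explicitly.

```python
import re

# Names from the table that do not follow the derived pattern.
EXCEPTIONS = {"llm.refreshRateLimitPerMin": "CAUSANTIC_LLM_REFRESH_RATE_LIMIT"}


def env_var_name(setting: str) -> str:
    """Map a dotted setting path such as 'tokens.claudeMdBudget' to its environment variable."""
    if setting in EXCEPTIONS:
        return EXCEPTIONS[setting]
    # Insert "_" at each lower-to-upper boundary, then upper-case the segment.
    parts = [
        re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", part).upper()
        for part in setting.split(".")
    ]
    return "CAUSANTIC_" + "_".join(parts)
```

For example, `env_var_name("tokens.claudeMdBudget")` yields `CAUSANTIC_TOKENS_CLAUDE_MD_BUDGET`.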

Example Configurations

Minimal

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json"
}

Uses all defaults.

Deep Chain Walking

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "traversal": {
    "maxDepth": 100
  }
}

Increases the chain walking depth limit for collections with very long session histories.

Large Context Budget

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "tokens": {
    "claudeMdBudget": 1000,
    "mcpMaxResponse": 5000
  }
}

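Raises the CLAUDE.md budget from its default of 500 to 1000 tokens. Note that mcpMaxResponse defaults to 20000, so the 5000 shown here caps MCP responses below the default; choose a value between 20000 and 50000 to allow larger responses.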

More Diverse Search Results

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "retrieval": {
    "mmrLambda": 0.5
  }
}

Lowers the MMR lambda to favour diversity over relevance. Useful when search results are dominated by near-duplicate hits from the same session.

Encrypted Database

{
  "$schema": "https://raw.githubusercontent.com/Entrolution/causantic/main/config.schema.json",
  "encryption": {
    "enabled": true,
    "cipher": "chacha20",
    "keySource": "keychain"
  }
}

Enables ChaCha20-Poly1305 encryption with the key stored in the OS keychain.
