AI-powered commit message generator using tree-sitter semantic analysis and local LLMs.
cargo build --release
./target/release/commitbee

Pipeline: git diff → tree-sitter parse → symbol extraction + structural diffing → context building (budget, evidence, connections, imports, intents) → LLM prompt → sanitize → validate+retry → commit
- Hybrid Git: gix for repo discovery, git CLI for diffs (documented choice)
- Tree-sitter: Full file parsing with hunk mapping (not just +/- lines)
- Parallelism: rayon for CPU-bound tree-sitter parsing, tokio JoinSet for concurrent git content fetching
- LLM: Ollama primary (qwen3.5:4b), OpenAI/Anthropic secondary
- Streaming: Line-buffered JSON parsing with CancellationToken, 1 MB response cap (MAX_RESPONSE_BYTES)
- Full file parsing - Parse staged/HEAD blobs, map diff hunks to symbol spans
- Token budget - 24K char limit (~6K tokens), prioritizes diff over symbols
- TTY detection - Safe for git hooks (graceful non-interactive fallback)
- Commit sanitizer - Validates LLM output (JSON + plain text), emits BREAKING CHANGE: footer regardless of include_body
- Structured JSON output - Prompt requests JSON for reliable parsing; schema includes breaking_change: Option<String> field
- System prompt - Single SYSTEM_PROMPT in llm/mod.rs, shared by all providers. Type list synced with CommitType::ALL, 72-char subject limit.
- Simplified user prompt - Concise format optimized for <4B parameter models
- Commit splitting - Detects multi-concern changes, suggests splitting into separate commits
- Body line wrapping - Sanitizer wraps body text at 72 characters
- Signature extraction - Two-strategy: child_by_field_name("body") primary, BODY_NODE_KINDS fallback, first-line final fallback. 200-char cap with floor_char_boundary. No .scm query changes needed.
- Semantic change classification - Modified symbols classified via character-stream comparison (not bag-of-lines). build() restructured: classify → infer_commit_type → format.
- Cross-file connections - detect_connections scans added diff lines for sym_name( patterns. Min 4-char name filter, capped at 5, sort+dedup.
- Parent scope extraction - extract_parent_scope walks up AST through intermediate nodes (declaration_list, class_body) to find impl/class/trait. 7 languages.
- Structural AST diffs - AstDiffer compares old/new tree-sitter nodes for modified symbols. Returns owned SymbolDiff (no Node lifetime leaks). Runs inside extract_for_file() while both Trees alive.
- Change intent detection - detect_intents scans diff lines for error handling, test, logging, dependency patterns. Threshold >2 matches. Conservative type refinement (only overrides for perf).
- Doc-vs-code classification - SpanChangeKind enum (WhitespaceOnly, DocsOnly, Mixed, Semantic). Doc-only symbols suggest docs type. is_doc_comment() uses line-prefix heuristic.
- Adaptive token budget - Symbol budget 20% with structural diffs, 30% with signatures only, 20% base.
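
The cross-file connection rule above (scan added lines for a `sym_name(` pattern, ignore names shorter than 4 characters, sort, dedup, cap at 5) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Connection` struct, the function signature, and the constant names are assumptions, and plain substring matching is a simplification (it would also match `forget_user(` when looking for `get_user(`).

```rust
use std::collections::BTreeSet;

/// Hypothetical record type; the real project may model connections differently.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Connection {
    pub symbol: String,
    pub file: String,
}

const MIN_NAME_LEN: usize = 4; // "Min 4-char name filter"
const MAX_CONNECTIONS: usize = 5; // "capped at 5"

/// `symbols` are names of changed symbols; `diffs` holds (file path, unified diff text).
pub fn detect_connections(symbols: &[&str], diffs: &[(&str, &str)]) -> Vec<Connection> {
    // Build each "name(" pattern once, dropping names that are too short.
    let patterns: Vec<(&str, String)> = symbols
        .iter()
        .filter(|name| name.chars().count() >= MIN_NAME_LEN)
        .map(|name| (*name, format!("{}(", name)))
        .collect();

    // A BTreeSet sorts and deduplicates in one step.
    let mut found = BTreeSet::new();
    for (file, diff) in diffs {
        for line in diff.lines() {
            // Only added lines count; skip the "+++ b/path" file header.
            if !line.starts_with('+') || line.starts_with("+++") {
                continue;
            }
            for (name, pattern) in &patterns {
                if line.contains(pattern.as_str()) {
                    found.insert(Connection {
                        symbol: name.to_string(),
                        file: file.to_string(),
                    });
                }
            }
        }
    }
    found.into_iter().take(MAX_CONNECTIONS).collect()
}
```

Because the cap is applied after sorting, this sketch keeps the first five connections in alphabetical order; whether the real implementation prioritizes differently is not stated above.
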
commitbee # Generate commit message (interactive)
commitbee --dry-run # Print message only, don't commit
commitbee --yes # Auto-confirm and commit
commitbee -n 3 # Generate 3 candidates, pick interactively
commitbee --verbose # Show symbol extraction details
commitbee --show-prompt # Debug: show the LLM prompt
commitbee --no-split # Disable commit split suggestions
commitbee --no-scope # Disable scope in commit messages
commitbee --clipboard # Copy message to clipboard (no commit)
commitbee --exclude "*.lock" # Exclude files matching glob pattern
commitbee --locale de # Generate message in German (type/scope stay English)
commitbee init # Create config file
commitbee config # Show current configuration
commitbee doctor # Check configuration and connectivity
commitbee completions bash # Generate shell completions
commitbee hook install # Install prepare-commit-msg hook
commitbee hook uninstall # Remove prepare-commit-msg hook
commitbee hook status # Check if hook is installed

- PRD & Roadmap: PRD.md
- Implementation plans: .claude/plans/ (gitignored, local only)
- Hunk-level splitting discussion: GitHub Discussion #2
| Skill | Invocation | Purpose |
|---|---|---|
| ci-check | /ci-check [fast\|full\|test <name>] | Run fmt + clippy + tests + audit |
| reuse-annotate | /reuse-annotate <file> | Add SPDX headers to new files |
| Agent | File | Purpose |
|---|---|---|
| rust-security-reviewer | .claude/agents/rust-security-reviewer.md | Read-only security audit (8-category) |
| cargo-dep-auditor | .claude/agents/cargo-dep-auditor.md | Check deps for outdated versions, yanked crates, advisories |
| api-compat-reviewer | .claude/agents/api-compat-reviewer.md | Check public API changes for breaking callers/impls |
| llm-prompt-quality-reviewer | .claude/agents/llm-prompt-quality-reviewer.md | Audit SYSTEM_PROMPT, schemas, CommitType sync, spec compliance |
| Hook | Trigger | Action |
|---|---|---|
| rust-fmt.sh | PostToolUse Edit/Write | rustfmt <file> on .rs files |
| block-generated-files.sh | PreToolUse Edit/Write | Block manual edits to Cargo.lock |
| superpowers-check.sh | SessionStart | Warn if superpowers plugin missing |
- Rust edition 2024, MSRV 1.94
- License: AGPL-3.0-only OR LicenseRef-Commercial (dual-license, REUSE compliant)
- Dev deps: tempfile, assert_cmd, predicates, wiremock, insta, proptest, toml
- All files use reuse annotate format: blank comment separator between SPDX lines
- reuse lint: verify compliance
- reuse annotate --copyright "Sephyi <me@sephy.io>" --license "AGPL-3.0-only OR LicenseRef-Commercial" --year 2026 <file>: add header
- REUSE.toml [[annotations]]: for files that can't have inline headers (Cargo.lock, tests/snapshots/**)
cargo test # All tests (424 tests)
cargo test --test sanitizer # CommitSanitizer tests
cargo test --test safety # Safety module tests
cargo test --test context # ContextBuilder tests
cargo test --test commit_type # CommitType tests
cargo test --test integration # LLM provider integration tests (wiremock)
cargo test --test languages # Language-specific tree-sitter tests
cargo test --test history # Commit history style learning tests
cargo test --test template # Custom prompt template tests
cargo test -- --nocapture # Show println output

Important: cargo test sanitizer matches test names across all binaries. Use cargo test --test <name> to select a specific integration test file.
- Async tests: #[tokio::test] (not #[test] with .block_on())
- Snapshots: after changing output, run cargo insta review to accept/reject
- Snapshot env: UPDATE_EXPECT=1 cargo test for bulk snapshot update
- Wiremock: NDJSON streaming mocks use respond_with(ResponseTemplate::new(200).set_body_raw(...)) with \n-delimited JSON
- Git fixtures: tempfile::TempDir + git init via std::process::Command, not real repos
- Proptest: PROPTEST_CASES=1000 for thorough local runs before push
cargo build --release # Optimized binary
cargo check # Fast syntax check
cargo clippy --all-targets -- -D warnings # Lint (CI requires zero warnings)
cargo fmt # Format code

Before pushing, run the full CI check locally:

cargo fmt --check && cargo clippy --all-targets -- -D warnings && cargo test --all-targets

# Stage a change
git add some-file.rs
# Preview commit message
./target/release/commitbee --dry-run
# With verbose output
./target/release/commitbee --dry-run --verbose
# Debug the prompt
./target/release/commitbee --dry-run --show-prompt
# Auto-commit
./target/release/commitbee --yes
# Test commit message generation with debug logging (shows validation retries)
COMMITBEE_LOG=debug ./target/release/commitbee --dry-run

When adding or updating crates:

- Verify latest stable version via cargo search <crate> --limit 1 before adding to Cargo.toml
- If a pre-release version is detected or would be added: STOP and ask the user. Report the pre-release version found, the latest stable version (if any exists), and whether no stable release is available yet. Do not add a pre-release version without explicit user approval.
- Prefer x.y (minor-compatible) over =x.y.z (exact pin) unless a bug requires it
- Run cargo audit before and after adding new dependencies
- Use cargo-dep-auditor agent for full pre-release dependency review
When adding or modifying LLM providers (src/services/llm/), every provider must:
- new() returns Result<Self>: propagate HTTP client build errors, never unwrap_or_default()
- Import and check MAX_RESPONSE_BYTES: cap full_response.len() inside the streaming loop to prevent unbounded memory growth
- Error body propagation: use unwrap_or_else(|e| format!("(failed to read body: {e})")) on error response body reads, not unwrap_or_default()
- EOF buffer parsing: after the byte stream ends, parse any remaining content in line_buffer (SSE streams may deliver the final frame without a trailing newline)
- Zero-allocation streaming: parse from &line_buffer[..newline_pos] slices, then drain(..=newline_pos) instead of allocating new Strings per line
- Shared system prompt: use super::SYSTEM_PROMPT, never duplicate prompt text
- CancellationToken: check in tokio::select! loop alongside stream chunks
- SecretString for API keys: store as secrecy::SecretString, expose only via .expose_secret() at HTTP header insertion. Never log, Debug, or Display the raw key.
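
The streaming rules above (size cap, slice-then-drain line parsing, EOF buffer flush) might look roughly like this in isolation. It is a std-only sketch under stated assumptions: `LineStream`, `push_chunk`, `finish`, and `StreamError` are hypothetical names, the cap value is assumed to be exactly 1024 * 1024 bytes, chunks arrive as `&str` rather than raw bytes, and the cap counts total bytes received rather than `full_response.len()`. Cancellation and the HTTP client are left out.

```rust
/// Assumed value; the text above only says "1 MB response cap".
pub const MAX_RESPONSE_BYTES: usize = 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    ResponseTooLarge,
}

/// Hypothetical helper that buffers chunks and hands complete lines to a callback.
#[derive(Default)]
pub struct LineStream {
    line_buffer: String,
    total_bytes: usize,
}

impl LineStream {
    /// Feed one chunk; `on_line` sees each complete, non-empty line as a borrowed slice.
    pub fn push_chunk(
        &mut self,
        chunk: &str,
        mut on_line: impl FnMut(&str),
    ) -> Result<(), StreamError> {
        // Enforce the cap inside the loop so memory use stays bounded.
        self.total_bytes += chunk.len();
        if self.total_bytes > MAX_RESPONSE_BYTES {
            return Err(StreamError::ResponseTooLarge);
        }
        self.line_buffer.push_str(chunk);

        while let Some(newline_pos) = self.line_buffer.find('\n') {
            // Parse from a slice of the buffer: no per-line String allocation.
            let line = self.line_buffer[..newline_pos].trim();
            if !line.is_empty() {
                on_line(line);
            }
            // Remove the consumed line, including its newline.
            self.line_buffer.drain(..=newline_pos);
        }
        Ok(())
    }

    /// Call after the stream ends: the last frame may lack a trailing newline.
    pub fn finish(&mut self, mut on_line: impl FnMut(&str)) {
        let rest = self.line_buffer.trim();
        if !rest.is_empty() {
            on_line(rest);
        }
        self.line_buffer.clear();
    }
}
```

In a real provider the callback would deserialize each line as JSON, and `push_chunk` would be driven from the `tokio::select!` loop that also watches the CancellationToken.
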
Follow Conventional Commits strictly — the type must reflect what actually happened:
- fix: Corrects incorrect behavior (a bug existed, now it doesn't)
- feat: Adds a new capability or safeguard that didn't exist before (even defensive checks)
- refactor: Improves code without changing behavior (better error messages, code quality, documentation)
- perf: Measurable performance improvement
Common mistake: calling a new safeguard/check fix — if there was no bug, it's feat. Improving error message quality without changing control flow is refactor, not fix.
- gix API: use repo.workdir() not repo.work_dir() (deprecated)
- CommitType::parse() not from_str(): avoids clippy should_implement_trait warning
- Enum variants used only via CommitType::ALL const need #[allow(dead_code)]
- Parallel subagents running cargo fmt may create unstaged changes; commit formatting separately
- Secret patterns: sk-[a-zA-Z0-9]{48} (legacy) and sk-proj-[a-zA-Z0-9\-_]{40,} (modern). Test data must match the exact format
- tokio::process::Command output needs explicit std::process::Output type annotation when using .ok()?
- Tree-sitter is CPU-bound/sync: pre-fetch file content into HashMaps async, then pass &HashMap<PathBuf, String> to extract_symbols() which uses rayon for parallel parsing
- rayon::par_iter() requires data to be Sync; tree_sitter::Parser is neither Send nor Sync, so create a new Parser per file inside the rayon closure
- #[cfg(feature = "secure-storage")] gates both the error variant and CLI commands for keyring
- Subagents dispatched without Bash permission can't commit; commit in the main session after verifying their changes
- Parallel subagents touching the same file will conflict; only parallelize when files don't overlap
- SymbolKey uses (kind, name, file): do NOT add line (lines shift between HEAD/staged, breaks modified-symbol matching)
- classify_span_change uses new-file line range: old-file lines may differ when code shifts; known limitation (deferred #9)
- extract_symbols() returns (Vec<CodeSymbol>, Vec<SymbolDiff>): all callers must destructure or use .0
- ChangeDetail has 25 variants (15 structural + 10 semantic markers): keep format_short() in sync when adding new ones
- infer_commit_type takes an all_modified_docs_only: bool parameter, which must be computed in build() before calling
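
To illustrate the SymbolKey gotcha above, here is a small sketch of why the key omits the line number. The field layout of `CodeSymbol`, the `SymbolKind` variants, and the `modified_symbols` helper are assumptions made for the example; only the `(kind, name, file)` key shape comes from the note above.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SymbolKind {
    Function,
    Struct,
}

/// Simplified, hypothetical symbol record.
#[derive(Debug, Clone)]
pub struct CodeSymbol {
    pub kind: SymbolKind,
    pub name: String,
    pub file: PathBuf,
    pub line: usize,
    pub signature: String,
}

/// Identity of a symbol across HEAD and staged: deliberately no `line` field.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SymbolKey {
    pub kind: SymbolKind,
    pub name: String,
    pub file: PathBuf,
}

impl SymbolKey {
    pub fn of(sym: &CodeSymbol) -> Self {
        SymbolKey {
            kind: sym.kind,
            name: sym.name.clone(),
            file: sym.file.clone(),
        }
    }
}

/// Names of symbols present on both sides whose signature differs (sorted).
pub fn modified_symbols(head: &[CodeSymbol], staged: &[CodeSymbol]) -> Vec<String> {
    let old: HashMap<SymbolKey, &CodeSymbol> =
        head.iter().map(|s| (SymbolKey::of(s), s)).collect();

    let mut names: Vec<String> = staged
        .iter()
        .filter(|new| {
            old.get(&SymbolKey::of(new))
                .map_or(false, |o| o.signature != new.signature)
        })
        .map(|new| new.name.clone())
        .collect();
    names.sort();
    names
}
```

If `line` were part of the key, a function that moved from line 10 to line 25 would hash to a different key on each side and show up as one removal plus one addition instead of a modification.
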
- Non-atomic split commits: The split flow uses unstage_all → stage_files → commit per group with no rollback. If an intermediate commit fails, earlier commits remain. Documented via TOCTOU comment in app.rs. Future improvement: index snapshot with full rollback (see GitHub Discussion #2).
- No streaming during split generation: When commit splitting generates per-group messages, LLM output is not streamed to the terminal (tokens are consumed silently). Single-commit generation streams normally. Low priority, since split generation is fast (each sub-prompt is smaller).
- Thinking model output: Models with thinking enabled prepend <think>...</think> blocks before their JSON response. The sanitizer strips both <think> and <thought> blocks (closed and unclosed) during parsing. The think config option (default: false) controls whether Ollama's thinking separation is used. The default model qwen3.5:4b does not use thinking mode and works well with the default num_predict: 256.
- No think-then-compress: Explicit <thought> prompting is not used because small models (<10B) exhaust their token budget on analysis instead of JSON output. The pre-computed EVIDENCE/CONSTRAINTS/SYMBOLS sections serve this role. Revisit for 70B+/cloud APIs.
- Retry: validate_and_retry() runs up to 3 attempts (MAX_RETRIES: 3), logging each violation individually before retry. Future: prioritized violation ordering, per-group retry for split commits.
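
The think-block stripping mentioned above could be approximated as below. This is an illustrative sketch, not the sanitizer's actual code: `strip_thinking` is a hypothetical name, and the handling of an unclosed block (drop everything from the opening tag to the end of the text) is an assumption, since the note above does not say how much of an unclosed block is removed.

```rust
/// Tag names taken from the note above: <think> and <thought>.
const TAGS: [&str; 2] = ["think", "thought"];

/// Remove closed and unclosed thinking blocks, then trim surrounding whitespace.
pub fn strip_thinking(input: &str) -> String {
    let mut out = input.to_string();
    for tag in TAGS.iter() {
        let open = format!("<{}>", tag);
        let close = format!("</{}>", tag);
        while let Some(start) = out.find(&open) {
            match out[start..].find(&close) {
                // Closed block: cut from the opening tag through the closing tag.
                Some(rel) => {
                    let end = start + rel + close.len();
                    out.replace_range(start..end, "");
                }
                // Unclosed block (assumption): discard the rest of the text.
                None => out.truncate(start),
            }
        }
    }
    out.trim().to_string()
}
```

The result would then go to the JSON or plain-text parser. If a model can emit its JSON after an unclosed tag, the unclosed branch needs a different recovery rule than the one assumed here.
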
Real-world test results are tracked in auto-memory at test-results.md. After every manual test of commit message generation (commitbee --dry-run), record:
- The staged changes (files, type of change)
- Expected vs actual commit type
- Subject and body quality assessment
- Prompt observations (signatures, connections, evidence flags)
- Any issues (retry warnings, display bugs, misclassifications)
Compare new tests against previous results to detect regressions or improvements. The goal is generating fantastic commit messages with small local LLMs (qwen3.5:4b).
A tracked list of review findings, design decisions, and improvement ideas that were identified but deferred lives in auto-memory at deferred-issues.md. Rules:
- Check the list when starting work on a related area, before releases, and at PRD updates
- Add new items when deferring anything from a review, plan, or implementation — every deferred item must be recorded with source, context, and "when to address" criteria
- Never silently defer — when deferring issues, explicitly tell the user what is being deferred, why, and when it should be revisited. Present deferred items as decisions that need user acknowledgment, not as internal bookkeeping
- Close items by updating status to Done with date when addressed
Test counts and version references must stay in sync across multiple files. After adding/removing tests or bumping version:
- README.md — test count in features list + testing section + changelog current version
- DOCS.md — test count in description + testing section
- PRD.md — test count in §2.3 (feature status table), §8 (testing header), §11 (roadmap table), §12 (success metrics). Also: PRD version header, changelog entry, and compatibility policy table on version bumps.
- CHANGELOG.md — test count in current version's testing section
- CLAUDE.md — test count in Running Tests section
Use cargo test --all-targets 2>&1 | grep "^test result" | awk '{sum += $4} END {print sum}' to get the actual count. Don't guess from memory — counts drift easily.
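
For reference, the awk one-liner above sums the fourth whitespace-separated field of every "test result" line, which is the passed count of each test binary. A Rust equivalent, with the hypothetical name `total_tests`, shows the same arithmetic on captured cargo output:

```rust
/// Sum the passed counts from `cargo test` summary lines such as
/// "test result: ok. 12 passed; 0 failed; ...".
pub fn total_tests(cargo_output: &str) -> u64 {
    cargo_output
        .lines()
        .filter(|line| line.starts_with("test result"))
        // Fields: "test", "result:", "ok.", "<count>", so the count is index 3.
        .filter_map(|line| line.split_whitespace().nth(3))
        .filter_map(|count| count.parse::<u64>().ok())
        .sum()
}
```

Note that this counts passed tests only, so ignored or failing tests are not part of the total.
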