perf: vectorize KV cache prefix matching with numpy #2179
Open

nausicaalii wants to merge 1 commit into abetlen:main from
Conversation
Replace the linear Python for-loop in KV cache prefix matching and `longest_token_prefix()` with a numpy vectorized comparison. The element-wise numpy comparison runs in optimized C/SIMD instead of Python's interpreter loop, which matters as conversation history grows (10K+ tokens). No change in behavior — both paths find the first position where the cached and new token sequences diverge.
Summary
- Replace the KV cache prefix matching in `generate()` and `longest_token_prefix()` with a numpy vectorized element-wise comparison
- Use `np.argmin` on a boolean equality array to find the first mismatch position in a single vectorized pass

Motivation
The current prefix matching iterates token-by-token in Python to find where the cached prompt diverges from the new prompt. This is fine for short prompts, but it becomes a bottleneck as conversation history grows — multi-turn chat sessions can accumulate 10K–100K+ tokens in `input_ids`, and the linear Python loop runs on every `generate()` call. Numpy's vectorized comparison runs in optimized C/SIMD, giving a significant speedup for large token sequences while preserving identical behavior.
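As a sketch of the technique (the function name and exact shape here are illustrative, not necessarily the PR's code): compare the overlapping regions element-wise, then use `np.argmin` on the boolean result to locate the first `False`, i.e. the first mismatch.

```python
import numpy as np

def longest_token_prefix_np(a, b):
    """Length of the common prefix of two token sequences (vectorized sketch)."""
    # Only the overlapping region can match.
    n = min(len(a), len(b))
    if n == 0:
        return 0
    # One C-level pass produces a boolean array of per-position equality.
    eq = np.asarray(a[:n]) == np.asarray(b[:n])
    # np.argmin returns the index of the first False; if every element is
    # True it returns 0, so the all-match case must be handled explicitly.
    if eq.all():
        return n
    return int(np.argmin(eq))
```

The `eq.all()` guard is the subtle part: without it, two fully matching sequences would report a prefix length of 0, since `argmin` of an all-`True` array is 0.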
Test plan
- `longest_token_prefix` correctness across edge cases: empty sequences, full match, partial match, single element, no match, different lengths, large sequences (10K tokens)
- `test_real_model` — passes (low-level batch decode)
- `test_real_llama` — passes (multiple sequential `create_completion` calls that exercise prefix matching)
- `test_real_llama_embeddings` — passes
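The edge-case checks above can be exercised with a differential test that compares the vectorized path against a pure-Python reference loop (both functions below are illustrative sketches, not the PR's test code):

```python
import numpy as np

def longest_token_prefix_py(a, b):
    # Pure-Python reference: walk until the first mismatch.
    n = 0
    for x, y in zip(a, b):  # zip stops at the shorter sequence
        if x != y:
            break
        n += 1
    return n

def longest_token_prefix_np(a, b):
    # Vectorized version under test.
    n = min(len(a), len(b))
    if n == 0:
        return 0
    eq = np.asarray(a[:n]) == np.asarray(b[:n])
    return n if eq.all() else int(np.argmin(eq))

# Edge cases from the test plan: empty, full match, partial match,
# single element, no match, different lengths, large sequences.
cases = [
    ([], []),
    ([1, 2, 3], [1, 2, 3]),
    ([1, 2, 3], [1, 2, 4]),
    ([7], [7]),
    ([1], [2]),
    ([1, 2, 3, 4], [1, 2]),
    (list(range(10_000)), list(range(10_000))),
]
for a, b in cases:
    assert longest_token_prefix_py(a, b) == longest_token_prefix_np(a, b)
```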