Implement Sequence ID pooling in BatchedExecutor to prevent native SeqMax overflow and crashes #1386
Open
zsogitbe wants to merge 1 commit into
Description
The Problem
Under sustained traffic or continuous batching scenarios, the `BatchedExecutor` eventually causes the native `llama.cpp` backend to deadlock, reject tokens, or throw a segmentation fault.

The Root Cause
Currently, `BatchedExecutor` assigns sequence IDs using a strictly incrementing counter (`_nextSequenceId++`). When a `Conversation` is disposed, its tokens are properly removed from the KV cache via `MemorySequenceRemove`, but its sequence ID is permanently abandoned.

In `llama.cpp`, the `SeqMax` (`n_seq_max`) parameter dictates the static allocation of arrays within the native KV cache. It expects sequence IDs to act as strictly bounded array indices (from `0` to `SeqMax - 1`). Once the C# `_nextSequenceId` counter exceeds `SeqMax`, passing that ID to the native backend results in an out-of-bounds memory access.

The Solution
This PR replaces the strictly incrementing counter with a Sequence ID pool.
- `BatchedExecutor`: Updated to track active IDs (e.g., using a thread-safe `HashSet` or similar structure) and assign the lowest available sequence ID. Added a `ReleaseSequenceId` method.
- `Conversation`: Updated the `Dispose()` method to call `Executor.ReleaseSequenceId(ConversationId)` after clearing the KV cache.

Benefits
- Sequence IDs are reused and never exceed the `SeqMax` limit, as long as the number of concurrent conversations stays within bounds.
- Allows the `BatchedExecutor` to run continuously under high-traffic workloads without requiring the host application to dangerously destroy and recreate the massive native context to reset the ID counter.
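The pooling behaviour described above can be sketched roughly as follows. Only the `ReleaseSequenceId` name and the "lowest available ID" policy come from this PR; the class name `SequenceIdPool` and its internals (a lock plus a `SortedSet<int>`) are illustrative assumptions, not the actual LLamaSharp implementation:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of a sequence ID pool bounded by SeqMax.
public sealed class SequenceIdPool
{
    private readonly int _seqMax;
    private readonly SortedSet<int> _free = new();
    private readonly object _lock = new();
    private int _nextUnused;

    public SequenceIdPool(int seqMax) => _seqMax = seqMax;

    // Returns the lowest available sequence ID, always in [0, SeqMax).
    public int Acquire()
    {
        lock (_lock)
        {
            if (_free.Count > 0)
            {
                var id = _free.Min;   // reuse the lowest previously released ID
                _free.Remove(id);
                return id;
            }
            if (_nextUnused >= _seqMax)
                throw new InvalidOperationException("All sequence IDs are in use.");
            return _nextUnused++;     // hand out a never-used ID
        }
    }

    // Called from Conversation.Dispose() after the KV cache slots are cleared.
    public void Release(int id)
    {
        lock (_lock) _free.Add(id);
    }
}
```

With this scheme, disposing a conversation returns its ID to the pool, so the IDs passed to the native backend remain valid array indices no matter how many conversations are created and destroyed over the executor's lifetime.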