Implement Sequence ID pooling in BatchedExecutor to prevent native SeqMax overflow and crashes #1386

Open
zsogitbe wants to merge 1 commit into SciSharp:master from zsogitbe:SequenceIDPooling

Conversation

@zsogitbe
Contributor

Description

The Problem
Under sustained traffic or continuous batching, the BatchedExecutor eventually causes the native llama.cpp backend to deadlock, reject tokens, or crash with a segmentation fault.

The Root Cause
Currently, BatchedExecutor assigns sequence IDs using a strictly incrementing counter (_nextSequenceId++). When a Conversation is disposed, its tokens are properly removed from the KV cache via MemorySequenceRemove, but its Sequence ID is permanently abandoned.

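In llama.cpp, the SeqMax (n_seq_max) parameter dictates the static allocation of arrays within the native KV cache. It expects sequence IDs to act as strictly bounded array indices (from 0 to SeqMax - 1). Once the C# _nextSequenceId counter reaches SeqMax, passing that ID to the native backend results in an out-of-bounds memory access. This happens even when only a few conversations are alive at once, because the counter tracks the total number of conversations ever created, not the number currently active.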
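
The failure mode can be illustrated with a minimal sketch. It is not the actual BatchedExecutor code: `IncrementingIdSource` and `CounterOverflowDemo` are hypothetical names, and the native bounds check is simulated by a plain comparison against `seqMax`. It shows that a counter which never reuses IDs leaves the valid range after `seqMax` create/dispose cycles, even with at most one conversation alive at a time.

```csharp
using System;

// Hypothetical stand-in for the strictly incrementing counter described above.
public sealed class IncrementingIdSource
{
    private int _nextSequenceId;

    // Every call hands out a fresh ID; IDs of disposed conversations are never reused.
    public int Next() => _nextSequenceId++;
}

public static class CounterOverflowDemo
{
    // Returns how many create/dispose cycles succeed before an ID
    // falls outside the valid range 0..seqMax-1.
    public static int CyclesUntilOutOfRange(int seqMax)
    {
        var ids = new IncrementingIdSource();
        var cycles = 0;
        while (true)
        {
            var id = ids.Next();   // "create" a conversation
            if (id >= seqMax)      // simulated native bounds check
                return cycles;
            cycles++;              // "dispose" it; the ID is not handed back
        }
    }
}
```

For example, `CounterOverflowDemo.CyclesUntilOutOfRange(4)` returns 4: IDs 0 through 3 are accepted, and the fifth conversation receives ID 4, which is out of range.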

The Solution
This PR replaces the strictly incrementing counter with a Sequence ID pool.

  1. BatchedExecutor: Updated to track active IDs (e.g., in a HashSet guarded by a lock, or a similar thread-safe structure) and assign the lowest available sequence ID. Added a ReleaseSequenceId method.
  2. Conversation: Updated the Dispose() method to call Executor.ReleaseSequenceId(ConversationId) after clearing the KV cache.
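
The pooling idea above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `SequenceIdPool`, `Acquire`, and `Release` are hypothetical names, a `bool[]` guarded by a lock is used in place of the HashSet mentioned above, and throwing when the pool is exhausted is an assumed policy.

```csharp
using System;

// Hypothetical pool that hands out the lowest free sequence ID in 0..seqMax-1.
public sealed class SequenceIdPool
{
    private readonly object _lock = new object();
    private readonly bool[] _inUse;   // _inUse[id] is true while the ID is assigned

    public SequenceIdPool(int seqMax)
    {
        if (seqMax <= 0) throw new ArgumentOutOfRangeException(nameof(seqMax));
        _inUse = new bool[seqMax];
    }

    // Returns the lowest ID that is not currently assigned.
    // Assumed policy: throw if every ID is taken.
    public int Acquire()
    {
        lock (_lock)
        {
            for (var id = 0; id < _inUse.Length; id++)
            {
                if (!_inUse[id])
                {
                    _inUse[id] = true;
                    return id;
                }
            }
        }
        throw new InvalidOperationException("All sequence IDs are in use.");
    }

    // Marks an ID as free again, e.g. when a conversation is disposed.
    public void Release(int id)
    {
        lock (_lock)
        {
            if (id < 0 || id >= _inUse.Length || !_inUse[id])
                throw new ArgumentException("ID is not currently assigned.", nameof(id));
            _inUse[id] = false;
        }
    }
}
```

With a pool of size 2, two calls to `Acquire()` return 0 and 1; after `Release(0)`, the next `Acquire()` returns 0 again instead of 2. The linear scan is adequate when SeqMax is small; a sorted set of free IDs would be an alternative for larger limits.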

Benefits

  • Native Memory Safety: Sequence IDs stay within the valid range (0 to SeqMax - 1) as long as the number of concurrent conversations does not exceed SeqMax.
  • Indefinite Uptime: Allows the BatchedExecutor to run continuously under high-traffic workloads without requiring the host application to destroy and recreate the large native context just to reset the ID counter.
