@@ -7,6 +7,308 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77
88<!-- towncrier release notes start -->
99
10+ ## [0.0.108] - 2026-03-27
11+
12+ ### Added
13+
14+ - Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`,
15+ `sarvam-105b` and `sarvam-105b-32k`.
16+ (PR [#3978](https://github.com/pipecat-ai/pipecat/pull/3978))
17+
18+ - Added `on_turn_context_created(context_id)` hook to `TTSService`. Override
19+ this to perform provider-specific setup (e.g. eagerly opening a server-side
20+ context) before text starts flowing. Called each time a new turn context ID
21+ is created.
22+ (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))
23+
24+ - Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.
25+ (PR [#4031](https://github.com/pipecat-ai/pipecat/pull/4031))
26+
27+ - Added support for "developer" role messages in conversation context across
28+ all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock),
29+ "developer" messages are converted to "user" messages (use
30+ `system_instruction` to set the system instruction). For OpenAI services,
31+ "developer" messages pass through in conversation history. For the Responses
32+ API, they are kept as "developer" role (matching the existing "system" →
33+ "developer" conversion).
34+ (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
35+
36+ - Added `SmallestTTSService`, a WebSocket-based TTS service integration with
37+ Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with
38+ configurable voice, language, speed, consistency, similarity, and enhancement
39+ settings.
40+ (PR [#4092](https://github.com/pipecat-ai/pipecat/pull/4092))
41+
42+ - Added warnings in turn stop strategies when `VADParams.stop_secs` differs
43+ from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`,
44+ which collapses the STT wait timeout to 0s and may cause delayed turn
45+ detection. The warnings guide developers to re-run the
46+ [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD
47+ settings.
48+ (PR [#4115](https://github.com/pipecat-ai/pipecat/pull/4115))
49+
50+ - Added `domain` parameter to `AssemblyAISTTSettings` for specialized
51+ recognition modes such as Medical Mode (`domain="medical-v1"`).
52+ (PR [#4117](https://github.com/pipecat-ai/pipecat/pull/4117))
53+
54+ - Added `NovitaLLMService` for using Novita AI's LLM models via their
55+ OpenAI-compatible API.
56+ (PR [#4119](https://github.com/pipecat-ai/pipecat/pull/4119))
57+
58+ - Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer
59+ resources are properly released when no longer needed. Custom `VADAnalyzer`
60+ subclasses can override `cleanup()` to free any held resources.
61+ (PR [#4120](https://github.com/pipecat-ai/pipecat/pull/4120))
62+
63+ - Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires
64+ after the final transcript is pushed, providing a reliable hook for
65+ end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both
66+ Pipecat and AssemblyAI turn detection modes.
67+ (PR [#4128](https://github.com/pipecat-ai/pipecat/pull/4128))
68+
69+ - Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux
70+ speech-to-text on AWS SageMaker endpoints. Use with
71+ `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
72+ (PR [#4143](https://github.com/pipecat-ai/pipecat/pull/4143))
73+
74+ - Added `Mem0MemoryService.get_memories()` convenience method for retrieving
75+ all stored memories outside the pipeline (e.g. to build a personalized
76+ greeting at connection time). This avoids the need to manually handle client
77+ type branching, filter construction, and async wrapping.
78+ (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
79+
80+ ### Changed
81+
82+ - Added context prewarming path for `InworldTTSService` to improve first audio
83+ latency.
84+ (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))
85+
86+ - Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp
87+ VIVA SDK (requires `krisp_audio`).
88+ (PR [#4022](https://github.com/pipecat-ai/pipecat/pull/4022))
89+
90+ - Modified `InworldTTSService` to close context at end of turn instead of
91+ relying on idle timeout.
92+ (PR [#4028](https://github.com/pipecat-ai/pipecat/pull/4028))
93+
94+ - Added Gemini 3 support to the Gemini Live service.
95+ (PR [#4078](https://github.com/pipecat-ai/pipecat/pull/4078))
96+
97+ - `TTSService`: the default `stop_frame_timeout_s` (idle time before an
98+ automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has
99+ changed from `2.0` to `3.0` seconds.
100+ (PR [#4084](https://github.com/pipecat-ai/pipecat/pull/4084))
101+
102+ - ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system
103+ message, matching all other adapters. Previously it searched for the first
104+ "system" message anywhere in the conversation history. A "system" message
105+ appearing later in the list will now be converted to "user" instead of being
106+ extracted as the system instruction.
107+ (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
108+
109+ - Fixed `InworldTtsService` to fallback to full text when TTS timestamps are
110+ not received.
111+ (PR [#4113](https://github.com/pipecat-ai/pipecat/pull/4113))
112+
113+ - ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova
114+ Sonic) now prefer `system_instruction` from service settings over an initial
115+ system message in the LLM context, matching the behavior of non-realtime
116+ services. Previously, context-provided system instructions took precedence. A
117+ warning is now logged when both are set.
118+ (PR [#4130](https://github.com/pipecat-ai/pipecat/pull/4130))
119+
120+ - Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.
121+ (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
122+
123+ - Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).
124+ (PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
125+
126+ - Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a
127+ warning at startup. The log message has been downgraded to debug level since
128+ these are valid service-specific values that are passed through correctly.
129+ (PR [#4137](https://github.com/pipecat-ai/pipecat/pull/4137))
130+
131+ - `GrokLLMService` and `GrokRealtimeLLMService` now live in the
132+ `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three
133+ use the same xAI API. Update imports from `pipecat.services.grok.*` to
134+ `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import
135+ GrokLLMService`).
136+ (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
137+
138+ - ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the
139+ `mem0` extra will need to update their mem0ai package.
140+ (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
141+
142+ ### Deprecated
143+
144+ - `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and
145+ `pipecat.services.grok.realtime.events` are deprecated. The old import paths
146+ still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`,
147+ `pipecat.services.xai.realtime.llm`, and
148+ `pipecat.services.xai.realtime.events` instead.
149+ (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
150+
151+ ### Removed
152+
153+ - ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and
154+ `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that
155+ called `await self.add_word_timestamps([("Reset", 0)])` or `await
156+ self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`,
157+ replace them with `await self.append_to_audio_context(ctx_id,
158+ TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage
159+ the word-timestamp reset automatically.
160+ (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
161+
162+ - Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text
163+ audio models. Use another STT provider instead.
164+ (PR [#4154](https://github.com/pipecat-ai/pipecat/pull/4154))
165+
166+ ### Fixed
167+
168+ - Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring
169+ `settings.system_instruction`. The system instruction was being read from a
170+ deprecated constructor parameter instead of the settings object, causing it
171+ to be silently ignored.
172+ (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
173+
174+ - Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when
175+ the only message in context was a system message. The lone system message is
176+ now converted to "user" role instead of being extracted, matching the
177+ existing Anthropic adapter behavior.
178+ (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
179+
180+ - Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was
181+ deferred while waiting for the bot to finish responding and `turn_complete`
182+ never arrived. As a possible root-cause fix, `turn_complete` messages are now
183+ handled even if they lack `usage_metadata`. As a fallback, the deferred
184+ `EndFrame` now has a 30-second safety timeout.
185+ (PR [#4125](https://github.com/pipecat-ai/pipecat/pull/4125))
186+
187+ - Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous
188+ contexts exceeded") caused by rapid user interruptions. When interruptions
189+ arrived before any TTS text was generated, phantom contexts were created on
190+ the ElevenLabs server that were never closed, eventually exceeding the
191+ 5-context limit.
192+ (PR [#4126](https://github.com/pipecat-ai/pipecat/pull/4126))
193+
194+ - Fixed the final sentence being dropped from the conversation context when
195+ using RTVI text input with non-word-timestamp TTS services. The
196+ `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`,
197+ causing the `LLMAssistantAggregator` to finalize the context before the final
198+ sentence arrived.
199+ (PR [#4127](https://github.com/pipecat-ai/pipecat/pull/4127))
200+
201+ - Fixed audio crackling and popping in recordings when both user and bot are
202+ speaking. `AudioBufferProcessor` no longer injects silence into a track's
203+ buffer while that track is actively producing audio, preventing mid-utterance
204+ interruptions in the recorded output.
205+ (PR [#4135](https://github.com/pipecat-ai/pipecat/pull/4135))
206+
207+ - Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale
208+ words or backward PTS values into later turns.
209+ (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
210+
211+ - Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had
212+ been invoked but `BotStartedSpeakingFrame` had not yet been received, a user
213+ interruption could allow stale audio to leak through.
214+ (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
215+
216+ - Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with
217+ external VAD) not working. The bot now correctly detects user speech and
218+ signals turn boundaries to the Gemini API.
219+ (PR [#4146](https://github.com/pipecat-ai/pipecat/pull/4146))
220+
221+ - Fixed Gemini Live message handling to process all `server_content` fields
222+ independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and
223+ `output_transcription`) on the same message, but the previous `elif` chain
224+ only processed the first match, silently dropping the rest.
225+ (PR [#4147](https://github.com/pipecat-ai/pipecat/pull/4147))
226+
227+ - Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly
228+ triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS)
229+ propagated upstream through the switcher. Previously, any non-fatal error
230+ passing through would be misattributed to the active service and trigger an
231+ unwanted service switch. Now only errors originating from the switcher's own
232+ managed services trigger failover.
233+ (PR [#4149](https://github.com/pipecat-ai/pipecat/pull/4149))
234+
235+ - Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal
236+ buffer on interruption, causing the bot to continue speaking for several
237+ seconds after being interrupted.
238+ (PR [#4151](https://github.com/pipecat-ai/pipecat/pull/4151))
239+
240+ - Fixed a crash in OpenAI LLM processing when the provider returns
241+ `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no
242+ attribute 'get'` errors during audio transcript handling.
243+ (PR [#4152](https://github.com/pipecat-ai/pipecat/pull/4152))
244+
245+ - Fixed error floods in `DeepgramSTTService` when the WebSocket connection
246+ drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead
247+ connection instead of silently failing, causing every queued audio frame to
248+ log an error. Now `send_media()` failures are caught gracefully — a single
249+ warning is logged and audio frames are skipped until the existing
250+ reconnection logic restores the connection.
251+ (PR [#4153](https://github.com/pipecat-ai/pipecat/pull/4153))
252+
253+ - `Mem0MemoryService` no longer blocks the event loop during memory storage and
254+ retrieval. All Mem0 API calls now run in a background thread, and message
255+ storage is fire-and-forget so it doesn't delay downstream processing.
256+ (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
257+
258+ - Fixed `Mem0MemoryService` failing to store messages when the context
259+ contained system or developer role messages. The Mem0 API only accepts user
260+ and assistant roles, so other roles are now filtered out before storing.
261+ (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
262+
263+ - Added missing `on_dtmf_event` callback to `LemonSliceTransportClient.setup()`
264+ `DailyCallbacks` construction, fixing a `ValidationError` at pipeline setup
265+ time.
266+ (PR [#4161](https://github.com/pipecat-ai/pipecat/pull/4161))
267+
268+ - Fixed an issue in `InworldTTSService` where, in cases of fast interruption,
269+ we would continue receiving audio from the previous context.
270+ (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))
271+
272+ - Fixed a word timestamp interleaving issue in `InworldTTSService` when
273+ processing multiple sentences.
274+ (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))
275+
276+ - Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using
277+ `push_stop_frames=True`. When the stop-frame timeout fired, a second
278+ `TTSStoppedFrame` could be pushed after the normal one at context completion.
279+ (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))
280+
281+ - ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK
282+ now requires explicit message objects for `send_keep_alive()`,
283+ `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk
284+ version is now 6.1.0.
285+ (PR [#4174](https://github.com/pipecat-ai/pipecat/pull/4174))
286+
287+ - Fixed RTVI events not being delivered to clients when using WebSocket
288+ transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False`
289+ by default.
290+ (PR [#4176](https://github.com/pipecat-ai/pipecat/pull/4176))
291+
292+ - Fixed a timing issue where turn detection timer tasks (idle controller,
293+ speech timeout, turn analyzer, and turn completion) could miss their first
294+ tick because the newly created asyncio task was not yet scheduled when the
295+ caller continued.
296+ (PR [#4183](https://github.com/pipecat-ai/pipecat/pull/4183))
297+
298+ - Fixed `FastAPIWebsocketTransport` intermittently hanging on shutdown when the
299+ remote side (e.g. Twilio) disconnects while audio is being sent. A race
300+ condition between the send and receive paths could cause the
301+ `on_client_disconnected` callback to be skipped, leaving the pipeline waiting
302+ for a disconnect signal that never came.
303+ (PR [#4186](https://github.com/pipecat-ai/pipecat/pull/4186))
304+
305+ ### Performance
306+
307+ - `RimeTTSService` now handles Rime's `done` WebSocket message to complete
308+ audio contexts immediately, eliminating the 3-second idle timeout that
309+ previously added latency at the end of each utterance.
310+ (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))
311+
10312## [0.0.107] - 2026-03-23
11313
12314### Added
0 commit comments