Commit a84c698

Merge pull request #4185 from pipecat-ai/changelog-0.0.108
Release 0.0.108 - Changelog Update
2 parents 83dc979 + ca22421 commit a84c698

54 files changed

Lines changed: 302 additions & 53 deletions

CHANGELOG.md

Lines changed: 302 additions & 0 deletions
@@ -7,6 +7,308 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->

## [0.0.108] - 2026-03-27

### Added

- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`,
  `sarvam-105b` and `sarvam-105b-32k`.
  (PR [#3978](https://github.com/pipecat-ai/pipecat/pull/3978))

- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override
  this to perform provider-specific setup (e.g. eagerly opening a server-side
  context) before text starts flowing. Called each time a new turn context ID
  is created.
  (PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))

- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.
  (PR [#4031](https://github.com/pipecat-ai/pipecat/pull/4031))

- Added support for "developer" role messages in conversation context across
  all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock),
  "developer" messages are converted to "user" messages (use
  `system_instruction` to set the system instruction). For OpenAI services,
  "developer" messages pass through in conversation history. For the Responses
  API, they are kept as "developer" role (matching the existing "system" →
  "developer" conversion).
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Added `SmallestTTSService`, a WebSocket-based TTS service integration with
  Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with
  configurable voice, language, speed, consistency, similarity, and enhancement
  settings.
  (PR [#4092](https://github.com/pipecat-ai/pipecat/pull/4092))

- Added warnings in turn stop strategies when `VADParams.stop_secs` differs
  from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`,
  which collapses the STT wait timeout to 0s and may cause delayed turn
  detection. The warnings guide developers to re-run the
  [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD
  settings.
  (PR [#4115](https://github.com/pipecat-ai/pipecat/pull/4115))

- Added `domain` parameter to `AssemblyAISTTSettings` for specialized
  recognition modes such as Medical Mode (`domain="medical-v1"`).
  (PR [#4117](https://github.com/pipecat-ai/pipecat/pull/4117))

- Added `NovitaLLMService` for using Novita AI's LLM models via their
  OpenAI-compatible API.
  (PR [#4119](https://github.com/pipecat-ai/pipecat/pull/4119))

- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer
  resources are properly released when no longer needed. Custom `VADAnalyzer`
  subclasses can override `cleanup()` to free any held resources.
  (PR [#4120](https://github.com/pipecat-ai/pipecat/pull/4120))

- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires
  after the final transcript is pushed, providing a reliable hook for
  end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both
  Pipecat and AssemblyAI turn detection modes.
  (PR [#4128](https://github.com/pipecat-ai/pipecat/pull/4128))

- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux
  speech-to-text on AWS SageMaker endpoints. Use with
  `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
  (PR [#4143](https://github.com/pipecat-ai/pipecat/pull/4143))

- Added `Mem0MemoryService.get_memories()` convenience method for retrieving
  all stored memories outside the pipeline (e.g. to build a personalized
  greeting at connection time). This avoids the need to manually handle client
  type branching, filter construction, and async wrapping.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

80+
### Changed
81+
82+
- Added context prewarming path for `InworldTTSService` to improve first audio
83+
latency.
84+
(PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))
85+
86+
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp
87+
VIVA SDK (requires `krisp_audio`).
88+
(PR [#4022](https://github.com/pipecat-ai/pipecat/pull/4022))
89+
90+
- Modified `InworldTTSService` to close context at end of turn instead of
91+
relying on idle timeout.
92+
(PR [#4028](https://github.com/pipecat-ai/pipecat/pull/4028))
93+
94+
- Added Gemini 3 support to the Gemini Live service.
95+
(PR [#4078](https://github.com/pipecat-ai/pipecat/pull/4078))
96+
97+
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an
98+
automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has
99+
changed from `2.0` to `3.0` seconds.
100+
(PR [#4084](https://github.com/pipecat-ai/pipecat/pull/4084))
101+
102+
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system
103+
message, matching all other adapters. Previously it searched for the first
104+
"system" message anywhere in the conversation history. A "system" message
105+
appearing later in the list will now be converted to "user" instead of being
106+
extracted as the system instruction.
107+
(PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
108+
109+
- Fixed `InworldTtsService` to fallback to full text when TTS timestamps are
110+
not received.
111+
(PR [#4113](https://github.com/pipecat-ai/pipecat/pull/4113))
112+
113+
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova
114+
Sonic) now prefer `system_instruction` from service settings over an initial
115+
system message in the LLM context, matching the behavior of non-realtime
116+
services. Previously, context-provided system instructions took precedence. A
117+
warning is now logged when both are set.
118+
(PR [#4130](https://github.com/pipecat-ai/pipecat/pull/4130))
119+
120+
- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.
121+
(PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
122+
123+
- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).
124+
(PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
125+
126+
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a
127+
warning at startup. The log message has been downgraded to debug level since
128+
these are valid service-specific values that are passed through correctly.
129+
(PR [#4137](https://github.com/pipecat-ai/pipecat/pull/4137))
130+
131+
- `GrokLLMService` and `GrokRealtimeLLMService` now live in the
132+
`pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three
133+
use the same xAI API. Update imports from `pipecat.services.grok.*` to
134+
`pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import
135+
GrokLLMService`).
136+
(PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
137+
138+
- ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the
139+
`mem0` extra will need to update their mem0ai package.
140+
(PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
141+
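The two system-instruction changes above (PRs #4089 and #4130) can be read together as one resolution rule. The sketch below is a standalone illustration, not the adapters' actual code: `resolve_system_instruction` is a hypothetical helper, messages are plain dicts, and dropping the displaced context message when settings win is an assumption, since the changelog does not say what happens to it.

```python
import logging
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger("system_instruction_sketch")
Message = Dict[str, str]


def resolve_system_instruction(
    messages: List[Message], settings_instruction: Optional[str] = None
) -> Tuple[Optional[str], List[Message]]:
    """Return (system_instruction, remaining_messages) for one context."""
    remaining = [dict(m) for m in messages]  # copy so the input is not mutated
    context_instruction = None

    # Only messages[0] is treated as the initial system message.
    if remaining and remaining[0]["role"] == "system":
        context_instruction = remaining.pop(0)["content"]

    # A "system" message appearing later is converted to "user".
    for message in remaining:
        if message["role"] == "system":
            message["role"] = "user"

    if settings_instruction is not None and context_instruction is not None:
        logger.warning("system_instruction set in both settings and context; using settings")

    # Settings take precedence over the context-provided instruction.
    if settings_instruction is not None:
        return settings_instruction, remaining
    return context_instruction, remaining
```

For example, a context that opens with a "user" message and contains a later "system" message yields no extracted instruction, and the later message comes back with the "user" role.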
### Deprecated

- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and
  `pipecat.services.grok.realtime.events` are deprecated. The old import paths
  still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`,
  `pipecat.services.xai.realtime.llm`, and
  `pipecat.services.xai.realtime.events` instead.
  (PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))

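For projects with many call sites, the rename can be applied mechanically. The helper below is a minimal sketch, not a tool shipped with Pipecat: `rewrite_grok_imports` is a hypothetical name, and it only rewrites the three module paths listed in this entry, using plain text substitution rather than a syntax-aware refactor.

```python
import re

# Old module path -> new module path, as listed in the deprecation entry.
RENAMED_MODULES = {
    "pipecat.services.grok.llm": "pipecat.services.xai.llm",
    "pipecat.services.grok.realtime.llm": "pipecat.services.xai.realtime.llm",
    "pipecat.services.grok.realtime.events": "pipecat.services.xai.realtime.events",
}

# Match whole dotted paths only: not preceded by a word character or dot,
# and not followed by a word character (so "llm_utils" is left alone).
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(RENAMED_MODULES, key=len, reverse=True))
    + r")(?!\w)"
)


def rewrite_grok_imports(source: str) -> str:
    """Return `source` with deprecated grok module paths replaced."""
    return _PATTERN.sub(lambda match: RENAMED_MODULES[match.group(1)], source)
```

Running it over `from pipecat.services.grok.llm import GrokLLMService` produces the `pipecat.services.xai.llm` form quoted in the Changed section; review the resulting diff before committing, since string literals and comments are rewritten too.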
### Removed

- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and
  `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that
  called `await self.add_word_timestamps([("Reset", 0)])` or `await
  self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`,
  replace them with `await self.append_to_audio_context(ctx_id,
  TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage
  the word-timestamp reset automatically.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text
  audio models. Use another STT provider instead.
  (PR [#4154](https://github.com/pipecat-ai/pipecat/pull/4154))

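The sentinel-string migration can be shown as a before/after inside a custom service method. To keep the sketch runnable without Pipecat installed, `TTSStoppedFrame` and `StubTTSService` below are stand-ins that model only what the entry mentions, and `finish_context` is a hypothetical method name; in a real service the call would go to the `TTSService` base class.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TTSStoppedFrame:
    """Stand-in for Pipecat's frame; only the field used here is modeled."""

    context_id: Optional[str] = None


@dataclass
class StubTTSService:
    """Stand-in service that records calls instead of producing audio."""

    appended: List[Tuple[str, object]] = field(default_factory=list)

    async def append_to_audio_context(self, context_id: str, frame: object) -> None:
        self.appended.append((context_id, frame))

    async def finish_context(self, ctx_id: str) -> None:
        # Before 0.0.108 (sentinel strings, no longer supported):
        #   await self.add_word_timestamps(
        #       [("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)
        # From 0.0.108: append the stop frame to the audio context instead.
        await self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id))


service = StubTTSService()
asyncio.run(service.finish_context("ctx-1"))
```

No explicit reset call appears in the new form because, per the entry, `_handle_audio_context` manages the word-timestamp reset.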
### Fixed

- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring
  `settings.system_instruction`. The system instruction was being read from a
  deprecated constructor parameter instead of the settings object, causing it
  to be silently ignored.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when
  the only message in context was a system message. The lone system message is
  now converted to "user" role instead of being extracted, matching the
  existing Anthropic adapter behavior.
  (PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))

- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was
  deferred while waiting for the bot to finish responding and `turn_complete`
  never arrived. As a possible root-cause fix, `turn_complete` messages are now
  handled even if they lack `usage_metadata`. As a fallback, the deferred
  `EndFrame` now has a 30-second safety timeout.
  (PR [#4125](https://github.com/pipecat-ai/pipecat/pull/4125))

- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous
  contexts exceeded") caused by rapid user interruptions. When interruptions
  arrived before any TTS text was generated, phantom contexts were created on
  the ElevenLabs server that were never closed, eventually exceeding the
  5-context limit.
  (PR [#4126](https://github.com/pipecat-ai/pipecat/pull/4126))

- Fixed the final sentence being dropped from the conversation context when
  using RTVI text input with non-word-timestamp TTS services. The
  `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`,
  causing the `LLMAssistantAggregator` to finalize the context before the final
  sentence arrived.
  (PR [#4127](https://github.com/pipecat-ai/pipecat/pull/4127))

- Fixed audio crackling and popping in recordings when both user and bot are
  speaking. `AudioBufferProcessor` no longer injects silence into a track's
  buffer while that track is actively producing audio, preventing mid-utterance
  interruptions in the recorded output.
  (PR [#4135](https://github.com/pipecat-ai/pipecat/pull/4135))

- Fixed WebSocket TTS word timestamps so interrupted contexts cannot leak stale
  words or backward PTS values into later turns.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had
  been invoked but `BotStartedSpeakingFrame` had not yet been received, a user
  interruption could allow stale audio to leak through.
  (PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))

- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with
  external VAD) not working. The bot now correctly detects user speech and
  signals turn boundaries to the Gemini API.
  (PR [#4146](https://github.com/pipecat-ai/pipecat/pull/4146))

- Fixed Gemini Live message handling to process all `server_content` fields
  independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and
  `output_transcription`) on the same message, but the previous `elif` chain
  only processed the first match, silently dropping the rest.
  (PR [#4147](https://github.com/pipecat-ai/pipecat/pull/4147))

- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly
  triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS)
  propagated upstream through the switcher. Previously, any non-fatal error
  passing through would be misattributed to the active service and trigger an
  unwanted service switch. Now only errors originating from the switcher's own
  managed services trigger failover.
  (PR [#4149](https://github.com/pipecat-ai/pipecat/pull/4149))

- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal
  buffer on interruption, causing the bot to continue speaking for several
  seconds after being interrupted.
  (PR [#4151](https://github.com/pipecat-ai/pipecat/pull/4151))

- Fixed a crash in OpenAI LLM processing when the provider returns
  `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no
  attribute 'get'` errors during audio transcript handling.
  (PR [#4152](https://github.com/pipecat-ai/pipecat/pull/4152))

- Fixed error floods in `DeepgramSTTService` when the WebSocket connection
  drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead
  connection instead of silently failing, causing every queued audio frame to
  log an error. Now `send_media()` failures are caught gracefully: a single
  warning is logged and audio frames are skipped until the existing
  reconnection logic restores the connection.
  (PR [#4153](https://github.com/pipecat-ai/pipecat/pull/4153))

- `Mem0MemoryService` no longer blocks the event loop during memory storage and
  retrieval. All Mem0 API calls now run in a background thread, and message
  storage is fire-and-forget so it doesn't delay downstream processing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

- Fixed `Mem0MemoryService` failing to store messages when the context
  contained system or developer role messages. The Mem0 API only accepts user
  and assistant roles, so other roles are now filtered out before storing.
  (PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))

- Added missing `on_dtmf_event` callback to `LemonSliceTransportClient.setup()`
  `DailyCallbacks` construction, fixing a `ValidationError` at pipeline setup
  time.
  (PR [#4161](https://github.com/pipecat-ai/pipecat/pull/4161))

- Fixed an issue in `InworldTTSService` where, in cases of fast interruption,
  audio from the previous context continued to be received.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed a word timestamp interleaving issue in `InworldTTSService` when
  processing multiple sentences.
  (PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))

- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using
  `push_stop_frames=True`. When the stop-frame timeout fired, a second
  `TTSStoppedFrame` could be pushed after the normal one at context completion.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK
  now requires explicit message objects for `send_keep_alive()`,
  `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk
  version is now 6.1.0.
  (PR [#4174](https://github.com/pipecat-ai/pipecat/pull/4174))

- Fixed RTVI events not being delivered to clients when using WebSocket
  transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False`
  by default.
  (PR [#4176](https://github.com/pipecat-ai/pipecat/pull/4176))

- Fixed a timing issue where turn detection timer tasks (idle controller,
  speech timeout, turn analyzer, and turn completion) could miss their first
  tick because the newly created asyncio task was not yet scheduled when the
  caller continued.
  (PR [#4183](https://github.com/pipecat-ai/pipecat/pull/4183))

- Fixed `FastAPIWebsocketTransport` intermittently hanging on shutdown when the
  remote side (e.g. Twilio) disconnects while audio is being sent. A race
  condition between the send and receive paths could cause the
  `on_client_disconnected` callback to be skipped, leaving the pipeline waiting
  for a disconnect signal that never came.
  (PR [#4186](https://github.com/pipecat-ai/pipecat/pull/4186))

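The `elif` chain problem described for PR #4147 is easy to reproduce in isolation. The sketch below is an illustration only: it models `server_content` as a plain dict rather than the Gemini SDK's message type, and the handler function names are made up. The field names are the ones mentioned in the entries above.

```python
from typing import Any, Dict, List

FIELDS = ("model_turn", "output_transcription", "turn_complete")


def handle_with_elif(server_content: Dict[str, Any]) -> List[str]:
    """Previous shape: only the first matching field is processed."""
    handled: List[str] = []
    if server_content.get("model_turn") is not None:
        handled.append("model_turn")
    elif server_content.get("output_transcription") is not None:
        handled.append("output_transcription")
    elif server_content.get("turn_complete") is not None:
        handled.append("turn_complete")
    return handled


def handle_independently(server_content: Dict[str, Any]) -> List[str]:
    """Fixed shape: every field present on the message is processed."""
    handled: List[str] = []
    for name in FIELDS:
        if server_content.get(name) is not None:
            handled.append(name)
    return handled


# A message that bundles two fields, as Gemini 3.x may do.
bundled = {"model_turn": {"parts": []}, "output_transcription": {"text": "hi"}}
```

With `bundled`, the `elif` version handles only `model_turn`, while the independent checks handle both fields.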
### Performance

- `RimeTTSService` now handles Rime's `done` WebSocket message to complete
  audio contexts immediately, eliminating the 3-second idle timeout that
  previously added latency at the end of each utterance.
  (PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))

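The latency difference comes from finishing on an explicit completion signal instead of waiting out an idle timer. A generic asyncio sketch of that pattern follows; it is not `RimeTTSService` code, and `wait_for_context_end` and `demo` are hypothetical names, with an `asyncio.Event` standing in for the arrival of the `done` message.

```python
import asyncio
from typing import Tuple


async def wait_for_context_end(done: asyncio.Event, idle_timeout_s: float = 3.0) -> str:
    """Finish as soon as `done` is set; otherwise fall back to the idle timeout."""
    try:
        await asyncio.wait_for(done.wait(), timeout=idle_timeout_s)
        return "done"
    except asyncio.TimeoutError:
        return "timeout"


async def demo() -> Tuple[str, str]:
    signalled = asyncio.Event()
    signalled.set()  # the provider already sent its completion message
    silent = asyncio.Event()  # no completion message ever arrives
    return (
        await wait_for_context_end(signalled, idle_timeout_s=0.05),
        await wait_for_context_end(silent, idle_timeout_s=0.01),
    )
```

In the first case the wait returns without sleeping; in the second the caller pays the full timeout, which is the end-of-utterance delay the entry describes.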
## [0.0.107] - 2026-03-23

### Added

changelog/3978.added.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4013.added.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4013.changed.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4022.changed.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4028.changed.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4031.added.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4078.changed.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4084.changed.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

changelog/4089.added.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
