Feature/ollama disable think#1
Open
Jian-Min-Huang wants to merge 1 commit into custom-main from
Add CustomOllamaClient that mirrors OllamaClient.generate() with an additional `think: false` parameter to prevent thinking-capable models from returning think blocks in enhancement responses.
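The actual CustomOllamaClient in this PR is Swift, but the change boils down to one extra field in the body of Ollama's `/api/generate` request. A minimal Python sketch of that payload, assuming Ollama's standard REST API (the helper name and endpoint constant are illustrative, not taken from the PR):

```python
import json

# Default local Ollama endpoint (assumption; configurable in practice)
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str, think: bool = False) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    With think set to False, thinking-capable models skip the thinking
    phase instead of emitting think blocks before the answer, which
    shortens the request's total duration.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "think": think,  # the parameter this PR defaults to false
    }

payload = build_generate_payload("translategemma:12b", "Translate: hello")
print(json.dumps(payload))
```

Sending this body with `think: false` is what prevents think blocks from appearing in enhancement responses.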
While trying to replace my TranslateGemma setup with the latest Gemma 4, I noticed that even though it outputs more tokens per second and performs better, the responses still felt very slow.

gemma4:26b-a4b-it-q8_0
translategemma:12b

After cross-testing, I found the cause: Gemma 4 is a thinking-capable model, while TranslateGemma is not. With the default `think: true`, the model has to finish a thinking phase before it starts generating, so even though raw output is faster, the request's total duration becomes much longer.

The goal of this PR is to extend the original OllamaClient and change the `think` default to false.

Logically, the whole point of running a local model is speed, so for now I don't think VoiceInk's calls to OllamaClient need think enabled.