
Feature/ollama disable think #1

Open

Jian-Min-Huang wants to merge 1 commit into custom-main from feature/ollama-disable-think

Conversation

@Jian-Min-Huang (Member) commented Apr 5, 2026

While trying to switch my TranslateGemma setup over to the latest Gemma 4, I noticed that even though Gemma 4 emits more tokens per second and performs better, its responses were noticeably slow.

gemma4:26b-a4b-it-q8_0

total duration:       1.558639625s
load duration:        137.0625ms
prompt eval count:    256 token(s)
prompt eval duration: 764.7095ms
prompt eval rate:     334.77 tokens/s
eval count:           49 token(s)
eval duration:        602.880289ms
eval rate:            81.28 tokens/s

translategemma:12b

total duration:       1.554478125s
load duration:        136.367ms
prompt eval count:    254 token(s)
prompt eval duration: 566.28025ms
prompt eval rate:     448.54 tokens/s
eval count:           44 token(s)
eval duration:        827.462332ms
eval rate:            53.17 tokens/s
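As a quick sanity check on the metrics above: each reported rate is simply the token count divided by the duration in seconds. The snippet below recomputes both eval rates from the numbers in the two runs:

```python
# Recompute eval rates from the benchmark output above:
# rate (tokens/s) = eval count / eval duration (seconds).
runs = {
    "gemma4:26b-a4b-it-q8_0": (49, 0.602880289, 81.28),
    "translategemma:12b": (44, 0.827462332, 53.17),
}

for model, (tokens, seconds, reported) in runs.items():
    rate = tokens / seconds
    print(f"{model}: {rate:.2f} tokens/s (reported {reported})")
```

This confirms the per-token throughput numbers are internally consistent; the slowness the PR addresses comes from the extra thinking phase, not from the eval rate itself.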

Cross-testing revealed the cause: Gemma 4 is a thinking-capable model, while TranslateGemma is not. With the default think: true, Gemma 4 spends its first step thinking before emitting any output, so even though it generates tokens faster, the request's total duration ends up much longer.

The purpose of this PR is to extend the original OllamaClient and change the think default to false.

The whole point of running a local model is speed, so in my view VoiceInk's calls through OllamaClient don't need think enabled.

Add CustomOllamaClient that mirrors OllamaClient.generate() with an
additional `think: false` parameter to prevent thinking-capable models
from returning think blocks in enhancement responses.
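As a rough illustration of what the change amounts to on the request side, here is a minimal sketch in Python, assuming Ollama's /api/generate accepts a top-level boolean `think` field for thinking-capable models (as recent Ollama releases do). The helper name `build_generate_request` is illustrative only and is not VoiceInk's actual Swift code:

```python
import json

def build_generate_request(model: str, prompt: str, think: bool = False) -> str:
    """Build a JSON body for Ollama's /api/generate.

    `think` defaults to False here, mirroring this PR's intent: a
    thinking-capable model (e.g. Gemma 4) skips its reasoning phase,
    shortening total duration for latency-sensitive local use.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # The key change: explicitly disable thinking instead of
        # relying on the model's default of think=True.
        "think": think,
    }
    return json.dumps(payload)

body = build_generate_request("gemma4:26b-a4b-it-q8_0", "Translate: hello")
print(body)
```

The same idea in the actual Swift client is just one extra field on the request payload, with false as the default.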
Jian-Min-Huang force-pushed the feature/ollama-disable-think branch from 3f09344 to 362a50b on April 5, 2026, 17:40
