Release v6.0.0 · YV17labs/GhostDesk

New Features

Grid ruler overlay — screenshot() now accepts grid=True to draw a coordinate ruler in the margins of a region crop (major ticks every 50px on X / 20px on Y, alternating magenta/cyan minor gridlines), letting smaller vision models read click coordinates straight off the labels instead of estimating pixel offsets
Small-model prompt — New dedicated prompt with an explicit click-coordinate recipe and workflow built around the grid ruler, targeted at compact vision models that struggle with raw pixel counting
Adaptive detection padding — The screen module now adjusts detection padding dynamically, and its capture, ruler, and shared-encoding code are split along clearer module boundaries
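To make the grid ruler spacing concrete, here is a minimal sketch of how major tick labels could be computed for a region crop. Only the 50px (X) and 20px (Y) spacing comes from the notes above. The helper names, the (left, top, width, height) region shape, and the choice to label ticks with absolute screen coordinates are assumptions for illustration, not GhostDesk's actual implementation.

```python
from typing import Dict, List, Tuple

# Spacing taken from the release note: major ticks every 50px on X, 20px on Y.
MAJOR_X = 50
MAJOR_Y = 20


def major_ticks(origin: int, length: int, step: int) -> List[Tuple[int, int]]:
    """Return (offset_in_crop, absolute_label) pairs for one axis.

    origin: absolute screen coordinate of the crop's first pixel (assumed).
    length: crop size along this axis, in pixels.
    step:   spacing between major ticks, in pixels.
    """
    # First absolute coordinate >= origin that is a multiple of step.
    first = -(-origin // step) * step
    return [(pos - origin, pos) for pos in range(first, origin + length, step)]


def ruler_labels(left: int, top: int, width: int, height: int) -> Dict[str, List[Tuple[int, int]]]:
    """Compute major tick labels for both margins of a region crop (hypothetical helper)."""
    return {
        "x": major_ticks(left, width, MAJOR_X),
        "y": major_ticks(top, height, MAJOR_Y),
    }


# A 200x60 crop whose top-left corner sits at screen position (120, 45).
labels = ruler_labels(left=120, top=45, width=200, height=60)
```

Under these assumptions, a model that wants to click a widget sitting under the X label 200 and beside the Y label 80 can use those numbers directly, without counting pixels from the crop edge.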

WebP by default — screenshot() now returns WebP instead of PNG by default, significantly cutting the token cost of every capture for agents
GPA-GUI-Detector dropped — Removed the external GUI detector dependency in favor of a lighter, more predictable ruler-based approach
Cursor size — Adjusted to 24px for better visibility in captures; LLM-specific cursor comments removed
Wheel build cleanup — Removed unused force-include config from the wheel build

Faster feedback poll — Visual feedback loop now compares raw PNG bytes directly instead of computing MD5 hashes, reducing reaction-time latency on every mouse/keyboard action
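The difference between the two change-detection strategies can be sketched as follows. This is an illustration only: the function names and the fake frame buffers are hypothetical and do not come from the GhostDesk code base.

```python
import hashlib


def changed_by_hash(before: bytes, after: bytes) -> bool:
    """Old-style check: digest both frames in full, then compare the digests."""
    return hashlib.md5(before).digest() != hashlib.md5(after).digest()


def changed_by_bytes(before: bytes, after: bytes) -> bool:
    """New-style check: compare the encoded frames directly."""
    return before != after


# Two fake "frames" that differ only in their last byte (placeholder data, not real PNGs).
frame_a = b"\x89PNG" + bytes(1024)
frame_b = b"\x89PNG" + bytes(1023) + b"\x01"

same = changed_by_bytes(frame_a, frame_a)
different = changed_by_bytes(frame_a, frame_b)
```

Both functions give the same answer here. The direct comparison skips the hashing work entirely, and it can stop as soon as the lengths or the first differing byte disagree, whereas hashing always reads both buffers to the end.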

press_key is case-tolerant — Multi-character keysyms are now normalized: press_key("Return"), press_key("return") and press_key("RETURN") all work equivalently
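One way such normalization could work is a case-insensitive lookup against a table of canonical keysym names. The sketch below is an assumption about the approach, not the project's code: `normalize_keysym` and the small `CANONICAL_KEYSYMS` table are hypothetical, and single characters are deliberately left untouched because "a" and "A" are distinct keys.

```python
# Hypothetical table of canonical multi-character keysym names (X11-style spelling).
CANONICAL_KEYSYMS = ("Return", "Escape", "Tab", "BackSpace", "Page_Up")

# Map the lowercase spelling of each keysym to its canonical form.
_LOOKUP = {name.lower(): name for name in CANONICAL_KEYSYMS}


def normalize_keysym(key: str) -> str:
    """Return the canonical spelling of a multi-character keysym.

    Single characters pass through unchanged, and unknown names are
    returned as given so the caller can decide how to report them.
    """
    if len(key) <= 1:
        return key
    return _LOOKUP.get(key.lower(), key)


variants = [normalize_keysym(k) for k in ("Return", "return", "RETURN")]
```

With this table, all three spellings resolve to `Return`, while `normalize_keysym("A")` still returns `"A"`.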

README — Now recommends the llama.cpp fork over LM Studio for local inference; clarifies that small/medium models require both vision and tool use
Screenshot region= — Clarified that region= is a true native crop, not a zoom or interpolation
Small-model guide — New prompt with an explicit click-coordinate recipe, plus a screenshot of grid precision on a menu that illustrates the workflow on small/medium models
SYSTEM_PROMPT.md — Renamed and restructured, critical rules emphasized

Coverage additions — New test suites for capture._reencode, server.main, middleware (error handling and coercion), _logging configuration, and the screen._shared module