Grid ruler overlay — screenshot() now accepts grid=True to draw a coordinate ruler in the margins of a region crop (major ticks every 50px on X / 20px on Y, alternating magenta/cyan minor gridlines), letting smaller vision models read click coordinates straight off the labels instead of estimating pixel offsets
Small-model prompt — New dedicated prompt with an explicit click-coordinate recipe and workflow built around the grid ruler, targeted at compact vision models that struggle with raw pixel counting
Adaptive detection padding — screen module now adjusts detection padding dynamically and has clearer module boundaries between capture, rulers, and shared encoding
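To make the ruler spacing concrete, here is a minimal sketch of where major ticks would fall for a region crop. It assumes tick labels show absolute screen coordinates aligned to multiples of the step; the notes above do not confirm this. The helper name `major_ticks` and the example region are hypothetical, not part of the library.

```python
def major_ticks(start: int, length: int, step: int) -> list[int]:
    """Return major tick positions inside [start, start + length).

    Assumption: ticks sit on absolute multiples of `step`, so a label
    can be read as a screen coordinate without adding an offset.
    """
    first = -(-start // step) * step  # round start up to a multiple of step
    return list(range(first, start + length, step))


# Hypothetical region: left=130, top=45, 200 px wide, 70 px tall.
x_ticks = major_ticks(130, 200, 50)  # every 50 px on X -> [150, 200, 250, 300]
y_ticks = major_ticks(45, 70, 20)    # every 20 px on Y -> [60, 80, 100]
```

If the real labels are relative to the crop instead, the same function applies with `start=0`, and the region origin has to be added back before clicking.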
Refactoring
WebP by default — screenshot() now returns WebP instead of PNG by default, significantly cutting the token cost of every capture for agents
GPA-GUI-Detector dropped — Removed the external GUI detector dependency in favor of a lighter, more predictable ruler-based approach
Cursor size — Adjusted to 24px for better visibility in captures; removed LLM-specific cursor comments
Wheel build cleanup — Removed unused force-include config from the wheel build
Performance
Faster feedback poll — Visual feedback loop now compares raw PNG bytes directly instead of computing MD5 hashes, reducing reaction-time latency on every mouse/keyboard action
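The difference between the two change checks can be sketched as follows. This is an illustration only, not the project's actual polling code; both function names are hypothetical.

```python
import hashlib


def changed_by_hash(before: bytes, after: bytes) -> bool:
    """Older style: hash both captures in full, then compare the digests."""
    return hashlib.md5(before).digest() != hashlib.md5(after).digest()


def changed_by_bytes(before: bytes, after: bytes) -> bool:
    """Newer style: compare the encoded buffers directly."""
    return before != after
```

Hashing has to read both buffers in full on every poll, while a direct comparison can stop at a length mismatch or at the first differing byte, which is the likely source of the latency saving.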
Fixes
press_key is case-tolerant — Multi-character keysyms are now normalized: press_key("Return"), press_key("return") and press_key("RETURN") all work equivalently
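One way such normalization could work is a case-insensitive lookup into a table of canonical names. The sketch below is an assumption about the approach, not the actual implementation; `KNOWN_KEYSYMS` and `normalize_keysym` are hypothetical, and the table holds only a handful of X11-style keysym names.

```python
# Hypothetical, deliberately tiny table of canonical keysym names.
KNOWN_KEYSYMS = ("Return", "Escape", "Tab", "BackSpace", "Page_Up")
_CANONICAL = {name.lower(): name for name in KNOWN_KEYSYMS}


def normalize_keysym(key: str) -> str:
    """Map a multi-character keysym to its canonical spelling.

    Single characters are returned unchanged, because "a" and "A"
    are different keys and must stay distinguishable.
    """
    if len(key) <= 1:
        return key
    # Unknown names fall through untouched instead of being guessed at.
    return _CANONICAL.get(key.lower(), key)
```

With this table, `normalize_keysym("return")` and `normalize_keysym("RETURN")` both yield `"Return"`.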
Documentation
README — Now recommends the llama.cpp fork over LM Studio for local inference; clarifies that small/medium models require both vision and tool use
Screenshot region= — Clarified that region= is a true native crop, not a zoom or interpolation
Small-model guide — New prompt with explicit click-coordinate recipe, plus a menu grid precision screenshot illustrating the workflow on small/medium models
SYSTEM_PROMPT.md — Renamed and restructured, critical rules emphasized
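Because region= is a native crop, one pixel in the crop corresponds to one pixel on screen, so converting a point in the crop back to a screen coordinate needs only an offset and no scale factor. A minimal sketch, assuming the region is given as `(left, top, width, height)`; the tuple layout and the helper name `crop_to_screen` are hypothetical.

```python
def crop_to_screen(
    region: tuple[int, int, int, int], point: tuple[int, int]
) -> tuple[int, int]:
    """Translate a pixel position inside a native crop to screen coordinates.

    Assumption: region is (left, top, width, height). No scaling is applied
    because a native crop keeps a 1:1 pixel mapping.
    """
    left, top, width, height = region
    px, py = point
    if not (0 <= px < width and 0 <= py < height):
        raise ValueError("point lies outside the cropped region")
    return (left + px, top + py)


# A point 25 px right and 40 px down inside a crop that starts at (400, 300).
print(crop_to_screen((400, 300, 200, 100), (25, 40)))  # (425, 340)
```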
Testing
Coverage additions — New test suites for capture._reencode, server.main, middleware (error handling and coercion), _logging configuration, and the screen._shared module