
v6.0.0


@maltyxx released this on 10 Apr 17:51 (90 commits to main since this release)

New Features

  • Grid ruler overlay — screenshot() now accepts grid=True, which draws a coordinate ruler in the margins of a region crop (major ticks every 50px on X and every 20px on Y, plus alternating magenta/cyan minor gridlines). Smaller vision models can read click coordinates straight off the labels instead of estimating pixel offsets
  • Small-model prompt — New dedicated prompt with an explicit click-coordinate recipe and workflow built around the grid ruler, targeted at compact vision models that struggle with raw pixel counting
  • Adaptive detection padding — The screen module now adjusts detection padding dynamically, and is split along clearer module boundaries between capture, rulers, and shared encoding
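The grid ruler workflow boils down to simple arithmetic. The sketch below is an illustration, not the project's code: it assumes the ruler labels show absolute screen coordinates at multiples of the step, and that the crop is a native 1:1 slice starting at the region origin. `major_ticks` and `click_point` are hypothetical helpers.

```python
# Hypothetical sketch: tick placement for a grid ruler on a native region crop.
# Assumes labels show absolute screen coordinates at multiples of `step`.


def major_ticks(origin: int, length: int, step: int) -> list[tuple[int, int]]:
    """Return (offset_in_crop, screen_coord) for each multiple of `step`
    inside [origin, origin + length)."""
    first = -(-origin // step) * step  # round origin up to the next multiple of step
    return [(coord - origin, coord) for coord in range(first, origin + length, step)]


def click_point(x_label: int, dx: int, y_label: int, dy: int) -> tuple[int, int]:
    """Screen coordinate = nearest ruler label + pixel offset from it.
    No scaling term is needed because the crop is 1:1 with the screen."""
    return x_label + dx, y_label + dy


# Example region crop at (130, 75), 200x60 px; 50px X step, 20px Y step.
x_ticks = major_ticks(130, 200, 50)
y_ticks = major_ticks(75, 60, 20)
print(x_ticks)  # [(20, 150), (70, 200), (120, 250), (170, 300)]
print(y_ticks)  # [(5, 80), (25, 100), (45, 120)]
print(click_point(250, 12, 120, 5))  # (262, 125)
```

Under these assumptions, a model that sees the label 250 on the X ruler and counts 12px to the right of it clicks at X = 262. Because region= is a native crop, no scale factor enters the calculation.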

Refactoring

  • WebP by default — screenshot() now returns WebP instead of PNG by default, significantly cutting the token cost of every capture for agents
  • GPA-GUI-Detector dropped — Removed the external GUI detector dependency in favor of a lighter, more predictable ruler-based approach
  • Cursor size — Adjusted to 24px for better visibility in captures; LLM-specific cursor comments were also removed
  • Wheel build cleanup — Removed unused force-include config from the wheel build
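Because the default format changed, a client that hard-codes a PNG MIME type for screenshot() output may mislabel captures. The following is a hedged sketch of signature sniffing. It assumes the client holds the raw encoded bytes (after any base64 decoding), and `sniff_image_format` and `mime_type` are hypothetical helpers, not part of the project. The checks follow the standard PNG file signature and the RIFF/WEBP container header.

```python
# Hypothetical helpers: detect whether screenshot bytes are WebP or PNG.
# Assumes raw encoded image bytes (decode any base64 transport first).

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> str:
    """Return 'webp', 'png', or 'unknown' based on the file signature."""
    # WebP is a RIFF container: b"RIFF" + 4-byte little-endian size + b"WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return "unknown"


def mime_type(data: bytes) -> str:
    """Map the sniffed format to a MIME type for an image content block."""
    return {"webp": "image/webp", "png": "image/png"}.get(
        sniff_image_format(data), "application/octet-stream"
    )


header = b"RIFF" + (4).to_bytes(4, "little") + b"WEBP"
print(sniff_image_format(header))  # webp
print(mime_type(header))  # image/webp
```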

Performance

  • Faster feedback poll — Visual feedback loop now compares raw PNG bytes directly instead of computing MD5 hashes, reducing reaction-time latency on every mouse/keyboard action
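The feedback-poll idea can be sketched with the standard library alone. This is an illustration under assumptions, not the project's loop: `capture` stands in for whatever returns the current frame as bytes, and `wait_for_change`, `timeout`, and `interval` are hypothetical names and defaults.

```python
# Hypothetical sketch: poll for a visual change by comparing raw frame bytes.
import time
from typing import Callable


def wait_for_change(
    capture: Callable[[], bytes],
    baseline: bytes,
    timeout: float = 1.0,
    interval: float = 0.05,
) -> bool:
    """Poll until a capture differs from `baseline` or `timeout` elapses.

    Frames are compared directly with !=; no MD5 digest is computed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if capture() != baseline:
            return True  # the screen reacted to the last action
        time.sleep(interval)
    return False  # no change observed within the timeout


frames = iter([b"frame-a", b"frame-a", b"frame-b"])
print(wait_for_change(lambda: next(frames), b"frame-a", interval=0.01))  # True
```

The design point matches the release note: a direct bytes comparison skips the per-frame digest. In CPython, bytes equality returns early on a length mismatch and otherwise compares the buffers in memory, which is typically cheaper than hashing both frames.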

Fixes

  • press_key is case-tolerant — Multi-character keysyms are now normalized: press_key("Return"), press_key("return") and press_key("RETURN") all work equivalently
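A minimal sketch of case-tolerant keysym handling, assuming a lookup from lowercased names to canonical X11 keysym spellings. The table is a small illustrative subset and `normalize_key` is hypothetical. Single characters are passed through untouched because "a" and "A" are distinct keysyms.

```python
# Hypothetical sketch: normalize multi-character keysyms case-insensitively.
# The table is an illustrative subset of X11 keysym names, not the project's.

_CANONICAL = {
    name.lower(): name
    for name in (
        "Return", "Escape", "Tab", "BackSpace", "Delete",
        "Home", "End", "Page_Up", "Page_Down",
        "Left", "Right", "Up", "Down",
    )
}


def normalize_key(key: str) -> str:
    """Canonicalize multi-character keysyms; leave single characters alone.

    Unknown multi-character names are returned unchanged.
    """
    if len(key) <= 1:
        return key
    return _CANONICAL.get(key.lower(), key)


print(normalize_key("RETURN"), normalize_key("page_up"), normalize_key("A"))
# Return Page_Up A
```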

Documentation

  • README — Now recommends the llama.cpp fork over LM Studio for local inference; clarifies that small/medium models must support both vision and tool use
  • Screenshot region= — Clarified that region= is a true native crop, not a zoom or interpolation
  • Small-model guide — New prompt with explicit click-coordinate recipe, plus a menu grid precision screenshot illustrating the workflow on small/medium models
  • SYSTEM_PROMPT.md — Renamed and restructured, with critical rules emphasized

Testing

  • Coverage additions — New test suites for capture._reencode, server.main, middleware (error handling and coercion), _logging configuration, and the screen._shared module