Releases · YV17labs/GhostDesk

22 Apr 00:06

maltyxx

v7.2.0

71d2c3d

v7.2.0 Latest


Highlights

Reliable screen_changed feedback. Input tools no longer return false negatives. Polling now compares the full screen at quarter resolution via a bounding-box ratio, so any real UI change is caught regardless of where it lands — particularly for keyboard actions, where focus is unrelated to the mouse cursor and the previous zone-based check was systematically wrong.
New mouse_move tool. Lets agents trigger hover-only UI reactions (CSS :hover states, dropdowns that appear on mouse-over, tooltips) without clicking.

Changes

Added

mouse_move — moves the cursor without pressing any button, for hover-triggered UI states.
"Filling tabular UIs" section in the server instructions — explains that spreadsheets and grid forms can be driven in a single key_type (\t for next cell, \n for next row) or a clipboard_set + paste, removing the need to click each cell.
screens_differ() and capture_png(scale=…) — public helpers in screen/_shared.py for downsampled, threshold-based comparison.
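The tabular-fill guidance above comes down to building one string in which a tab advances to the next cell and a newline advances to the next row. A minimal sketch, assuming a hypothetical helper name (`rows_to_keystrokes` is not part of GhostDesk) and that the resulting string is what an agent would pass to key_type:

```python
def rows_to_keystrokes(rows):
    """Join cells with tabs and rows with newlines for a single typing call.

    Hypothetical helper for illustration only; not a GhostDesk API.
    """
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


rows = [["Name", "Qty"], ["Widget", 3], ["Gadget", 12]]
payload = rows_to_keystrokes(rows)
```

The same string could instead be placed on the clipboard and pasted, which is the second route the instructions mention.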

Fixed

screen_changed reliability — input tools now poll the full screen instead of a 200×200 px zone around the mouse cursor. key_type and key_press no longer report false negatives, and mouse tools detect effects that land away from the click point (toasts, dropdowns, distant menus). The polling baseline is decoded once per call to keep the hot loop cheap.
Latent bug in screens_stable — Pillow's ImageChops.difference() on RGBA captures left the diff's alpha channel at 0, which made getbbox() ignore real RGB changes. Captures are now converted to RGB before comparison.
Test suite — tool-count assertion bumped to 13 to reflect mouse_move.
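The full-screen, downsampled comparison described in the Fixed entries above can be illustrated without any imaging library. This is a sketch under stated assumptions, not the project's screens_differ(): frames are plain grayscale grids (lists of rows), "quarter resolution" is read as keeping every fourth pixel per axis, and the names `downsample`, `diff_bbox`, `frames_differ` and the `min_ratio` threshold are all hypothetical.

```python
def downsample(frame, step=4):
    """Keep every `step`-th pixel on both axes (nearest-neighbour)."""
    return [row[::step] for row in frame[::step]]


def diff_bbox(a, b):
    """Return (left, top, right, bottom) around differing pixels, or None."""
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa != pb:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def frames_differ(before, after, min_ratio=0.001, step=4):
    """True when the changed bounding box covers at least `min_ratio` of the frame."""
    small_a, small_b = downsample(before, step), downsample(after, step)
    box = diff_bbox(small_a, small_b)
    if box is None:
        return False
    left, top, right, bottom = box
    changed_area = (right - left) * (bottom - top)
    total_area = len(small_a) * len(small_a[0])
    return changed_area / total_area >= min_ratio


# A 64x48 black frame, then the same frame with a 16x8 patch lit up
# far from any notional cursor position.
before = [[0] * 64 for _ in range(48)]
after = [row[:] for row in before]
for y in range(30, 38):
    for x in range(40, 56):
        after[y][x] = 255

changed = frames_differ(before, after)
unchanged = frames_differ(before, before)
```

Because the whole frame is compared, the location of the change relative to the cursor plays no role. Nearest-neighbour sampling can miss changes smaller than the sampling step, which is one reason a real implementation may prefer a proper resize.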

Changed

Middleware now logs tool call durations in milliseconds instead of seconds.

Removed

_cursor module — get_cursor_position() was orphaned by the feedback refactor; the module and its callers in _wayland.py have been dropped.

Documentation

README points the llama.cpp fork to the integration/webp-turbo branch.
Demo video moved to GitHub's user-attachments CDN; the demos/ directory has been removed from the repo.
.DS_Store files are now ignored and untracked.


19 Apr 20:42

maltyxx

v7.1.0

f970f5b

v7.1.0

This release adds the native MCP surfaces the server wasn't exposing yet (resources, lifespan warm-up, icons, tool annotations), stricter HTTP-transport security, finer-grained tool feedback through MCP notifications/message, and a consolidated system-level brief delivered through the spec-canonical instructions field.

Added

MCP resources. ghostdesk://apps (JSON catalogue of installed GUI apps) and ghostdesk://clipboard (current clipboard text) mirror the app_list / clipboard_get tools so clients that surface resources in a dedicated picker can reach read-only state without spending an agent turn on a tool call.
FastMCP lifespan. The server pre-binds zwlr_virtual_pointer_v1 and zwp_virtual_keyboard_v1 during ASGI startup. Missing compositor protocols now fail at boot instead of surfacing mid-request on the first mouse_click.
MCP context notifications on tools. mouse_* and key_* push a warning when the 200×200 zone around the action does not change within 2 s — the miss is visible in the client's transcript, not only in the tool result dict. app_launch and clipboard_set mirror their outcomes through ctx.info / ctx.error.
GhostDesk icon on every MCP surface. The branded mark is advertised on the server itself, every tool, and both resources through MCP's icons field. Inlined as a base64 SVG data URI — no packaging asset to ship alongside the wheel.
ToolAnnotations on every tool. readOnlyHint, destructiveHint, and idempotentHint let MCP clients differentiate approval flows for read-only vs destructive actions: screen_shot / clipboard_get / app_list are tagged read-only + idempotent, mouse_click / mouse_drag / key_press are tagged destructive, etc.
Origin header validation (MCP Streamable HTTP spec § DNS-rebinding). Browser requests must match GHOSTDESK_ALLOWED_ORIGINS (comma-separated) or get a 403. Non-browser clients (no Origin header) pass through unchanged.
Loopback bind by default. GHOSTDESK_HOST defaults to 127.0.0.1; the container entrypoint exports 0.0.0.0 so Docker port-publishing still reaches the server, but standalone uv run ghostdesk no longer silently exposes the port to the LAN.
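The Origin check listed above can be sketched as a pure function: parse the comma-separated allow-list, let requests without an Origin header through, and answer 403 for any origin that is not listed. The function names and the normalisation (trimming whitespace, a trailing slash, and case) are assumptions made for illustration; the actual middleware may differ.

```python
def parse_allowed_origins(raw):
    """Split a comma-separated GHOSTDESK_ALLOWED_ORIGINS-style value into a set."""
    return {item.strip().rstrip("/").lower() for item in raw.split(",") if item.strip()}


def check_origin(origin_header, allowed):
    """Return the HTTP status a request would receive (200 = pass, 403 = rejected)."""
    if origin_header is None:
        # Non-browser clients send no Origin header and pass through unchanged.
        return 200
    if origin_header.strip().rstrip("/").lower() in allowed:
        return 200
    return 403


allowed = parse_allowed_origins("https://app.example.com, http://localhost:3000")
status_cli = check_origin(None, allowed)
status_ok = check_origin("http://localhost:3000", allowed)
status_bad = check_origin("https://evil.example", allowed)
```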

Changed

Consolidated system-level brief. The full agent doctrine (SEE → ACT → SEE, prefer-keyboard, interruption handling, scroll-to-end, final self-check) is now carried by the server instructions field — the MCP spec-canonical payload delivered in the initialize response and auto-injected by every compliant client. Per the MCP spec, prompts are user-controlled templates (slash commands, picker entries), which makes them the wrong mechanism for a system-level brief that must always reach the model. One document, guaranteed delivery.
Package layout for MCP surfaces. resources is now a package (matching apps, clipboard, input, screen) — every domain with a register(mcp) function follows the same __init__.py convention.
warn_on_miss helper. Lives in input/feedback.py alongside build_feedback and poll_for_change, so mouse and keyboard tools share the miss-warning path without crossing underscore-prefixed module boundaries.
mcp[cli] pinned to >=1.27. Unlocks the ToolAnnotations, Icon, and lifespan APIs used throughout this release.

Fixed

Wheel scroll direction inverted. mouse_scroll(direction="up") (and "left") silently scrolled the other way: the virtual-pointer axis_discrete request was sent with discrete=+1 regardless of value's sign, violating the wl_pointer protocol invariant that the two must match within a frame. Firefox — like any wheel-aware client — trusts delta_discrete, so every "up" scroll collapsed into "down" and pinned at the page bottom. Sign is now carried in _SCROLL_VECTORS alongside value, and a static test locks the invariant.
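The invariant behind this fix is that, within one frame, the continuous axis value and the discrete step count must point the same way. A minimal sketch of a direction table and the kind of static check that could lock it in; the table name, the 15.0 magnitude, and the helper are hypothetical, and the convention that negative values mean up or left is stated here as an assumption:

```python
VERTICAL, HORIZONTAL = 0, 1  # wl_pointer axis enum: vertical_scroll, horizontal_scroll

# direction -> (axis, value, discrete); magnitudes are illustrative only.
SCROLL_VECTORS = {
    "up": (VERTICAL, -15.0, -1),
    "down": (VERTICAL, 15.0, 1),
    "left": (HORIZONTAL, -15.0, -1),
    "right": (HORIZONTAL, 15.0, 1),
}


def signs_match(value, discrete):
    """True when both are non-zero and share the same sign."""
    return value != 0 and discrete != 0 and (value > 0) == (discrete > 0)


table_is_consistent = all(signs_match(v, d) for _, v, d in SCROLL_VECTORS.values())
old_bug_detected = not signs_match(-15.0, 1)  # the pre-fix "up" request
```

Keeping the sign next to the value in one table means a direction cannot be added without also stating its discrete step.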

Removed

Standalone SYSTEM_PROMPT.md. Its content is now folded into the server instructions field, delivered automatically at session init. Users who referenced the markdown file directly no longer need to — the guidance now reaches the model through the MCP handshake.

Full Changelog: v7.0.1...v7.1.0


15 Apr 22:39

maltyxx

v7.0.1

d34c713

v7.0.1

Fixed

Missing envsubst in runtime images. entrypoint.sh uses envsubst to inject GHOSTDESK_SCREEN_WIDTH / GHOSTDESK_SCREEN_HEIGHT into the Sway config, but the binary was not part of the runtime stack — containers booted into a crash loop (envsubst: command not found). Added gettext-base to both docker/base/Dockerfile and .devcontainer/Dockerfile.


15 Apr 19:07

maltyxx

v7.0.0

5605f43

v7.0.0

Major platform overhaul: migration from X11 / Openbox to a native Wayland / Sway stack, end-to-end TLS, per-request coordinate model space for mixed frontier + local model fleets, and a simplified agent-first documentation story.

Highlights

Native Wayland / Sway stack. The devcontainer and runtime images now boot a Wayland session managed by supervisord. wl-copy / wl-paste replace the X11 clipboard path and grim replaces the X11 capture tool. The input stack drops dotool in favour of direct Wayland virtual-pointer / virtual-keyboard protocols.
GhostDesk-Model-Space HTTP header. The coordinate-normalisation middleware now rescales LLM coordinates to screen pixels per request, driven by the header (e.g. 1000 for the Qwen family). No header → pass-through for frontier models (Claude, GPT-4o, Gemini). One MCP server can now serve mixed fleets without a restart, and small local models reach frontier-level click precision with no grid overlay.
Grid mode retired. The ruler overlay, rulers.py and the "precision recipe" in the small-model prompt are removed — the new coordinate path makes them unnecessary.
wayvnc from pinned source. wayvnc / neatvnc / aml are built from a pinned master commit inside a dedicated vnc-builder Docker stage so classic VNC Auth (RFB security type 2) can be advertised — required for noVNC 1.6 interop.
End-to-end TLS. websockify and the MCP server auto-detect a mounted certificate at /etc/ghostdesk/tls/server.{crt,key} (or via GHOSTDESK_TLS_CERT / GHOSTDESK_TLS_KEY) and switch to wss:// / https:// at boot. README gains an mkcert quickstart.
VNC hardening. GHOSTDESK_VNC_ADDRESS is hard-pinned to 127.0.0.1; override attempts are logged and ignored. Password + token + TLS wired together end to end.
Environment variables namespaced under GHOSTDESK_* (GHOSTDESK_PORT, GHOSTDESK_SCREEN_WIDTH, …). Standard POSIX vars (TZ, LANG) unchanged.
arm64 base image builds cleanly from a clean checkout — Raspberry Pi / Apple Silicon / ARM servers are first-class.
Docker layout restructured into per-service subdirectories (docker/base, docker/init, docker/services/...).
Tool surface renamed to a consistent verb_noun convention; README restructured around an agents-first pitch.
License change: AGPL-3.0 with Commons Clause → FSL-1.1-ALv2. Cleaner language, explicit permitted purposes, explicit Competing Use prohibition, and each released version auto-transitions to Apache 2.0 on its second anniversary.
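The per-request rescaling driven by the GhostDesk-Model-Space header (see the highlight above) amounts to a linear map from the model's coordinate space to screen pixels. The sketch below assumes the header value describes a square grid from 0 to model_space on both axes, and uses an example 1280×1024 screen; the function name and the clamping to the last pixel are illustrative choices, not the middleware's actual code.

```python
def to_screen_pixels(x, y, screen_w, screen_h, model_space=None):
    """Map model coordinates to screen pixels.

    model_space=None models a request without the header: pass-through.
    """
    if model_space is None:
        return x, y
    px = min(round(x * screen_w / model_space), screen_w - 1)
    py = min(round(y * screen_h / model_space), screen_h - 1)
    return px, py


# A model that emits coordinates on a 0..1000 grid, on a 1280x1024 screen.
centre = to_screen_pixels(500, 500, 1280, 1024, model_space=1000)
corner = to_screen_pixels(1000, 1000, 1280, 1024, model_space=1000)
# A frontier model already working in pixels: no header, no change.
untouched = to_screen_pixels(640, 512, 1280, 1024)
```

Because the space is read per request, two clients with different conventions can share one server without either needing to know about the other.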

Fixed

_desktop._parse_exec now strips a leading env wrapper when resolving .desktop entries.

Full changelog

See CHANGELOG.md · v6.0.0…v7.0.0


10 Apr 17:51

maltyxx

v6.0.0

f4db7cc

v6.0.0

New Features

Grid ruler overlay — screenshot() now accepts grid=True to draw a coordinate ruler in the margins of a region crop (major ticks every 50px on X / 20px on Y, alternating magenta/cyan minor gridlines), letting smaller vision models read click coordinates straight off the labels instead of estimating pixel offsets
Small-model prompt — New dedicated prompt with an explicit click-coordinate recipe and workflow built around the grid ruler, targeted at compact vision models that struggle with raw pixel counting
Adaptive detection padding — screen module now adjusts detection padding dynamically with clearer module boundaries between capture, rulers, and shared encoding

Refactoring

WebP by default — screenshot() now returns WebP instead of PNG by default, significantly cutting the token cost of every capture for agents
GPA-GUI-Detector dropped — Removed the external GUI detector dependency in favor of a lighter, more predictable ruler-based approach
Cursor size — Adjusted to 24px for better visibility in captures; LLM-specific cursor comments removed
Wheel build cleanup — Removed unused force-include config from the wheel build

Performance

Faster feedback poll — Visual feedback loop now compares raw PNG bytes directly instead of computing MD5 hashes, reducing reaction-time latency on every mouse/keyboard action
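The change above replaces "hash both captures, compare digests" with a direct comparison of the encoded bytes. A small sketch of the two approaches side by side, with hypothetical function names; both give the same answer, but the direct form does no hashing work at all:

```python
import hashlib


def changed_by_hash(before: bytes, after: bytes) -> bool:
    """Older style: digest both buffers in full, then compare the digests."""
    return hashlib.md5(before).digest() != hashlib.md5(after).digest()


def changed_by_bytes(before: bytes, after: bytes) -> bool:
    """Newer style: compare the encoded buffers directly."""
    return before != after


frame_a = b"\x89PNG" + bytes(1000)
frame_b = b"\x89PNG" + bytes(999) + b"\x01"

same_verdict = (
    changed_by_hash(frame_a, frame_b) == changed_by_bytes(frame_a, frame_b)
    and changed_by_hash(frame_a, frame_a) == changed_by_bytes(frame_a, frame_a)
)
```

Either approach assumes the encoder is deterministic, so that an unchanged screen produces identical bytes.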

Fixes

press_key is case-tolerant — Multi-character keysyms are now normalized: press_key("Return"), press_key("return") and press_key("RETURN") all work equivalently
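One way to get this behaviour is a lookup keyed on the lower-cased name that returns the canonical spelling, while leaving single characters untouched so that "a" and "A" stay distinct. This is a sketch of that idea only; the table contents and function name are assumptions, and the real normalisation may work differently.

```python
# A few real keysym names; a full table would be much longer.
KNOWN_KEYSYMS = ("Return", "Escape", "Tab", "BackSpace", "Page_Up")
_CANONICAL = {name.lower(): name for name in KNOWN_KEYSYMS}


def normalize_keysym(key):
    """Map a multi-character keysym to its canonical case; pass others through."""
    if len(key) <= 1:
        return key  # single characters are case-sensitive on purpose
    return _CANONICAL.get(key.lower(), key)


variants = [normalize_keysym(k) for k in ("Return", "return", "RETURN")]
```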

Documentation

README — Now recommends the llama.cpp fork over LM Studio for local inference; clarifies that small/medium models require both vision and tool use
Screenshot region= — Clarified that region= is a true native crop, not a zoom or interpolation
Small-model guide — New prompt with explicit click-coordinate recipe, plus a menu grid precision screenshot illustrating the workflow on small/medium models
SYSTEM_PROMPT.md — Renamed and restructured, critical rules emphasized

Testing

Coverage additions — New test suites for capture._reencode, server.main, middleware (error handling and coercion), _logging configuration, and the screen._shared module


08 Apr 23:58

maltyxx

v5.0.0

c605fef

v5.0.0

New Features

Visual feedback system — Mouse and keyboard actions now return screen_changed and reaction_time_ms, giving agents immediate confirmation of their interactions
Ruler-based coordinate system — New screen/rulers.py produces zoomed screenshots with coordinate rulers (major ticks every 50px, minor ticks every 25px) for precise, reliable targeting
process_status tool — New shell tool to inspect the state and logs of processes launched via launch()
Precision-focused agent protocol — New SYSTEM_PROMPT.md documents the two-step ruler-based coordinate protocol for agents
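The visual feedback fields introduced above (screen_changed and reaction_time_ms) suggest a capture, act, poll loop. The sketch below shows that shape with injected `action` and `capture` callables so it runs without a display; the function name, the timeout and interval defaults, and returning None for the reaction time on a miss are all assumptions rather than the project's behaviour.

```python
import time


def act_with_feedback(action, capture, timeout_s=2.0, interval_s=0.05,
                      clock=time.monotonic, sleep=time.sleep):
    """Run `action`, then poll `capture` until it differs from the baseline."""
    baseline = capture()  # screen state before acting
    action()
    start = clock()
    while True:
        elapsed = clock() - start
        if capture() != baseline:
            return {"screen_changed": True,
                    "reaction_time_ms": round((clock() - start) * 1000)}
        if elapsed >= timeout_s:
            return {"screen_changed": False, "reaction_time_ms": None}
        sleep(interval_s)


# A fake screen: the action flips its contents, the capture reads them back.
screen = {"pixels": b"idle"}
hit = act_with_feedback(
    action=lambda: screen.update(pixels=b"menu-open"),
    capture=lambda: screen["pixels"],
)
miss = act_with_feedback(
    action=lambda: None,
    capture=lambda: b"idle",
    timeout_s=0.0,
)
```

An agent can use a miss as a cue to take a fresh screenshot before retrying, rather than assuming the action landed.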

Refactoring

Modular input feedback — Extracted visual feedback into a dedicated input/feedback.py module
Cursor module — Extracted cursor handling into its own _cursor.py
Shared image encoding — Consolidated WebP/PNG encoding into a single save_image_bytes() utility in screen/_shared.py, reused by capture.py and rulers.py
Legacy cleanup — Removed obsolete screen/grounding.py, screen/overlay.py, screen/reader.py, shell/wait.py, and the inspect() tool

Container & Environment

SYS_ADMIN capability — Added to the container for proper privilege handling
GNOME keyring — Now unlocked at container start for secure credential storage
Locale persistence — Fixed locale issues across container restarts

Documentation

README — Documented the ruler-based coordinate protocol and the new process_status tool
System prompt — Anonymized for generic desktop control, removing user-specific references
Obsolete docs removed — Cleaned up legacy inspect() documentation and demo screenshots

Testing

Massive coverage additions — New test suites for feedback, _shared, process_status, _logging, middleware, and cursor modules
Test hygiene — Moved assertions inside patch() contexts and added filterwarnings for pydantic RuntimeWarnings

Commits

feat: add SYS_ADMIN capability, unlock gnome-keyring, fix locale persistence
refactor: extract cursor, feedback, and process_status modules; remove wait tool
feat: add visual feedback to mouse and keyboard actions; update LLM instructions
refactor: extract save_image_bytes utility and consolidate image encoding
chore: anonymize system prompt for generic desktop control
fix: move test assertions inside patch context and add filterwarnings
feat: update documentation and build files for visual feedback v5
chore: remove obsolete screenshot.webp demo image
chore: remove obsolete inspect() documentation
docs: add missing process_status tool to README
test: add comprehensive tests for ghostdesk.screen._shared module
test: add comprehensive tests for _logging configuration
test: add comprehensive tests for middleware error handling and coercion
test: add coverage for capture.py _reencode and server.py main function
feat(rulers): add minor ticks every 25px and always show major labels


07 Apr 14:42

maltyxx

v4.1.0

b2618da

v4.1.0

New Features

Base Docker image — Introduced a dedicated base Docker image to separate foundational layers from the application image, improving build times and layer caching
Split CI workflow — CI pipeline now builds base and latest images independently, enabling more granular and efficient deployments
GNOME Keyring support — Added gnome-keyring-daemon to supervisor for secure credential storage within the container

Refactoring

Shared Docker scripts — Moved Docker scripts to a shared directory for better reuse across image variants

Documentation

Custom image guide — Added a dedicated section in the README explaining how to build custom images on top of the base image
SVG logo — Replaced PNG logo with SVG in README for better rendering quality

Maintenance

Cleanup — Removed unused files and updated .dockerignore for a leaner build context

Commits

feat: introduce base Docker image and rework Dockerfiles
feat: split CI workflow for base and latest images
feat: add gnome-keyring daemon to supervisor
refactor: move docker scripts to shared directory
docs: add custom image section to README
docs: use SVG logo instead of PNG in README
chore: remove unused files and update dockerignore


06 Apr 15:36

maltyxx

v4.0.1

fc065ac

v4.0.1

Bug Fixes

Healthcheck reliability — Replaced curl-based healthcheck with supervisorctl status to verify the MCP server process is running. This eliminates false-negative healthchecks caused by HTTP endpoint timing issues during container startup

Documentation

Docker examples improved — Added required environment variables (DISPLAY, RESOLUTION, etc.) to all Docker run/compose examples for easier onboarding
Restart policy — Added restart: unless-stopped to Docker Compose examples for production-ready deployments

Commits

fix: use supervisorctl for healthcheck instead of curl on MCP endpoint
docs: add restart policy to docker examples
docs: add required environment variables to docker examples


06 Apr 14:23

maltyxx

v4.0.0

b5abdc2

v4.0.0

Major Changes

SOM Grounding (Intelligent UI Detection) — Every call to screenshot() now returns structured JSON with every detected UI element (buttons, labels, text fields, links) and their exact (x, y) click coordinates via OCR (RapidOCR + ONNX Runtime). Result: ~90% click accuracy on large LLMs and medium-sized models (~30B parameters)
inspect() tool — Text-only vision — New tool that returns a complete structured view of the screen (elements, windows, cursor, screen dimensions, region) as JSON without sending an image to the LLM. Drastically reduces API costs by eliminating image tokens (~1000+ tokens per screenshot saved)
Visual overlay mode — screenshot(overlay=True) draws colored bounding boxes with (x, y) coordinate labels on every detected element. Ideal for debugging, demos, and visual proof of agent behavior
Region targeting — Both screenshot(region=...) and inspect(region=...) support scoped capture for denser detection on specific screen areas. Coordinates remain absolute — no offset math needed
Redesigned desktop environment — New taskbar with system clock (tint2), easy app switching, polished wallpapers optimized for 1280×800 and 1280×1024
Larger screen: 1280×1024 — Up from 1280×800 (+28% vertical space), giving models significantly more information per screenshot
Small model support — Tested with models as small as 3B active parameters (Qwen3.5-35B-A3B). Built-in optimized instructions guide smaller models effectively
Internationalization — New TZ and LOCALE environment variables for timezone and locale configuration (e.g. Europe/Paris, fr_FR.utf8)
Restructured codebase — Tools reorganized into dedicated modules (screen/, input/, clipboard/, shell/). Added ONNX Runtime dependency for OCR inference

Testing

All unit tests pass
Screenshot and inspect return correct metadata structure (screen, region, cursor, windows, elements)
OCR element detection validated with bounding box coordinates
Overlay rendering tested with label placement

Commits

refactor: restructure tools into dedicated modules and add onnxruntime dependency
feat: regenerate wallpapers from SVG for 1280x800 and 1280x1024 resolutions
feat: SOM grounding integration, desktop environment overhaul and small model prompt
feat: re-enable inspect tool and improve annotation label readability
refactor: simplify small model prompt and clarify inspect() text-only limitation
docs: overhaul README with enterprise workforce section, streamline instructions
docs: move demos after pitch, improve screenshot layout in README
refactor: rename annotate to overlay, unify screenshot and inspect output
docs: update instructions and README for new screenshot/inspect API
chore: remove standalone prompt files
fix: include captured region in metadata for spatial awareness
docs: add region field to JSON example and screenshot docstring


01 Apr 20:18

maltyxx

v3.0.0

8d1da67

v3.0.0

Major Changes

Removed AT-SPI accessibility layer — Models are capable enough to interact with the desktop using screenshots alone. Removed _atspi.py, clickables.py, and system dependencies (python3-gi, gir1.2-atspi-2.0, at-spi2-core, dconf-cli)
New window listing via xdotool — Screenshot now includes open windows with app name, title, and geometry (x, y, width, height)
Standardized API responses — All tools return consistent {"result": ...} format. Screenshot metadata includes cursor position and windows list
Improved performance — Window query runs concurrently with screen capture. Openbox startup extracted to dedicated script with reliable X detection
PROMPT.md — Added system prompt for desktop assistant agents

Testing

All 96 unit tests pass
Manual testing confirms correct window detection (Firefox, GNOME Terminal)
Screenshot response properly formatted and validated

Commits

remove: drop AT-SPI accessibility layer and system dependencies
feat: list open windows via xdotool in screenshot metadata
refactor: extract Openbox startup into dedicated script
docs: add PROMPT.md — system prompt for desktop assistant agents

