
WIP: UI smoke tests for axis, touchy, gmoccapy, qtdragon#3999

Open
grandixximo wants to merge 2 commits into LinuxCNC:master from grandixximo:ui-tests

Conversation

@grandixximo
Contributor

@grandixximo grandixximo commented May 4, 2026

Draft, opening for CI feedback. Refs #3756.

Summary

Phase 1 of the GUI test work tracked in #3756. Each test launches a GUI under xvfb-run against an existing configs/sim/<gui>/*.ini, drives Estop reset / machine on / home all via NML, asserts the interpreter reaches IDLE, then shuts down cleanly. Verifies the GUI starts and accepts basic commands without crashing.

Coverage

  • axis
  • touchy
  • gmoccapy
  • qtdragon (qtdragon_xyz/qtdragon_metric.ini)

Mechanics

  • tests/ui-smoke/_lib/launch.sh: xvfb-run wrapper, setsid so the linuxcnc process group can be signalled cleanly, falls back to axis-remote --quit then SIGTERM with grace then SIGKILL. Skips with exit 77 if xvfb-run is unavailable (matches tests/tooledit and tests/pyvcp).
  • tests/ui-smoke/_lib/drive.py: NML driver. Tolerant of sim configs that come up already in STATE_ON via auto-estop-release HAL wiring. Falls back to per-joint serial homing if no HOME_SEQUENCE is configured.
  • tests/ui-smoke/_lib/checkresult.sh: pass when UI_SMOKE_OK printed and no crash markers in captured logs.
  • Reuses existing sim configs, no test-only INI files.
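The SIGTERM-with-grace-then-SIGKILL escalation described for launch.sh can be sketched in Python as follows. This is a minimal illustration, not the PR's shell code: `stop_process_group` and `demo` are hypothetical names, the grace period is arbitrary, and the `axis-remote --quit` step is left out. `start_new_session=True` plays the role of `setsid`, so the whole child group can be signalled at once.

```python
import os
import signal
import subprocess
import sys
import time


def stop_process_group(proc, grace_s=5.0, poll_s=0.1):
    """SIGTERM the child's process group, then SIGKILL it if it outlives grace_s.

    Returns the name of the last signal that had to be sent.
    """
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # group leader exited within the grace period
            return "SIGTERM"
        time.sleep(poll_s)
    os.killpg(pgid, signal.SIGKILL)  # escalate: SIGKILL cannot be ignored
    proc.wait()
    return "SIGKILL"


def demo():
    """Start a child that ignores SIGTERM and show that escalation is needed."""
    child_code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "print('ready', flush=True);"
        "time.sleep(60)"
    )
    stubborn = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # equivalent of setsid: child leads its own group
    )
    stubborn.stdout.readline()  # wait until the child has installed SIG_IGN
    result = stop_process_group(stubborn, grace_s=1.0)
    stubborn.stdout.close()
    return result
```

Because the signal goes to the group rather than a single PID, any helper processes started by the group leader receive it as well.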

Cleanup discipline

  • .gitignore covers all runtime artifacts (linuxcnc.{out,err,pid}, ui-smoke.{out,err}, result, stderr)
  • 4 consecutive runs locally: 4/4 pass, 0 shmem errors, working tree clean (no untracked files beyond the committed test scripts). This aligns with the clean-tree gate that Bertho asked for and that @hdiethelm is wiring up in "CI improvemens: General improvements" (#3984).

Deps

xvfb is already declared in debian/control with the <!nocheck> profile so apt-get build-dep installs it on the existing CI without a workflow change. Coordinated with @hdiethelm in #3984: this PR adds no system deps; if his lands first, no rebase needed here.

Out of scope (deferred)

  • Phase 2: load a small G-code file via linuxcnc.command.program_open + auto(RUN), verify final position via linuxcnc.stat.position. Per-GUI cross-checks via xdotool or AT-SPI where useful.
  • Phase 3: screenshot or short video on failure, uploaded as CI artifact.

Test plan

  • Local: 4/4 pass under scripts/runtests tests/ui-smoke, no shmem leaks
  • CI: rip-and-test passes
  • Reviewer feedback on scope: this PR ships a smoke test ("does it start, NML reachable, no startup crash"), per Bertho's framing in "Add tests starting GUIs, likely falling back to xvfb for it" (#3756). Functional behaviour (load G-code, verify position) is tracked as a Phase 2 follow-up.

Comment thread tests/ui-smoke/_lib/launch.sh Outdated
Comment thread tests/ui-smoke/_lib/launch.sh Outdated
Comment thread tests/ui-smoke/_lib/launch.sh Outdated
Comment thread tests/ui-smoke/README Outdated
@hdiethelm
Contributor

Phase 3: screenshot or short video on failure, uploaded as CI artifact.

If you manage to create consistent screenshots and want to go to pedantic mode:

  • Store reference known good screenshots (TBD where, I often use submodules for test data storage so the main repo is not overfilled and it is still tracked)
  • Take screenshots at certain points where everything is static, like before / after homing / at the end
  • Compare to the reference and highlight any differences, fail if there are differences -> Artifact
  • The dev can download the artifacts, check them manually and if the change was on purpose replace the known good ones, so the CI passes again

This is probably overcomplicated, and I don't know how deterministic LinuxCNC is, but this way bugs like #3979 can be easily avoided. In manual testing, these kinds of bugs are often overlooked.

grandixximo added a commit to grandixximo/linuxcnc that referenced this pull request May 4, 2026
Three review-driven changes:

1. Fix self-kill regression: pkill -KILL -f "\\bqtdragon\\b" matched
   the launch.sh process whose argv contained the path
   .../qtdragon_metric.ini, sending SIGKILL to the test itself
   (exit 137 across all 4 tests). Use pkill -KILL -x against an
   exact daemon name list (linuxcncsvr, milltask, halui, rtapi_app),
   not the GUI program names; the GUIs are children of the linuxcnc
   script and get reaped via SIGTERM to its process group.

2. Dedupe cleanup. Both pre-launch and post-shutdown blocks repeated
   the daemon list and shared-memory key list; extract them to
   _lib/cleanup-runtime.sh which is called from launch.sh and from
   the heredoc fallback. Single source of truth.

3. Drop the pre-driver `sleep 8` and the python module preflight
   inside launch.sh. drive.py polls echo_serial_number for task
   readiness so a wall-clock wait is unnecessary. With GUI runtime
   deps now declared in debian/control under !nocheck, the python
   preflight has nothing to do; missing deps will fail the test
   loudly which is what reviewers asked for ("if it skips gracefully
   we don't know whether the code is sane"). The skip predicate
   only skips on xvfb-run absence (rare local dev environment).

Refs LinuxCNC#3756, PR LinuxCNC#3999
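The self-kill in item 1 comes down to what each pkill mode compares against. The sketch below only approximates that matching in Python: `matches_full_cmdline` and `matches_exact_name` are hypothetical helpers, and the launcher command line is an assumed example, not the one from the failing run.

```python
import re

# Daemon names taken from the commit message above.
DAEMONS = ("linuxcncsvr", "milltask", "halui", "rtapi_app")


def matches_full_cmdline(pattern, argv):
    """Approximate `pkill -f PATTERN`: regex search over the whole command line."""
    return re.search(pattern, " ".join(argv)) is not None


def matches_exact_name(names, comm):
    """Approximate `pkill -x NAME` run once per name: exact process-name match."""
    return comm in names


# Hypothetical command line of the test launcher (assumed for illustration).
launcher_argv = [
    "/bin/bash",
    "tests/ui-smoke/_lib/launch.sh",
    "qtdragon/qtdragon_xyz/qtdragon_metric.ini",
]

# The -f style pattern hits the launcher because the GUI name appears in its argv.
launcher_hit_by_f = matches_full_cmdline(r"\bqtdragon\b", launcher_argv)

# The -x style check looks only at the process name, so the launcher is safe.
launcher_hit_by_x = matches_exact_name(DAEMONS, "bash")
```

With these inputs `launcher_hit_by_f` is true and `launcher_hit_by_x` is false, which mirrors why switching to an exact daemon-name list stops the test from killing itself.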
@grandixximo
Contributor Author

Phase 3: screenshot or short video on failure, uploaded as CI artifact.

If you manage to create consistent screenshots and want to go to pedantic mode:

* Store reference known good screenshots (TBD where, I often use submodules for test data storage so the main repo is not overfilled and it is still tracked)

* Take screenshots at certain points where everything is static, like before / after homing / at the end

* Compare to the reference and highlight any differences, fail if there are differences -> Artifact

* The dev can download the artifacts, check them manually and if the change was on purpose replace the known good ones, so the CI passes again

This is probably overcomplicated, and I don't know how deterministic LinuxCNC is, but this way bugs like #3979 can be easily avoided. In manual testing, these kinds of bugs are often overlooked.

The reference-screenshot diff approach is a good Phase 3 idea, will track it on #3756. For Phase 1 (this PR) I'm staying with NML state assertions only since they're deterministic; rendering will need the screen-stabilization tricks you mentioned.

@grandixximo
Contributor Author

Round 2 pushed; CI now passes 282/282 with all 4 ui-smoke tests running.

Changes in this round:

  • NML connect-and-poll robustness (drive.py): linuxcncsvr's status buffer can be invalid (emcStatusBuffer invalid err=3) for the first ~30s after startup. Driver now retries the connect-and-poll cycle, recreating the stat object each iteration so a stale invalid buffer does not stick. CONNECT_TIMEOUT_S=60s.

  • Software OpenGL (launch.sh): GitHub Actions runners have no GPU and qtdragon's GLcanon widget segfaulted under hardware GL. Set LIBGL_ALWAYS_SOFTWARE=1, GALLIUM_DRIVER=llvmpipe, QT_QUICK_BACKEND=software, QSG_RHI_BACKEND=software, QT_OPENGL=software.

  • pyqt5-dev-tools Build-Depends: qtvcp compiles a QRC file via pyrcc5 at first run; without the package qtdragon segfaulted with "No such file or directory: 'rcc'". Added with <!nocheck> profile.

  • Driver-trust checkresult (checkresult.sh): replaced the crash-marker grep with a simple UI_SMOKE_OK present + UI_SMOKE_FAIL absent check. The driver only prints UI_SMOKE_OK after a successful NML round-trip plus a re-poll after the GUI settle, so it is the authoritative signal. The previous regex was catching shutdown-side Qt teardown races that are out of scope for a startup smoke test.

  • $HOME/linuxcnc/nc_files mkdir workaround: touchy's filechooser.py:29 does os.listdir($HOME/linuxcnc/nc_files) with no try/except and crashes on a clean $HOME. launch.sh pre-creates the directory as a workaround. Filing a separate issue for the underlying touchy bug.
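The connect-and-poll retry from the first bullet could look roughly like the sketch below. It is not the code in drive.py: `wait_for_status`, `make_stat`, `is_ready` and `poll_errors` are hypothetical names, and the status object is passed in as a factory so the loop can be exercised without a running LinuxCNC. The point it illustrates is that a fresh status object is created on every iteration, so an invalid buffer from an earlier attempt cannot stick.

```python
import time


def wait_for_status(make_stat, is_ready, timeout_s=60.0, interval_s=0.5,
                    poll_errors=(Exception,)):
    """Retry the connect-and-poll cycle until is_ready(stat) or the timeout.

    make_stat   -- zero-argument factory returning an object with .poll()
    is_ready    -- predicate applied to the freshly polled status object
    poll_errors -- exception types that mean "not up yet, try again"
    """
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            stat = make_stat()  # recreate the object each iteration
            stat.poll()
            if is_ready(stat):
                return stat
        except poll_errors as exc:
            last_error = exc    # remember why the latest attempt failed
        time.sleep(interval_s)
    raise TimeoutError(f"status not ready after {timeout_s}s: {last_error!r}")


class FakeStat:
    """Stand-in status object that fails its first two polls."""
    polls = 0

    def poll(self):
        FakeStat.polls += 1
        if FakeStat.polls < 3:
            raise RuntimeError("emcStatusBuffer invalid err=3")
```

Against a real installation the call would presumably be along the lines of `wait_for_status(linuxcnc.stat, is_ready, poll_errors=(linuxcnc.error,))`, with `is_ready` checking whatever readiness condition the driver uses.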

andypugh pushed a commit that referenced this pull request May 10, 2026
filechooser.reload() called os.listdir(self.dir) with no error
handling, which crashes touchy at startup when the hardcoded
$HOME/linuxcnc/nc_files path does not exist (e.g. a fresh install,
a CI runner with a clean $HOME, or a sysadmin who keeps NGC
programs elsewhere). The traceback aborted the whole GUI before any
window appeared.

Catch OSError, log the path that could not be read, and continue
with an empty file list. Touchy still starts; the user can browse
to programs through the regular file menu and the quick-pick list
populates as soon as files appear.

Surfaced by ui-smoke testing (#3999) on a clean GitHub Actions
$HOME. Closes #4005.
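The shape of that fix, as described in the commit message, is roughly the following. This is a standalone sketch rather than the touchy source: `list_programs` is a hypothetical function name and the logger setup is assumed.

```python
import logging
import os

log = logging.getLogger(__name__)


def list_programs(directory):
    """Return the sorted file names in directory, or [] if it cannot be read."""
    try:
        return sorted(os.listdir(directory))
    except OSError as exc:
        # Missing or unreadable directory: log the path and carry on with an
        # empty list instead of aborting GUI startup.
        log.warning("cannot read %s: %s", directory, exc)
        return []
```

Catching `OSError` covers a missing directory and a permissions problem alike, since `FileNotFoundError` and `PermissionError` are both subclasses of it.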
@grandixximo grandixximo marked this pull request as ready for review May 11, 2026 10:08
@grandixximo
Contributor Author

@BsAtHome this is ready for review. After the review, should we merge this and open another PR for Phase 2, or continue here and merge once everything is done?

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

@BsAtHome did you force rerun the CI? I got an email it failed, but looking here it did not? Was it Ubuntu's servers?

Edit:

Never mind, it failed again with some sudo issues; will check...

Comment thread tests/ui-smoke/.gitignore Outdated
Comment thread tests/ui-smoke/_lib/checkresult.sh Outdated
@BsAtHome
Contributor

Also,...have you run scripts/shellcheck.sh ...your_scripts...? May find something more ;-)

@BsAtHome
Contributor

CI keeps failing... did you rebase?

@BsAtHome
Contributor

did you force rerun the CI? I got an email it failed, but looking here it did not? Was it Ubuntu's servers?

Yes, I did. I saw the fail on eatmydata adduser and was guessing a transient failure (those we've seen many of). As a "member" I can restart runs.

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

I did rebase, and also squashed and fixed a few things, including your review points; shellcheck on my scripts returns clean.
Hopefully it passes this round; I'll keep an eye on it...

@grandixximo
Contributor Author

grandixximo commented May 11, 2026

@BsAtHome this is not me; I think it is failing because of something that is fixed by #3984?

Edit:

I'll file a tiny PR to quickly patch the CI

@BsAtHome
Contributor

@BsAtHome this is not me; I think it is failing because of something that is fixed by #3984?

That is not yet merged and my newly created PR #4016 worked fine... I don't understand github's CI docker images... they are intermittently unstable it seems.

@grandixximo
Contributor Author

will see if #4017 passes

Adds a minimal harness under tests/ui-smoke/ that launches each GUI
against its sim config under xvfb-run and verifies it reaches the
'task ready' NML state without crashing. Auto-discovered by
scripts/runtests via per-GUI test.sh + checkresult + skip files.

Layout:
  _lib/launch.sh        - spawns linuxcnc -r under xvfb, runs driver,
                          handles clean shutdown (group-SIGTERM with
                          60s wait, escalate to SIGKILL + shm cleanup)
  _lib/drive.py         - polls linuxcnc.stat() until task ready,
                          prints UI_SMOKE_OK / UI_SMOKE_FAIL
  _lib/checkresult.sh   - grep for UI_SMOKE_OK / absence of FAIL
  _lib/skip-if-missing.sh - skip when xvfb-run absent (dev env)
  _lib/cleanup-runtime.sh - pre/post belt-and-braces daemon + shm
                            cleanup; SHM key list mirrors
                            scripts/runtests:157 (full 6-key set)
  _lib/run-gui.sh       - dispatcher taking a relpath under
                          configs/sim/, exec'd by per-GUI test.sh
  axis|touchy|gmoccapy|qtdragon/test.sh - one-line wrappers
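
The checkresult rule listed above (UI_SMOKE_OK present, UI_SMOKE_FAIL absent) is small enough to state as a predicate. The actual script is shell; the Python below is only an illustrative sketch, and `smoke_passed` is a hypothetical name.

```python
def smoke_passed(log_text):
    """Pass only if the driver printed UI_SMOKE_OK and never UI_SMOKE_FAIL."""
    lines = log_text.splitlines()
    saw_ok = any("UI_SMOKE_OK" in line for line in lines)
    saw_fail = any("UI_SMOKE_FAIL" in line for line in lines)
    return saw_ok and not saw_fail
```

An empty log fails under this rule, so a driver that never got far enough to print anything is not mistaken for a pass.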

Force software OpenGL via LIBGL_ALWAYS_SOFTWARE + Qt RHI/QSG/QtQuick
software backends; CI runners have no GPU and Qt GL paths segfault
on headless display.
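
For illustration, the software-rendering environment could be assembled like this before spawning a GUI. The variable names and values are the ones listed in the Round 2 comment above; `software_gl_env` is a hypothetical helper, and launch.sh itself sets these in shell.

```python
import os

# Variables and values as listed in the Round 2 notes of this PR.
SOFTWARE_GL_ENV = {
    "LIBGL_ALWAYS_SOFTWARE": "1",
    "GALLIUM_DRIVER": "llvmpipe",
    "QT_QUICK_BACKEND": "software",
    "QSG_RHI_BACKEND": "software",
    "QT_OPENGL": "software",
}


def software_gl_env(base=None):
    """Return a copy of base (default: os.environ) with software GL forced."""
    env = dict(os.environ if base is None else base)
    env.update(SOFTWARE_GL_ENV)
    return env
```

The returned dict can be passed as the `env=` argument of `subprocess.Popen`, which leaves the parent process environment untouched.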

Skip vs fail policy (BsAtHome / hdiethelm review): only xvfb-run
absence skips; missing Python/typelib deps fail loudly so review
catches them. Required deps are gated under !nocheck in
debian/control.top.in (separate commit).

Adds the Python, Qt, GTK and typelib runtime deps needed for the
ui-smoke harness under tests/ui-smoke/ to actually exercise each
GUI's import path on CI. All gated with <!nocheck> so users building
with DEB_BUILD_OPTIONS=nocheck aren't penalised with the extra
packages.

Includes pyqt5 (+ qsci/qtsvg/qtopengl/qtwebengine/qtpy/dev-tools),
python3-dbus.mainloop.pyqt5, python3-cairo, python3-gi(+cairo),
gir1.2-gtk-3.0, gir1.2-gtksource-4, python3-numpy, python3-configobj,
xvfb and x11-xserver-utils.
@hdiethelm
Contributor

I just checked a few past CI failures.
Mostly, adduser was missing. Building with sid will need attention from time to time, it's called sid for a reason... ;-)
This one: https://github.com/LinuxCNC/linuxcnc/actions/runs/25674529024/job/75368570073 failed because of github being github (Github has a reputation for being unreliable...)

@BsAtHome
Contributor

I just checked a few past CI failures. Mostly, adduser was missing. Building with sid will need attention from time to time, it's called sid for a reason... ;-)

That was why I merged your PR today (AP usually does the merging). Your improvements fixed the issue, improved speed and structure (very considerably) and replaced another PR that would have generated a conflict with yours. How many birds were ~~hit~~ unharmed with that stone...

(Github has a reputation for being unreliable...)

And it is getting worse by the day.

@hdiethelm
Contributor

Yes, I also hit the unavailable adduser in my PR but have already fixed it. Just tag me if you have an unexplained CI failure and I can quickly investigate. Now I know a bit about how GitHub works.

For example this: Error: fatal: unable to access 'https://github.com/LinuxCNC/linuxcnc/': The requested URL returned error: 500 is most probably a github internal API being down.

(Github has a reputation for being unreliable...)

And it is getting worse by the day.

You can follow it live here:
https://damrnelson.github.io/github-historical-uptime/


3 participants