Gershwin WindowManager — QA Plan

A two-track plan: a per-PR manual smoke checklist, and a phased automated test suite.

Repository: gershwin-desktop/gershwin-windowmanager · Drafted: 2026-05-02 · Audience: maintainers & contributors

Context & goals
Manual testing checklist for every PR
Known-state baseline (do this once)
Automated testing — three tiers
CI workflow
Phased rollout (4 milestones)
Feature → test coverage map

1. Context & goals

WindowManager is a single-binary X11 reparenting WM written in Objective-C against GNUstep + libxcb, with an optional XRender compositor. The codebase has grown faster than its safety net: today there are no automated tests, no CI, and only two interactive Xephyr launch scripts (WindowManager/test-with-xephyr.sh, WindowManager/test-xephyr-compositor.sh) that a human watches by eye. Recent regressions (workspace windows wandering after open/close, Chromium scroll-wheel redraws, the paste-to-LoginWindow incident) all slipped through reviewed and merged PRs because there was nothing to catch them.

What this plan optimises for

No bottleneck on the maintainers. The manual checklist must be cheap enough that any contributor can run it on their own branch.
Catch the regressions we’ve already seen at least once. Each known incident becomes a checklist item now and a regression test later.
Move the cost from humans to machines over time. Every checklist item should have a clear path to becoming an automated test, so the manual list shrinks instead of growing forever.

Two tracks, intentionally:

Track A (manual checklist). A markdown checklist that GitHub pre-fills into every PR description from the PR template, optionally re-posted as a sticky comment by a GitHub Action. Authors tick boxes; reviewers can see what was actually exercised. Ships in week 1.
Track B (automated suite). Three tiers (unit / X11 integration / UI driver). Tier 1 needs no X server; tiers 2 and 3 run headless under Xvfb. Built incrementally; every checklist item is a candidate to be retired into automation.

2. Manual testing checklist for every PR

Drop this as .github/PULL_REQUEST_TEMPLATE.md so it appears on every new PR. The author ticks what they exercised; unticked items are not "fail" — they are "untested in this PR", which is itself useful information for the reviewer.

Tags: smoke = always run, < 2 min total · regression = covers a real past bug · feature = only if you touched this area · compositor = run twice, with and without -dc · optional = deeper sweep

2.1 Smoke (always run)

WM starts cleanly under Xephyr from ./test-with-xephyr.sh with no crash and no errors on stderr · smoke · Catches startup regressions in selection ownership, theme load, signal handlers.
WM starts cleanly under Xephyr from ./test-xephyr-compositor.sh; compositor initialises (no fallback message) · smoke, compositor · Catches XRender/COMPOSITE/DAMAGE/XFIXES init regressions.
Open xterm — it gets a titlebar, the three orb buttons, and 1px border (or 0px in compositor mode) smoke
Open a second window — titlebar of the unfocused window dims (~35% gray overlay) smoke
Click on the unfocused window — focus moves, dimming inverts, _NET_ACTIVE_WINDOW updates (verify with xprop -root _NET_ACTIVE_WINDOW) smoke
Drag a window by the titlebar — it follows the cursor with no tearing or stutter smoke
Resize from each of the four corners — geometry updates correctly, no visual glitch on the corner radius smoke
Click red close orb — window closes (sends WM_DELETE_WINDOW if supported, else destroys) smoke
Click yellow minimize orb — window unmaps; _NET_WM_STATE includes _NET_WM_STATE_HIDDEN smoke
Click green zoom orb — window maximises into the workarea (not over the dock); click again restores smoke
Alt+Tab cycles through windows; Shift+Alt+Tab cycles in reverse; releasing Alt commits focus smoke
Quit the WM (Ctrl+C in the launching shell) — clean exit, no zombie processes, decorations get released smoke

2.2 Regression checks (each ties to a real past bug)

Open and close a Workspace window 5 times in a row at the same spot: subsequent opens stay at the same position (no "wandering") · regression · Reported by @probonopd after PR #64. Frame-origin / EWMH frame-extents accounting bug class.
In Chromium, scroll wheel inside a page repaints content immediately: no stale tile, no half-painted scroll · regression, compositor · Reported as a redraw failure after the EWMH PR. Damage-tracking regression surface.
Run an app that prints to stderr after the WM has closed its log fd: WM does not crash on SIGPIPE · regression · SIGPIPE handling in main.m.
Toggle CapsLock on, then Alt+Tab — switcher still works (NumLock/CapsLock modifier-mask insensitivity) regression
Right-click on a titlebar to open the snap menu, then drag the cursor off the menu and release — no lockup regression
While dragging a window, kill the client process (e.g. kill -9) — WM does not stay stuck in drag state for the next window regression
Start the WM with no ~/GNUstep/Defaults/uroswm.plist — runs cleanly, doesn't assume compositor preference regression
With the X server under load (or briefly suspended), the WM does not crash on a NULL xcb reply · regression · Recent fixes in EWMHService.m and the focus-rebuild path.
xprop on a Chromium window shows the full EWMH set: _NET_WM_PID, _NET_WM_WINDOW_TYPE, _NET_WM_NAME, _NET_WM_ALLOWED_ACTIONS, _NET_WM_DESKTOP, _NET_FRAME_EXTENTS · regression · PR #64's reason for existing; if any of these go missing, we've regressed.
Open Claude / a terminal app, copy text from elsewhere, paste into it: no logoff, no LoginWindow drop · regression · The selection / focus interaction that bit @pkgdemon last week.

2.3 Feature areas (run only those you touched)

Window decoration / titlebar

Titlebar height matches the active GSTheme; switching theme updates it feature
Window title is centred, antialiased, with shadow; long titles truncate without overflow feature
Titlebar gradient renders top-to-bottom (light gray → dark gray); rounded corners are clean against the wallpaper feature
Resize handle (grow box) is positioned per theme metrics and triggers diagonal resize feature

Resize & move

Resize from each edge (N, S, E, W) and corner (NW, NE, SW, SE) — cursor changes appropriately feature
Window honours WM_NORMAL_HINTS min/max size (test with xterm -geometry 20x5) feature
WM-defined minimum (496×431) is enforced; clients smaller than that get clamped feature
Motion is compressed during drag — CPU does not spike, no event-queue flooding feature

Snapping

Drag to top edge: maximise preview after 300ms linger; release commits maximise feature
Drag to left/right edge: half-screen snap preview; release tiles feature
Drag to each of the four corners: quarter-screen snap preview feature
Right-click titlebar → snap menu shows: Center, Maximize Vert, Maximize Horiz, snap-to-* feature
Snap zones respect dock struts (window does not slide under the dock) feature

Focus management

Closing the focused window of an app with another window open: focus stays in the same app (same-PID preference) feature
Closing the only focused window: focus falls back to the previously-focused window if still mapped feature
With no other regular windows: focus falls back to the Desktop window without crashing feature
X11 focus changes are mirrored into AppKit activation (test with a GNUstep app: menubar should reflect focus) feature
Override-redirect popups (menus, tooltips) do not get decorated and do not steal focus feature
Modal dialogs (_NET_WM_STATE_MODAL) stack above their parent and trap focus feature

EWMH / ICCCM

wmctrl -l lists all managed windows in mapped order feature
wmctrl -d shows one desktop with the workarea correctly reduced by any dock feature
xprop -root _NET_SUPPORTED includes the atoms the WM advertises — spot-check no recent atoms have disappeared feature
Sending _NET_CLOSE_WINDOW via wmctrl -c closes the window gracefully feature
Sending _NET_ACTIVE_WINDOW via wmctrl -a raises and focuses feature
A dock window with _NET_WM_STRUT_PARTIAL reduces _NET_WORKAREA; maximised windows respect it feature

Compositor

Run with and without -dc — both modes start and decorate windows compositor
Open animation (Workspace sets _GERSHWIN_WINDOW_OPEN_ANIMATION_RECT) plays from the source rect to the final window position compositor
Minimize animation shrinks toward the dock area; restore animation expands back compositor
Drag a window — no tearing on the leading edge; titlebar stays crisp compositor
Quickly map and unmap the same window 10 times — no orphan shadow/pixmap leak (check VSZ doesn't grow unboundedly) compositor

Multi-monitor (optional)

If running with two screens (Xephyr +xinerama -screen 1024x768 -screen 1024x768): per-screen workarea correct; windows can be dragged across the boundary optional

Reviewer expectation: the smoke items, plus any regression items that overlap the diff, must be ticked before merge. Feature-area items only need ticking when the PR touches that area. A reviewer should never have to re-run smoke themselves; if it isn't ticked, request changes.

3. Known-state baseline (do this once)

Before the checklist is meaningful, we need a public document of what currently works and what doesn’t. Otherwise reviewers will mark items "fail" for pre-existing issues and waste cycles. Create a single living page in the wiki (or a docs/QA-BASELINE.md) with this shape:

Area	Status	Last verified commit	Notes
Titlebar appearance under Eau theme	✓ works	`a419bf0`	—
EWMH atoms on Chromium window	✓ works	`a419bf0`	Fixed by PR #64.
Compositor — tearing during drag	⚠ known issue	`a419bf0`	Visible on fast pointer movement; tracked separately.
Workspace window position stability	⚠ verify	—	Concern raised after PR #64; needs explicit pass before claiming green.
Chromium scroll-wheel repaint	⚠ verify	—	Same source PR; needs explicit pass.
Multi-monitor workarea	❓ untested	—	No CI for it yet.

Update one row per merged PR that touches the area. The table is the contract: if it says "works" at commit X, a reviewer can hold a future PR to that bar.

4. Automated testing — three tiers

A window manager is an inherently graphical, event-driven program with deep state. No single test framework covers it well, so use a layered approach. Each tier costs more to write and run than the one above, so we add tiers in order of payoff.

Tier	Tests	Tools	Runs in	Catches
1. Unit	Pure-ObjC components: atom interning, geometry transforms, MWM/EWMH parsers, focus stack logic, snap-zone math, comparator/transformer utilities	`gnustep-tests` + `ObjectTesting.h` (same as `libs-base/Tests/`)	No X server — `make check`	Logic regressions, NULL-deref guards, off-by-one in geometry math
2. X11 integration	Protocol-level: window gets framed, atoms appear on root and clients, `_NET_ACTIVE_WINDOW` follows focus, `_NET_WORKAREA` shrinks under struts, `WM_DELETE_WINDOW` closes, `_NET_CLOSE_WINDOW` client message works	Bash harness + `Xvfb` + `xprop` + `wmctrl` + `xwininfo`	Headless — `make check-integration` and CI	EWMH compliance regressions, atom drops, frame-extents math, root-property updates
3. UI driver	End-to-end: open windows, drag/resize/close them, exercise Alt-Tab, snap to edges, screenshot diff against a golden image	Python + `xdotool` + `wmctrl` + `scrot` (model: `gershwin-workspace/Tools/uitest`)	Headless — `make check-ui` and CI nightly	Focus/drag/snap regressions, animation breakage, "wandering window" class of bug

4.1 Tier 1 — Unit tests with `gnustep-tests`

Mirror the libs-base/Tests/ layout. Add a Tests/ tree at the repo root:

gershwin-windowmanager/
├── WindowManager/        # production code (existing)
├── Tests/
│   ├── GNUmakefile       # check:: target, runs gnustep-tests
│   ├── unit/
│   │   ├── TestInfo      # marker file (may be empty); gnustep-tests only runs directories that contain one
│   │   ├── XCBAtomService_basic.m
│   │   ├── FocusStack_samePidPreference.m
│   │   ├── SnapZone_geometry.m
│   │   ├── EWMHParser_strut.m
│   │   └── Transformers_coords.m
│   ├── integration/      # tier 2 (see 4.2)
│   └── ui/               # tier 3 (see 4.3)
└── GNUmakefile           # add Tests to SUBPROJECTS

Example unit test (mirrors the style libs-base uses):

// Tests/unit/FocusStack_samePidPreference.m
#import <Foundation/Foundation.h>
#import "ObjectTesting.h"
#import "URSFocusManager.h"

int main(void) {
  NSAutoreleasePool *p = [NSAutoreleasePool new];
  URSFocusManager *fm = [URSFocusManager new];

  // Three windows, two share a PID.
  [fm trackWindow:0x100 pid:42];
  [fm trackWindow:0x101 pid:42];
  [fm trackWindow:0x102 pid:99];
  [fm setFocusedWindow:0x100];

  // When 0x100 closes, focus should prefer 0x101 (same PID), not 0x102.
  XCBWindowID next = [fm nextFocusCandidateAfterRemoving:0x100];
  PASS(next == 0x101, "same-pid window preferred over other-pid window");

  [p release];
  return 0;
}
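
Before writing the Objective-C assertions, it can help to pin down the expected answers with a small reference model of the precedence chain this test probes (same PID, then previously focused, then desktop, then any). The Python sketch below is illustrative only: the function name, the (window id, pid) tuples, and the tie-break rule (most recently focused wins) are assumptions, not URSFocusManager's actual API or behaviour.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: a window is (window_id, pid); `mru` lists window ids,
# most recently focused first. Nothing here mirrors real URSFocusManager API.
Window = Tuple[int, int]


def next_focus_candidate(windows: Sequence[Window], mru: Sequence[int],
                         closing: int,
                         desktop: Optional[int] = None) -> Optional[int]:
    """Pick the next focus target once `closing` goes away.

    Order tried: same PID, previously focused, desktop window, any window.
    """
    pid_of = dict(windows)
    remaining = [w for w, _ in windows if w != closing and w != desktop]
    by_recency = [w for w in mru if w in remaining]

    # 1. Prefer another window of the same application (same PID).
    closing_pid = pid_of.get(closing)
    same_pid = [w for w in remaining if pid_of[w] == closing_pid]
    if same_pid:
        # Assumed tie-break: the most recently focused same-PID window wins.
        for w in by_recency:
            if w in same_pid:
                return w
        return same_pid[0]

    # 2. Fall back to the previously focused window that still exists.
    if by_recency:
        return by_recency[0]

    # 3. Fall back to the desktop window, if there is one.
    if desktop is not None:
        return desktop

    # 4. Last resort: any remaining window, or None when nothing is left.
    return remaining[0] if remaining else None
```

With the three windows from the test above and 0x102 focused more recently than 0x101, the model still returns 0x101 when 0x100 closes, which is the case the PASS line is meant to lock in.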

What to unit-test first (highest bug-yield, no X server needed):

Focus reassignment policy — the same-PID → previous → desktop → any precedence chain in URSFocusManager.
Snap-zone geometry — given a workarea + cursor position, the right zone is selected for every edge and corner threshold.
Frame-extents math — given a client geometry + titlebar height + border, frame extents and the synthetic ConfigureNotify position are correct.
EWMH/ICCCM parsers — _NET_WM_STRUT_PARTIAL (12 cardinals), _MOTIF_WM_HINTS, WM_NORMAL_HINTS aspect ratio.
Atom service — intern/lookup, GNUstep-specific atoms, no double-intern, no NULL on wedged connection.
NULL-reply guards — reproduce the EWMHService getProperty wedged-connection case (the recent crash) with a stub connection.

Add to Tests/GNUmakefile:

include $(GNUSTEP_MAKEFILES)/common.make

check::
	ADDITIONAL_INCLUDE_DIRS="-I$(CURDIR)/../WindowManager -I$(CURDIR)/../WindowManager/xcb \
	  -I$(CURDIR)/../WindowManager/xcb/services -I$(CURDIR)/../WindowManager/xcb/enums \
	  -I$(CURDIR)/../WindowManager/xcb/utils" \
	gnustep-tests --timeout 60 unit

4.2 Tier 2 — X11 integration tests under Xvfb

These exercise the WM as a black box against a real X server. Each test starts a clean Xvfb on a private display, launches the WM, performs an action (map a test window, send a client message, change a property), and asserts on the resulting X11 state with xprop/wmctrl/xwininfo.

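#!/usr/bin/env bash
# Tests/integration/lib.sh: shared helpers, sourced by each test script.
start_wm() {
  Xvfb :99 -screen 0 1024x768x24 &
  XVFB_PID=$!
  export DISPLAY=:99
  # Poll until the server accepts connections instead of trusting a fixed sleep.
  for _ in $(seq 1 50); do xdpyinfo >/dev/null 2>&1 && break; sleep 0.1; done
  ../../WindowManager/obj/WindowManager "$@" >wm.log 2>&1 &
  WM_PID=$!
  sleep 0.5   # crude settle time for the WM itself
}
stop_wm() {
  [ -n "${WM_PID:-}" ] && kill "$WM_PID" 2>/dev/null
  [ -n "${XVFB_PID:-}" ] && kill "$XVFB_PID" 2>/dev/null
  return 0
}
trap stop_wm EXIT

assert_atom_present() {
  # xprop echoes the atom name even when the property is missing
  # ("NAME:  not found."), so match the "NAME(TYPE) = value" form instead.
  xprop -id "$1" "$2" 2>/dev/null | grep -q "^$2(" \
    || { echo "FAIL: $2 missing on $1"; exit 1; }
}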

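#!/usr/bin/env bash
# Tests/integration/01_ewmh_atoms_on_client.sh
# Regression target: PR #64. Client windows must carry the full EWMH set.
source ./lib.sh
start_wm

xterm -e sleep 30 &    # DISPLAY is exported by start_wm
# Wait for the window to exist rather than guessing with a fixed sleep.
WID=$(timeout 10 xdotool search --sync --class xterm | head -1)
[ -n "$WID" ] || { echo "FAIL: xterm window never appeared"; exit 1; }
sleep 0.4              # let the WM finish decorating and publishing properties

for atom in _NET_WM_PID _NET_WM_WINDOW_TYPE _NET_WM_NAME \
            _NET_WM_ALLOWED_ACTIONS _NET_WM_DESKTOP _NET_FRAME_EXTENTS; do
  assert_atom_present "$WID" "$atom"
done
echo OK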

Integration tests to add first (each maps to a known regression):

EWMH atoms appear on client. The test above. Locks in PR #64.
_NET_ACTIVE_WINDOW follows focus. Map two windows; xdotool windowfocus the second; assert root atom updates.
_NET_WORKAREA shrinks under a strut. Map a window with _NET_WM_STRUT_PARTIAL set; assert workarea reduces by the strut amount.
Window position stability across map/unmap. Map at (200, 200), unmap, remap — geometry stays at (200, 200). This catches the “wandering windows” class.
_NET_CLOSE_WINDOW closes the window. wmctrl -c, then assert window count drops.
Wedged-connection survival. Send a malformed property request via raw xcb; assert WM doesn’t crash (process still alive).
WM_S0 takeover. Start a stub WM, then start ours with takeover; assert ours owns WM_S0.

4.3 Tier 3 — UI driver tests

The model is gershwin-workspace/Tools/uitest: a Python suite that drives synthetic input via xdotool, queries state via wmctrl/xwininfo, and captures screenshots via scrot on failure. We do not need the NSConnection IPC piece — for a window manager the X server itself is the oracle.

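# Tests/ui/test_drag_preserves_geometry.py
import pytest

# Helpers are assumed to live in a shared Tests/ui/helpers.py (not written yet).
from helpers import open_xterm, move_window, geometry, drag_titlebar


@pytest.mark.smoke
def test_drag_window_lands_at_cursor():
    win = open_xterm()
    move_window(win, 100, 100)
    x, y, _, _ = geometry(win)
    assert (x, y) == (100, 100)

    drag_titlebar(win, dx=300, dy=200)
    x, y, _, _ = geometry(win)
    assert abs(x - 400) <= 2 and abs(y - 300) <= 2, \
        f"drag landed at ({x}, {y}), expected ~(400, 300)"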
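
The geometry() helper used above does not exist yet. One possible shape for it, sketched here under assumptions, is a thin wrapper around xwininfo that parses the "Absolute upper-left" and size fields. The helper names and the choice to pass the id in hex are illustrative. Whether a test should measure the client window or its frame is a decision for the suite to make and document.

```python
import re
import subprocess
from typing import Tuple

# Field labels as printed by `xwininfo -id <window>`.
_FIELDS = {
    "x": r"Absolute upper-left X:\s+(-?\d+)",
    "y": r"Absolute upper-left Y:\s+(-?\d+)",
    "w": r"Width:\s+(\d+)",      # capital W, so "Border width:" does not match
    "h": r"Height:\s+(\d+)",
}


def parse_xwininfo(text: str) -> Tuple[int, int, int, int]:
    """Extract (x, y, width, height) in root coordinates from xwininfo output."""
    values = []
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"xwininfo output has no {name!r} field")
        values.append(int(match.group(1)))
    return (values[0], values[1], values[2], values[3])


def geometry(win: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: query the X server for a window's geometry."""
    result = subprocess.run(["xwininfo", "-id", hex(win)],
                            check=True, capture_output=True, text=True)
    return parse_xwininfo(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be tested against canned output without an X server.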

UI scenarios worth automating, in priority order:

Drag → release lands at cursor. Catches motion-compression and frame-offset regressions.
Resize each corner moves the right edges. Catches the 8-direction matrix.
Snap to top maximises into workarea. Drag to (screen_w/2, 0), release after linger.
Alt+Tab cycles & commits focus. Open three windows, press Alt+Tab twice, release Alt, and assert that the third window in the switcher's cycle order is the one focused.
Close orb sends WM_DELETE_WINDOW. Click the red orb at the theme-defined coordinates; assert the client received the message (use a stub client that logs).
Compositor open animation. Set _GERSHWIN_WINDOW_OPEN_ANIMATION_RECT on a yet-unmapped window, map it, screenshot at 50ms intervals, assert the window grows from the source rect.
Long-running stability. Loop “open xterm, drag, close” 200 times; assert WM RSS is bounded and no zombie children.
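
For the long-running stability scenario, "RSS is bounded" needs a measurement and a threshold. The sketch below shows one way to take the measurement on Linux by reading VmRSS from /proc/<pid>/status. It is an assumption-laden sketch: the helper names are hypothetical, the warm-up handling is a guess at what makes the signal less noisy, and the acceptable growth figure is left to the suite to choose.

```python
import re
from pathlib import Path
from typing import Sequence


def parse_vmrss_kib(status_text: str) -> int:
    """Pull the resident set size (in kB) out of /proc/<pid>/status content."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_kib(pid: int) -> int:
    """Linux-only: current resident set size of `pid` in kB."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())


def rss_growth_kib(samples: Sequence[int], warmup: int = 1) -> int:
    """Growth from the first post-warm-up sample to the last sample.

    The first `warmup` samples are skipped so one-off allocations made
    while caches fill do not count as a leak.
    """
    if len(samples) <= warmup:
        raise ValueError("need at least one sample after the warm-up")
    return samples[-1] - samples[warmup]
```

A test would sample rss_kib(wm_pid) every N iterations of the open/drag/close loop and assert that rss_growth_kib(samples) stays under the chosen limit.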

Why not adopt uitest verbatim? It depends on a GNUstep distributed-objects channel into the application under test. WindowManager has no such channel today, and adding one is a non-trivial commitment. For the WM we have a better oracle: the X server. Build the same Python ergonomics (scrot on failure, fixture helpers, clear pass/fail) but back the assertions with wmctrl/xprop/xwininfo output. If we later want pixel-level checks, lift the test_failure_capture.py helper from Workspace.
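
To make "back the assertions with xprop output" concrete, here is a minimal sketch of two parsers for the line formats xprop prints for a WINDOW property (such as _NET_ACTIVE_WINDOW) and a CARDINAL list (such as _NET_WORKAREA). The function names are hypothetical, and only these two property types are handled. A missing property ("NAME:  not found.") yields None instead of a false positive.

```python
import re
from typing import List, Optional


def parse_xprop_window(line: str) -> Optional[int]:
    """Parse e.g. '_NET_ACTIVE_WINDOW(WINDOW): window id # 0x1400007'."""
    match = re.search(r"window id # (0x[0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


def parse_xprop_cardinals(line: str) -> Optional[List[int]]:
    """Parse e.g. '_NET_WORKAREA(CARDINAL) = 0, 0, 1024, 744'."""
    match = re.match(r"^\S+\(CARDINAL\) = (.*)$", line.strip())
    if match is None:
        return None
    # int() tolerates the space xprop prints after each comma.
    return [int(value) for value in match.group(1).split(",")]
```

An assertion then reads as a plain comparison, for example parse_xprop_cardinals(line) == [0, 0, 1024, 744], with the line taken from the output of xprop -root _NET_WORKAREA.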

5. CI workflow

Add .github/workflows/ci.yml. It follows the same job pattern that libs-base and gershwin-workspace already use, with the X11 tooling added.

name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            gnustep-make gnustep-base-runtime libgnustep-base-dev \
            libgnustep-gui-dev libxcb1-dev libxcb-icccm4-dev \
            libxcb-render0-dev libxcb-composite0-dev libxcb-damage0-dev \
            libxcb-xfixes0-dev libxcb-shape0-dev libxcb-keysyms1-dev \
            xvfb xdotool wmctrl x11-utils scrot xterm python3-pytest
      - name: Build
        run: . /usr/share/GNUstep/Makefiles/GNUstep.sh && make
      - name: Unit tests (tier 1)
        run: . /usr/share/GNUstep/Makefiles/GNUstep.sh && make -C Tests check
      - name: Integration tests (tier 2)
        run: |
          . /usr/share/GNUstep/Makefiles/GNUstep.sh
          cd Tests/integration
          for t in [0-9]*.sh; do bash "$t" || exit 1; done   # numbered tests only; skips lib.sh
      - name: UI tests (tier 3) — smoke subset
        run: |
          . /usr/share/GNUstep/Makefiles/GNUstep.sh
          cd Tests/ui
          python3 -m pytest -m smoke
      - name: Upload failure screenshots
        if: failure()
        uses: actions/upload-artifact@v4
        with: { name: ui-failures, path: /tmp/uitest_failures }

Two scopes. On every PR: the unit + integration + smoke-UI suite (target under ~3 minutes, required to pass). Nightly: the full UI suite, run from a separate workflow with a schedule: trigger, because long stability tests don’t belong on the PR critical path.

Also add a small companion workflow: .github/workflows/checklist.yml that posts the manual checklist as a sticky comment on every new PR, so authors see it inline rather than buried in the PR template.

6. Phased rollout

Week 1

Manual checklist live. Land .github/PULL_REQUEST_TEMPLATE.md with sections 2.1–2.3. Land the baseline doc (section 3) with current verified statuses. Install xserver-xephyr on the maintainer machines so the existing scripts run end-to-end.

Week 2–3

Tier 1 & CI skeleton. Add Tests/ tree, write the five unit test files listed in 4.1, wire make check, land the CI workflow with build + unit only. First green PR using it sets the bar.

Week 4–6

Tier 2 integration. Add Tests/integration/lib.sh and the seven X11 integration tests in 4.2 (each one retires one item from the manual checklist). Required on PR.

Week 7+

Tier 3 UI driver. Stand up Tests/ui/ with the seven scenarios in 4.3. Smoke subset on PR; full suite nightly. As each UI test proves stable, retire the matching manual checklist item.

7. Feature → test coverage map

This is the contract that lets us shrink the manual checklist over time. For every feature area, we know what tier owns it.

Feature area	Source	T1 unit	T2 integration	T3 UI	Manual
Atom interning	xcb/services/XCBAtomService.m	✓	—	—	—
EWMH atom set on root	xcb/services/EWMHService.m	—	✓	—	spot-check
EWMH atom set on client	xcb/services/EWMHService.m	—	✓	—	regr 2.2
Strut → workarea	URSWorkareaManager.m	parser	✓	—	—
Frame extents math	xcb/XCBFrame.m	✓	✓	—	—
Focus same-PID preference	URSFocusManager.m	✓	—	✓	feature
X11 → AppKit activation mirror	URSFocusManager.m	—	—	✓	feature
Drag preserves geometry	XCBConnection.m, XCBFrame.m	—	—	✓	smoke
Resize 8 directions	URSTitlebarController.m	hit-test	—	✓	feature
Snap zones & preview	URSSnapPreviewOverlay.m	✓	—	✓	feature
Snap menu	URSSnappingMenuController.m	—	—	✓	feature + regr
Alt+Tab cycle & commit	URSWindowSwitcher.m, URSKeyboardManager.m	stack ops	—	✓	smoke + regr
Close / minimise / zoom orbs	URSTitlebarController.m	hit-test	WM_DELETE	✓	smoke
Window position stability	XCBConnection.m	—	✓	✓	regr
Compositor init & fallback	URSCompositingManager.m	—	✓	—	smoke
Damage → repaint	URSCompositingManager.m	—	—	✓	regr (Chromium scroll)
Open animation property	XCBConnection.m, URSCompositingManager.m	parse	—	✓	compositor
Wedged-connection survival	EWMHService.m, URSFocusManager.m	✓	✓	—	regr
SIGPIPE / signal handling	main.m	—	✓	—	regr
WM_S0 takeover	XCBSelection.m, URSHybridEventHandler.m	—	✓	—	—

Drafted from a deep audit of /Developer/Library/Sources/gershwin-windowmanager and the surrounding Gershwin ecosystem (libs-base/Tests for the unit-test pattern, gershwin-workspace/Tools/uitest for the UI-driver pattern). Concrete file references in section 7 should be re-verified before each tier’s implementation, since the codebase moves quickly.
