Gershwin WindowManager — QA Plan

A two-track plan: a per-PR manual smoke checklist, and a phased automated test suite.
Repository: gershwin-desktop/gershwin-windowmanager · Drafted: 2026-05-02 · Audience: maintainers & contributors

Contents

  1. Context & goals
  2. Manual testing checklist for every PR
  3. Known-state baseline (do this once)
  4. Automated testing — three tiers
  5. CI workflow
  6. Phased rollout (4 milestones)
  7. Feature → test coverage map

1. Context & goals

WindowManager is a single-binary X11 reparenting WM written in Objective-C against GNUstep + libxcb, with an optional XRender compositor. The codebase has grown faster than its safety net: today there are no automated tests, no CI, and only two interactive Xephyr launch scripts (WindowManager/test-with-xephyr.sh, WindowManager/test-xephyr-compositor.sh) that a human watches by eye. Recent regressions (workspace windows wandering after open/close, Chromium scroll-wheel redraws, the paste-to-LoginWindow incident) all slipped through green PRs because there was nothing to catch them.

What this plan optimises for

Two tracks, intentionally:

2. Manual testing checklist for every PR

Drop this as .github/PULL_REQUEST_TEMPLATE.md so it appears on every new PR. The author ticks what they exercised; unticked items are not "fail" — they are "untested in this PR", which is itself useful information for the reviewer.

Tag legend: smoke = always run, < 2 min total · regression = covers a real past bug · feature = only if you touched this area · compositor = run twice, with and without -dc · optional = deeper sweep

2.1   Smoke (always run)

2.2   Regression checks (each ties to a real past bug)

2.3   Feature areas (run only those you touched)

Window decoration / titlebar

Resize & move

Snapping

Focus management

EWMH / ICCCM

Compositor

Multi-monitor (optional)

Reviewer expectation: the smoke items, and any regression items that overlap the diff, must be ticked before merge. Feature-area items only need ticking when the PR touches that area. A reviewer should never have to re-run smoke themselves; if it isn't ticked, request changes.

3. Known-state baseline (do this once)

Before the checklist is meaningful, we need a public document of what currently works and what doesn’t. Otherwise reviewers will mark items "fail" for pre-existing issues and waste cycles. Create a single living page in the wiki (or a docs/QA-BASELINE.md) with this shape:

| Area | Status | Last verified commit | Notes |
|---|---|---|---|
| Titlebar appearance under Eau theme | ✓ works | a419bf0 | |
| EWMH atoms on Chromium window | ✓ works | a419bf0 | Fixed by PR #64. |
| Compositor: tearing during drag | ⚠ known issue | a419bf0 | Visible on fast pointer movement; tracked separately. |
| Workspace window position stability | ⚠ verify | | Concern raised after PR #64; needs explicit pass before claiming green. |
| Chromium scroll-wheel repaint | ⚠ verify | | Same source PR; needs explicit pass. |
| Multi-monitor workarea | ❓ untested | | No CI for it yet. |

Update one row per merged PR that touches the area. The table is the contract: if it says "works" at commit X, a reviewer can hold a future PR to that bar.

4. Automated testing — three tiers

A window manager is an inherently graphical, event-driven program with deep state. No single test framework covers it well, so use a layered approach. Each tier costs more to write and run than the one before it, so we add tiers in order of payoff.

| Tier | Tests | Tools | Runs in | Catches |
|---|---|---|---|---|
| 1. Unit | Pure-ObjC components: atom interning, geometry transforms, MWM/EWMH parsers, focus stack logic, snap-zone math, comparator/transformer utilities | gnustep-tests + ObjectTesting.h (same as libs-base/Tests/) | No X server; make check | Logic regressions, NULL-deref guards, off-by-one in geometry math |
| 2. X11 integration | Protocol-level: window gets framed, atoms appear on root and clients, _NET_ACTIVE_WINDOW follows focus, _NET_WORKAREA shrinks under struts, WM_DELETE_WINDOW closes, _NET_CLOSE_WINDOW client message works | Bash harness + Xvfb + xprop + wmctrl + xwininfo | Headless; make check-integration and CI | EWMH compliance regressions, atom drops, frame-extents math, root-property updates |
| 3. UI driver | End-to-end: open windows, drag/resize/close them, exercise Alt-Tab, snap to edges, screenshot diff against a golden image | Python + xdotool + wmctrl + scrot (model: gershwin-workspace/Tools/uitest) | Headless; make check-ui and CI nightly | Focus/drag/snap regressions, animation breakage, "wandering window" class of bug |

4.1   Tier 1 — Unit tests with gnustep-tests

Mirror the libs-base/Tests/ layout. Add a Tests/ tree at the repo root:

gershwin-windowmanager/
├── WindowManager/        # production code (existing)
├── Tests/
│   ├── GNUmakefile       # check:: target, runs gnustep-tests
│   ├── unit/
│   │   ├── XCBAtomService_basic.m
│   │   ├── FocusStack_samePidPreference.m
│   │   ├── SnapZone_geometry.m
│   │   ├── EWMHParser_strut.m
│   │   └── Transformers_coords.m
│   └── integration/         # tier 2 (see 4.2)
└── GNUmakefile           # add Tests to SUBPROJECTS

Example unit test (mirrors the style libs-base uses):

// Tests/unit/FocusStack_samePidPreference.m
#import <Foundation/Foundation.h>
#import "ObjectTesting.h"
#import "URSFocusManager.h"

int main(void) {
  NSAutoreleasePool *p = [NSAutoreleasePool new];
  URSFocusManager *fm = [URSFocusManager new];

  // Three windows, two share a PID.
  [fm trackWindow:0x100 pid:42];
  [fm trackWindow:0x101 pid:42];
  [fm trackWindow:0x102 pid:99];
  [fm setFocusedWindow:0x100];

  // When 0x100 closes, focus should prefer 0x101 (same PID), not 0x102.
  XCBWindowID next = [fm nextFocusCandidateAfterRemoving:0x100];
  PASS(next == 0x101, "same-pid window preferred over other-pid window");

  [p release];
  return 0;
}

What to unit-test first (highest bug-yield, no X server needed):

Add to Tests/GNUmakefile:

include $(GNUSTEP_MAKEFILES)/common.make

check::
	ADDITIONAL_INCLUDE_DIRS="-I$(CURDIR)/../WindowManager -I$(CURDIR)/../WindowManager/xcb \
	  -I$(CURDIR)/../WindowManager/xcb/services -I$(CURDIR)/../WindowManager/xcb/enums \
	  -I$(CURDIR)/../WindowManager/xcb/utils" \
	gnustep-tests --timeout 60 unit

4.2   Tier 2 — X11 integration tests under Xvfb

These exercise the WM as a black box against a real X server. Each test starts a clean Xvfb on a private display, launches the WM, performs an action (map a test window, send a client message, change a property), and asserts on the resulting X11 state with xprop/wmctrl/xwininfo.

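#!/usr/bin/env bash
# Tests/integration/lib.sh: shared helpers; source this from each test script.
start_wm() {
  Xvfb :99 -screen 0 1024x768x24 &
  XVFB_PID=$!
  export DISPLAY=:99
  sleep 0.3   # crude wait for Xvfb to accept connections
  # Forward any extra arguments to the WM unchanged.
  ../../WindowManager/obj/WindowManager "$@" >wm.log 2>&1 &
  WM_PID=$!
  sleep 0.5   # crude wait for the WM to take over the root window
}
stop_wm() { kill "$WM_PID" 2>/dev/null; kill "$XVFB_PID" 2>/dev/null; }
trap stop_wm EXIT

# xprop prints "NAME(TYPE) = value" when the property is set and
# "NAME:  not found." when it is not. Both lines contain NAME, so match
# the "NAME(" form only; a plain grep for NAME would always succeed.
assert_atom_present() {
  xprop -id "$1" "$2" 2>/dev/null | grep -q "^$2(" \
    || { echo "FAIL: $2 missing on $1"; exit 1; }
}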
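
#!/usr/bin/env bash
# Tests/integration/01_ewmh_atoms_on_client.sh
# Regression target: PR #64 — client windows must carry the full EWMH set.
# Run from Tests/integration so the relative paths in lib.sh resolve.
source ./lib.sh
start_wm

xterm -display "$DISPLAY" -e 'sleep 30' &
sleep 0.4
WID=$(xdotool search --class xterm | head -1)
# Fail clearly if the client never mapped, rather than on the first atom.
[ -n "$WID" ] || { echo "FAIL: no xterm window found"; exit 1; }

for atom in _NET_WM_PID _NET_WM_WINDOW_TYPE _NET_WM_NAME \
            _NET_WM_ALLOWED_ACTIONS _NET_WM_DESKTOP _NET_FRAME_EXTENTS; do
  assert_atom_present "$WID" "$atom"
done
echo OK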
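
The same presence check is likely to be needed again from the Python UI suite. The sketch below shows one way to do it; the function names are hypothetical and not part of any existing helper. It relies on the fact that xprop reports a missing property as "NAME:  not found." and a present one as "NAME(TYPE) = ...", so only the second form counts as present.

```python
import subprocess


def property_present(xprop_output, name):
    """Return True if xprop output shows `name` as set.

    A present property is printed as "NAME(TYPE) = value"; a missing one as
    "NAME:  not found.", which still contains the name, so a plain substring
    match is not enough.
    """
    prefix = name + "("
    return any(line.startswith(prefix) for line in xprop_output.splitlines())


def window_has_property(win_id, name):
    """Query a live window with xprop (hypothetical helper; needs an X display)."""
    result = subprocess.run(
        ["xprop", "-id", str(win_id), name],
        capture_output=True,
        text=True,
        check=False,
    )
    return property_present(result.stdout, name)
```

Keeping the parsing separate from the xprop call means the parser can be checked without an X server.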

Integration tests to add first (each maps to a known regression):

  1. EWMH atoms appear on client. The test above. Locks in PR #64.
  2. _NET_ACTIVE_WINDOW follows focus. Map two windows; xdotool windowfocus the second; assert root atom updates.
  3. _NET_WORKAREA shrinks under a strut. Map a window with _NET_WM_STRUT_PARTIAL set; assert workarea reduces by the strut amount.
  4. Window position stability across map/unmap. Map at (200, 200), unmap, remap — geometry stays at (200, 200). This catches the “wandering windows” class.
  5. _NET_CLOSE_WINDOW closes the window. wmctrl -c, then assert window count drops.
  6. Wedged-connection survival. Send a malformed property request via raw xcb; assert WM doesn’t crash (process still alive).
  7. WM_S0 takeover. Start a stub WM, then start ours with takeover; assert ours owns WM_S0.
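
For test 3, the expected workarea can be computed from the strut values before comparing against what the WM publishes. The sketch below is a worked example of that arithmetic, not WindowManager's own algorithm: it assumes a single screen and that the WM reserves the largest strut on each edge. `Rect` and `expected_workarea` are illustrative names. The first four values of _NET_WM_STRUT_PARTIAL are left, right, top, bottom, as defined by EWMH.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def expected_workarea(screen_w, screen_h, struts: Sequence[Sequence[int]]) -> Rect:
    """Workarea left after reserving the largest strut on each screen edge.

    Each strut is a _NET_WM_STRUT_PARTIAL value; only the first four
    cardinals (left, right, top, bottom) matter on a single screen.
    """
    left = right = top = bottom = 0
    for strut in struts:
        left = max(left, strut[0])
        right = max(right, strut[1])
        top = max(top, strut[2])
        bottom = max(bottom, strut[3])
    return Rect(left, top, screen_w - left - right, screen_h - top - bottom)


# A 24-pixel panel along the top of the 1024x768 Xvfb screen.
panel = (0, 0, 24, 0, 0, 0, 0, 0, 0, 1023, 0, 0)
print(expected_workarea(1024, 768, [panel]))
```

The test would then compare this rectangle with the four values read from _NET_WORKAREA on the root window.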

4.3   Tier 3 — UI driver tests

The model is gershwin-workspace/Tools/uitest: a Python suite that drives synthetic input via xdotool, queries state via wmctrl/xwininfo, and captures screenshots via scrot on failure. We do not need the NSConnection IPC piece — for a window manager the X server itself is the oracle.

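# Tests/ui/test_drag_preserves_geometry.py
# open_xterm, move_window, drag_titlebar and geometry are suite helpers;
# the module name "helpers" is a placeholder.
from helpers import open_xterm, move_window, drag_titlebar, geometry

def test_drag_window_lands_at_cursor():
    win = open_xterm()
    move_window(win, 100, 100)
    x0, y0, _, _ = geometry(win)
    assert (x0, y0) == (100, 100)

    drag_titlebar(win, dx=300, dy=200)
    x, y, _, _ = geometry(win)
    assert abs(x - 400) <= 2 and abs(y - 300) <= 2, \
        f"drag landed at ({x}, {y}), expected ~(400, 300)"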
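
The geometry() helper used above is not defined in this plan. One possible shape, as a sketch only, is to parse the KEY=VALUE output of `xdotool getwindowgeometry --shell`. Whether the reported position refers to the client window or to the WM's frame depends on which window ID is passed, so the suite should pick one convention and keep it.

```python
import subprocess


def parse_geometry(shell_output):
    """Turn `xdotool getwindowgeometry --shell` output into (x, y, w, h)."""
    fields = dict(
        line.split("=", 1) for line in shell_output.splitlines() if "=" in line
    )
    return tuple(int(fields[key]) for key in ("X", "Y", "WIDTH", "HEIGHT"))


def geometry(win):
    """Return (x, y, width, height) for a window ID (needs an X display)."""
    result = subprocess.run(
        ["xdotool", "getwindowgeometry", "--shell", str(win)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_geometry(result.stdout)
```

Splitting the parser from the subprocess call keeps the text handling testable without a display.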

UI scenarios worth automating, in priority order:

  1. Drag → release lands at cursor. Catches motion-compression and frame-offset regressions.
  2. Resize each corner moves the right edges. Catches the 8-direction matrix.
  3. Snap to top maximises into workarea. Drag to (screen_w/2, 0), release after linger.
  4. Alt+Tab cycles & commits focus. Open three windows, Alt+Tab twice, release, assert third window is focused.
  5. Close orb sends WM_DELETE_WINDOW. Click the red orb at the theme-defined coordinates; assert the client received the message (use a stub client that logs).
  6. Compositor open animation. Set _GERSHWIN_WINDOW_OPEN_ANIMATION_RECT on a yet-unmapped window, map it, screenshot at 50ms intervals, assert the window grows from the source rect.
  7. Long-running stability. Loop “open xterm, drag, close” 200 times; assert WM RSS is bounded and no zombie children.
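
For scenario 2, the eight-direction matrix can be written down once as a pure function and reused for every handle. This is a sketch of the expected arithmetic only; it ignores minimum-size and resize-increment hints (xterm resizes in character cells), so a real test would either use a stub client without such hints or allow a tolerance. The function name and handle labels are illustrative.

```python
VALID_HANDLES = {"n", "s", "e", "w", "ne", "nw", "se", "sw"}


def expected_after_resize(geom, handle, dx, dy):
    """Geometry expected after dragging a resize handle by (dx, dy).

    geom is (x, y, width, height). Dragging a west or north edge moves the
    origin and changes the size by the opposite amount; dragging an east or
    south edge changes the size only. Edges not named by the handle stay put.
    """
    if handle not in VALID_HANDLES:
        raise ValueError(f"unknown resize handle: {handle!r}")
    x, y, w, h = geom
    if "w" in handle:
        x += dx
        w -= dx
    if "e" in handle:
        w += dx
    if "n" in handle:
        y += dy
        h -= dy
    if "s" in handle:
        h += dy
    return (x, y, w, h)


start = (100, 100, 400, 300)
for handle in sorted(VALID_HANDLES):
    print(handle, expected_after_resize(start, handle, 20, 10))
```

A parametrised test can loop over the eight handles, perform the drag, and compare the observed geometry with this function's result.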

Why not adopt uitest verbatim? It depends on a GNUstep distributed-objects channel into the application under test. WindowManager has no such channel today, and adding one is a non-trivial commitment. For the WM we have a better oracle: the X server. Build the same Python ergonomics (scrot on failure, fixture helpers, clear pass/fail) but back the assertions with wmctrl/xprop/xwininfo output. If we later want pixel-level checks, lift the test_failure_capture.py helper from Workspace.
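
The "scrot on failure" ergonomics can be had without any IPC channel. The context manager below is a minimal sketch, not the Workspace helper: `screenshot_on_failure` and `scrot_capture` are hypothetical names, and the output directory is assumed to match the /tmp/uitest_failures path that the CI workflow uploads.

```python
import subprocess
import time
from contextlib import contextmanager
from pathlib import Path

# Assumed to match the artifact path uploaded by the CI workflow.
FAILURE_DIR = Path("/tmp/uitest_failures")


def scrot_capture(path):
    """Save a full-screen screenshot of the current DISPLAY with scrot."""
    subprocess.run(["scrot", str(path)], check=False)


@contextmanager
def screenshot_on_failure(name, capture=scrot_capture, out_dir=FAILURE_DIR):
    """Capture a screenshot if the wrapped block raises, then re-raise."""
    try:
        yield
    except Exception:  # includes AssertionError from failed asserts
        out_dir.mkdir(parents=True, exist_ok=True)
        capture(out_dir / f"{name}-{int(time.time())}.png")
        raise
```

A test body wrapped in `with screenshot_on_failure("drag"):` leaves a PNG behind only when an assertion fails. The `capture` parameter exists so the helper itself can be tested with a stand-in instead of scrot.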

5. CI workflow

Add .github/workflows/ci.yml. It follows the same job pattern that libs-base and gershwin-workspace already use, with the X11 tooling added.

name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            gnustep-make gnustep-base-runtime libgnustep-base-dev \
            libgnustep-gui-dev libxcb1-dev libxcb-icccm4-dev \
            libxcb-render0-dev libxcb-composite0-dev libxcb-damage0-dev \
            libxcb-xfixes0-dev libxcb-shape0-dev libxcb-keysyms1-dev \
            xvfb xdotool wmctrl x11-utils scrot xterm python3-pytest
      - name: Build
        run: . /usr/share/GNUstep/Makefiles/GNUstep.sh && make
      - name: Unit tests (tier 1)
        run: . /usr/share/GNUstep/Makefiles/GNUstep.sh && make -C Tests check
      - name: Integration tests (tier 2)
        run: |
          . /usr/share/GNUstep/Makefiles/GNUstep.sh
          cd Tests/integration
          for t in [0-9]*.sh; do bash "$t" || exit 1; done
      - name: UI tests (tier 3) — smoke subset
        run: |
          . /usr/share/GNUstep/Makefiles/GNUstep.sh
          cd Tests/ui
          python3 -m pytest -m smoke
      - name: Upload failure screenshots
        if: failure()
        uses: actions/upload-artifact@v4
        with: { name: ui-failures, path: /tmp/uitest_failures }

Two scopes. On every PR: the unit + integration + smoke-UI suite (target under ~3 minutes, required to pass). Nightly: the full UI suite, run from a separate workflow with a schedule: trigger. Long stability tests don’t belong on the PR critical path.
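
The `-m smoke` selection in the workflow only works cleanly if the marker is registered, otherwise pytest warns about an unknown mark. A minimal sketch of a Tests/ui/conftest.py that does this is shown below; the file location and the marker description are assumptions, while `pytest_configure` and `config.addinivalue_line` are standard pytest hooks.

```python
# Tests/ui/conftest.py (assumed location)

MARKERS = [
    "smoke: fast UI checks that run on every PR",
]


def pytest_configure(config):
    """Register custom markers so `pytest -m smoke` runs without warnings."""
    for marker in MARKERS:
        config.addinivalue_line("markers", marker)
```

Tests opt in with `@pytest.mark.smoke`. The nightly workflow can then run the whole suite with plain `python3 -m pytest`, or only the slow remainder with `-m "not smoke"`.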

Also add a small companion workflow: .github/workflows/checklist.yml that posts the manual checklist as a sticky comment on every new PR, so authors see it inline rather than buried in the PR template.

6. Phased rollout

Week 1
Manual checklist live. Land .github/PULL_REQUEST_TEMPLATE.md with sections 2.1–2.3. Land the baseline doc (section 3) with current verified statuses. Install xserver-xephyr on the maintainer machines so the existing scripts run end-to-end.
Week 2–3
Tier 1 & CI skeleton. Add Tests/ tree, write the five unit test files listed in 4.1, wire make check, land the CI workflow with build + unit only. First green PR using it sets the bar.
Week 4–6
Tier 2 integration. Add Tests/integration/lib.sh and the seven X11 integration tests in 4.2 (each one retires one item from the manual checklist). Required on PR.
Week 7+
Tier 3 UI driver. Stand up Tests/ui/ with the seven scenarios in 4.3. Smoke subset on PR; full suite nightly. As each UI test proves stable, retire the matching manual checklist item.

7. Feature → test coverage map

This is the contract that lets us shrink the manual checklist over time. For every feature area, we know what tier owns it.

| Feature area | Source | T1 unit | T2 integration | T3 UI | Manual |
|---|---|---|---|---|---|
| Atom interning | xcb/services/XCBAtomService.m | | | | |
| EWMH atom set on root | xcb/services/EWMHService.m | | | | spot-check |
| EWMH atom set on client | xcb/services/EWMHService.m | | | | regr 2.2 |
| Strut → workarea | URSWorkareaManager.m | parser | | | |
| Frame extents math | xcb/XCBFrame.m | | | | |
| Focus same-PID preference | URSFocusManager.m | | | | feature |
| X11 → AppKit activation mirror | URSFocusManager.m | | | | feature |
| Drag preserves geometry | XCBConnection.m, XCBFrame.m | | | | smoke |
| Resize 8 directions | URSTitlebarController.m | hit-test | | | feature |
| Snap zones & preview | URSSnapPreviewOverlay.m | | | | feature |
| Snap menu | URSSnappingMenuController.m | | | | feature + regr |
| Alt+Tab cycle & commit | URSWindowSwitcher.m, URSKeyboardManager.m | stack ops | | | smoke + regr |
| Close / minimise / zoom orbs | URSTitlebarController.m | hit-test | WM_DELETE | | smoke |
| Window position stability | XCBConnection.m | | | | regr |
| Compositor init & fallback | URSCompositingManager.m | | | | smoke |
| Damage → repaint | URSCompositingManager.m | | | | regr (Chromium scroll) |
| Open animation property | XCBConnection.m, URSCompositingManager.m | parse | | | compositor |
| Wedged-connection survival | EWMHService.m, URSFocusManager.m | | | | regr |
| SIGPIPE / signal handling | main.m | | | | regr |
| WM_S0 takeover | XCBSelection.m, URSHybridEventHandler.m | | | | |