System Freeze When Chromium Screen-Shares With User Home on NFS (Missing NLM on Server)

Summary

When a user is logged in with a home directory hosted on the NFS share 192.168.1.242:/Local (mounted at /Network), starting a Jitsi screen-share from Chromium makes the entire X11 session (Workspace, WindowManager, Menu, and visibly all other X clients) unresponsive for tens of seconds at a time. Each freeze clears on its own after a long delay, then recurs.

On the same hardware and software build, the same Chromium screen-share scenario does not reproduce the freeze when the user is logged in as a local account whose home directory is on local disk.

Root cause: The NFS server (192.168.1.242) does not register the NFS Lock Manager (NLM, RPC program 100021, nlockmgr) with its portmapper. The NFS share is mounted with local_lock=none, hard, timeo=600, which means the kernel will send all POSIX file lock requests to the server via NLM and retry indefinitely on no-reply. Chromium stores its profile in SQLite databases under ~/.config/chromium/, and SQLite uses POSIX byte-range locks for every transaction. When Chromium is actively writing profile state during screen-share, lock acquisition stalls behind RPC retries to a non-existent NLM service on the server. Chromium's main thread blocks in fcntl(), stops servicing X events, and the resulting damage / event backlog at the X server makes every other X client appear frozen until Chromium unblocks.

Reproduction

1. Log in as a user whose home is on the NFS mount (/Network/Users/<user>).
2. Launch Chromium.
3. Join a Jitsi meeting and start a screen-share (any source: whole desktop or a single tab).
4. Interact with the desktop (open Workspace file viewers, click around, etc.).

Result: within a short time the entire X session locks up for tens of seconds at a time, recovering on its own and recurring.

Negative control: same steps as a local user whose $HOME is on local disk — no freeze.

Evidence

1. NLM is not registered on the NFS server

$ rpcinfo -p 192.168.1.242
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  56778  status
    100024    1   tcp  41491  status
    100003    3   tcp   2049  nfs

Expected (and missing) entries:

    100021    1   udp  ...  nlockmgr
    100021    3   udp  ...  nlockmgr
    100021    4   udp  ...  nlockmgr
    100021    1   tcp  ...  nlockmgr
    100021    3   tcp  ...  nlockmgr
    100021    4   tcp  ...  nlockmgr
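
The comparison above can be scripted. The following is a minimal sketch, not part of the original investigation: has_nlm is a hypothetical helper name, and it only inspects the first column of rpcinfo -p output for program number 100021.

```shell
# has_nlm: read `rpcinfo -p` output on stdin and succeed only if
# program 100021 (nlockmgr) is listed. Matching on the program number
# avoids depending on the service-name column.
has_nlm() {
    awk '$1 == 100021 { found = 1 } END { exit !found }'
}

# Typical use against the server from this ticket (not run here):
#   rpcinfo -p 192.168.1.242 | has_nlm && echo "NLM registered" || echo "NLM missing"
```

Because the helper reads from stdin, it can also be pointed at a saved copy of the listing above.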

2. Mount is configured to depend on server-side NLM

$ cat /proc/self/mountstats | grep -A1 192.168.1.242
device 192.168.1.242:/Local mounted on /Network with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,
                acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,
                hard,proto=tcp,timeo=600,retrans=2,sec=sys,
                mountaddr=192.168.1.242,mountvers=3,mountport=46312,
                mountproto=udp,local_lock=none

The relevant options are:

local_lock=none: send all POSIX and flock locks to the server via NLM (no client-side fallback).
hard: never give up on a stalled NFS RPC; retry forever.
timeo=600: wait 60 seconds (the value is in deciseconds) before retransmitting.
retrans=2: retry a request twice before logging "server not responding" and attempting further recovery; with hard, the client then keeps retrying.
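
As a rough illustration of how these options combine, the sketch below summarizes an option string. nfs_lock_summary and its output keys are hypothetical names chosen for this example; it assumes hard is the default retry mode and that timeo is given in tenths of a second.

```shell
# nfs_lock_summary: read a comma-separated NFS mount option string on stdin
# and print where POSIX locks are routed, the retry mode, and timeo in seconds.
nfs_lock_summary() {
    tr -d ' \t' | tr ',' '\n' | awk -F= '
        BEGIN { mode = "hard"; route = "server-nlm" }   # defaults when unset
        $1 == "local_lock" && ($2 == "all" || $2 == "posix") { route = "client-local" }
        $1 == "nolock" { route = "client-local" }
        $1 == "soft"   { mode = "soft" }
        $1 == "timeo"  { timeo = $2 }
        END {
            print "posix_locks=" route
            print "retry_mode=" mode
            if (timeo != "") printf "timeo_seconds=%g\n", timeo / 10
        }'
}

# Possible use on the client (not run here; assumes findmnt is installed):
#   findmnt -n -o OPTIONS /Network | nfs_lock_summary
```

For the mount shown above, this reports server-routed POSIX locks, hard retries, and a 60-second timeout.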

3. Chromium holds many POSIX locks on NFS-hosted SQLite databases

$ sudo cat /proc/locks | grep '00:24'   # device 00:24 = the NFS mount
1: POSIX  ADVISORY  READ  13973 00:24:40127090 1073741826 1073742335
2: POSIX  ADVISORY  READ  13973 00:24:40108440 1073741826 1073742335
3: POSIX  ADVISORY  WRITE 13973 00:24:40126834 0 EOF
4: POSIX  ADVISORY  WRITE 13973 00:24:40127071 1073741824 1073742335
...
(27 such entries, all PID 13973 = chromium main process)

The byte ranges 1073741824 - 1073742335 (0x40000000 - 0x400001FF) are SQLite's lock byte range (Pending byte + Reserved byte + Shared byte range, per SQLite docs). The locked inodes correspond to files like:

$ sudo lsof -p 13973 | grep '\.config/chromium'
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/segmentation_platform/ukm_db
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/Default/Web Data
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/Default/WebStorage/QuotaManager
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/Default/Account Web Data
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/Default/DIPS
chromium  13973  jmaloney  ...  /Network/Users/jmaloney/.config/chromium/Default/Network Action Predictor
... (many more SQLite databases)
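
The SQLite lock-byte range quoted above can be re-derived with shell arithmetic. This sketch uses the offsets from SQLite's documented locking scheme (pending byte at 0x40000000, reserved byte after it, then a 510-byte shared range); sqlite_lock_lines is a hypothetical helper that filters /proc/locks-style lines by start offset.

```shell
# Lock-byte offsets from SQLite's documented locking scheme.
PENDING_BYTE=$((0x40000000))                      # first byte of the lock page
RESERVED_BYTE=$((PENDING_BYTE + 1))
SHARED_FIRST=$((PENDING_BYTE + 2))
SHARED_SIZE=510
SHARED_LAST=$((SHARED_FIRST + SHARED_SIZE - 1))   # last byte of the shared range

# sqlite_lock_lines: read /proc/locks-style lines on stdin and print the
# POSIX locks whose start offset (field 7) falls inside the lock-byte range.
sqlite_lock_lines() {
    awk -v lo="$PENDING_BYTE" -v hi="$SHARED_LAST" \
        '$2 == "POSIX" && $7 + 0 >= lo && $7 + 0 <= hi'
}

# Possible use on the client (not run here):
#   sudo cat /proc/locks | grep '00:24' | sqlite_lock_lines
```

Whole-file locks such as entry 3 above (start 0, end EOF) are excluded, since they do not start inside the SQLite lock-byte range.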

4. NFS itself is healthy — this is not a network or RPC throughput problem

RPC iostats (per-op, since mount):
              calls   retrans  avg RTT  avg total
   GETATTR:    2528         0   0.34ms     0.61ms
   LOOKUP:      923         0   0.39ms     0.62ms
   ACCESS:      999         0   0.31ms     0.47ms
   READ:       2187         0   0.63ms     0.71ms
   WRITE:       478         0   2.55ms     2.70ms
   READDIRPLUS:  97         0   0.34ms     0.61ms

TCP to NFS server:
   ESTAB 0 0 192.168.1.173:896  192.168.1.242:2049
   rtt: 0.86ms / 0.13ms variance, retransmits: 0, send-Q: 0, recv-Q: 0

Sub-millisecond NFS RPC RTTs, zero retransmits, no socket backlog. The data path is fine; only the lock path is broken.

5. The freeze is X-server-wide, not GNUstep-specific

While the UI is frozen, all GNUstep clients (Workspace, WindowManager, Menu) are sleeping idle in do_sys_poll at single-digit CPU. They have nothing to do because Xorg is not delivering events. The CPU load during a freeze comes from Chromium's GPU/renderer/audio processes, which consume 80–120% combined, while Xorg is blocked in drm_syncobj_array_wait_timeout servicing them. The freeze ends when Chromium unblocks from fcntl() and resumes producing/consuming X events.
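
One way to collect wait-channel observations like these is to read /proc/PID/wchan during a freeze. The sketch below assumes a Linux client; wchan_of is a hypothetical helper, and the process names in the usage comment are assumptions about how the binaries appear in the process table.

```shell
# wchan_of PID: print the kernel symbol the task is currently sleeping in,
# or "unknown" if /proc/PID/wchan is missing, unreadable, or empty.
wchan_of() {
    w=$(cat "/proc/$1/wchan" 2>/dev/null) || w=
    echo "${w:-unknown}"
}

# Possible use during a freeze (not run here; process names are assumptions):
#   for pid in $(pgrep -x Xorg) $(pgrep -x Workspace) $(pgrep -x chromium); do
#       printf '%s %s\n' "$pid" "$(wchan_of "$pid")"
#   done
```

A value of 0 means the task was not sleeping in the kernel at the moment it was sampled, so several samples may be needed.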

6. False leads ruled out during investigation

Hypothesis	Outcome
GWViewerSidebar code (new on `feat/sidebar`)	Innocent. Disabling the sidebar entirely (placeholder NSView) and reverting all sidebar-related changes did not stop the freeze.
Sidebar's redundant `addWatcherForPath:` calls	Innocent. Removing them did not stop the freeze.
NSOutlineView heaviness vs NSTableView	Innocent. Swap to NSTableView did not stop the freeze.
Vertical NSSplitView layout	Innocent. Reverting to horizontal split did not stop the freeze.
Compact path bar mode in GWViewerIconsPath	Innocent. Disabling did not stop the freeze.
GWDesktopManager `volumeMountRoots` refactor	Innocent. Reverting did not stop the freeze.
NFS data-path saturation by Chromium screen-share traffic	Ruled out by mountstats: zero retransmits, sub-ms RTTs.
Sidebar's main-thread NFS lookups (`fileExistsAtPath:` on `~/Documents` etc.)	Real but secondary. Visible during sidebar build but not the trigger of the system-wide freeze.
Stale GNUstep orphan processes from earlier sessions (spinning Menu/WindowManager at 75-87% CPU after session teardown)	Real and worth a separate ticket, but not the cause of the freeze under investigation here. Confirmed by reproducing the freeze in a clean session with no orphans.

Why the freeze is system-wide

1. Chromium does an fcntl(F_SETLK) on a SQLite database file under ~/.config/chromium/ (NFS).
2. The kernel routes the lock request through lockd, which RPCs to the server's NLM service. Because NLM is not registered, lockd retries the portmap lookup and the RPC, and with hard mount semantics it will not give up.
3. Chromium's calling thread blocks in fcntl() until the lock call returns. F_SETLK never waits for a conflicting lock holder, but on NFS it still waits for the NLM RPC, which behind these retries can take seconds to tens of seconds.
4. Chromium stops processing its X11 event queue and stops producing new frames for the screen-share.
5. Xorg's compositor / damage queue backs up. Other X11 clients' protocol replies (which depend on the queue draining) stretch out into seconds.
6. Workspace, WindowManager, and Menu all appear unresponsive, even though their main threads are idle in poll() waiting for X events that are not arriving promptly.
7. When Chromium eventually unblocks, the X queue drains and the UI snaps back.

The Chromium screen-share is the trigger because (a) it dramatically increases per-frame state writes to Chromium's profile DBs and (b) it adds heavy GPU/composite work to Xorg, so any stall is amplified.
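
The lock stall at the start of this chain can be probed without Chromium. The sketch below is an assumption-laden diagnostic, not something run during the investigation: probe_lock is a hypothetical helper built on flock(1) from util-linux and timeout(1) from coreutils. On Linux NFS clients flock() is emulated with byte-range locks, so with local_lock=none the probe should take the same NLM path. The probe file in the usage comment is a made-up path, and flock creates it if it does not exist.

```shell
# probe_lock FILE [SECONDS]: try to take and release an exclusive lock on
# FILE, killing the attempt after SECONDS (default 10). Prints the outcome
# and returns non-zero if the lock was refused or the attempt timed out.
probe_lock() {
    file=$1
    limit=${2:-10}
    if timeout -s KILL "$limit" flock -x -n "$file" true; then
        echo "lock ok"
    else
        echo "lock failed or timed out after ${limit}s"
        return 1
    fi
}

# Possible use on the NFS home (not run here; the file name is hypothetical):
#   time probe_lock "/Network/Users/$USER/.nlm-probe" 15
```

On a local filesystem the probe returns almost immediately, which gives a baseline to compare against the NFS result.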

Suggested fixes

Fix 1 (workaround, fast, reversible): Move Chromium's profile off NFS to local disk. This eliminates the NLM dependency for Chromium specifically and should fully resolve the freeze for screen-sharing scenarios.

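# Option A: launch Chromium with a local profile dir
chromium --user-data-dir="/var/tmp/chromium-$USER"

# Option B: replace the on-NFS dir with a symlink to local disk.
# Quit Chromium first. /var/local is usually root-owned, so the target
# directory may need to be created for the user beforehand.
mv ~/.config/chromium "/var/local/chromium-$USER"
ln -s "/var/local/chromium-$USER" ~/.config/chromium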

Use this to confirm the diagnosis before doing server-side work. If the freeze stops with Chromium's profile on local disk, the NFS-hosted profile is confirmed as the trigger, which strongly supports the missing-NLM diagnosis.
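
Before re-testing, it is worth checking that the profile really ended up on local disk. This is a minimal sketch assuming GNU coreutils stat; fs_type and is_on_nfs are hypothetical helper names. stat -f follows symlinks, so it reports the filesystem of the symlink target in Option B.

```shell
# fs_type PATH: print the name of the filesystem type backing PATH
# (GNU stat; symlinks are followed).
fs_type() {
    stat -f -c %T -- "$1"
}

# is_on_nfs PATH: succeed if PATH lives on an NFS mount.
is_on_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Possible use after applying Option B (not run here):
#   is_on_nfs ~/.config/chromium && echo "profile is still on NFS"
```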

Fix 2 (proper, server-side): Make the NFS server (192.168.1.242) register NLM correctly. The exact steps depend on the server distribution / NAS firmware. On a Linux NFS server:

# On 192.168.1.242:
sudo systemctl enable --now rpc-statd.service
sudo systemctl enable --now nfs-server.service

# Confirm NLM is registered (must show nlockmgr entries):
rpcinfo -p localhost | grep nlockmgr

# Expected output:
#   100021    1   udp  ...  nlockmgr
#   100021    3   udp  ...  nlockmgr
#   100021    4   udp  ...  nlockmgr
#   100021    1   tcp  ...  nlockmgr
#   100021    3   tcp  ...  nlockmgr
#   100021    4   tcp  ...  nlockmgr

If the server is a NAS appliance (Synology, QNAP, TrueNAS, etc.), check the appliance's NFS configuration UI for "NLM" / "lock manager" / "advisory locking" toggles, and check the appliance's running services for nlockmgr / rpc.lockd. After the server is fixed, verify from the client:

$ rpcinfo -p 192.168.1.242 | grep nlockmgr

should now return entries on program 100021.

Fix 3 (mount-side workaround if server cannot be fixed): Add nolock (or equivalently local_lock=all) to the NFS mount options. This makes the client handle locks locally and never contact the server for NLM.

# /etc/fstab entry change:
192.168.1.242:/Local /Network nfs rw,vers=3,hard,nolock,...  0 0

Trade-off: Locks will not be coordinated across hosts mounting the same share. If only one host accesses the share at a time, this is safe. If multiple hosts share /Local and could acquire locks on the same files, this risks data corruption (e.g. SQLite DBs being written by two clients without coordination).

Verification approach

1. Apply Fix 1 (move Chromium's profile to local disk) for the affected user.
2. Log in as that user, start Chromium, start a Jitsi screen-share.
3. Reproduce the original interaction pattern (open Workspace viewers, click around, etc.) for several minutes.
4. Confirm no freezes occur. If confirmed, the diagnosis stands.
5. Schedule Fix 2 (server-side NLM) for a maintenance window. After deployment, verify rpcinfo -p 192.168.1.242 | grep nlockmgr returns entries, then revert Fix 1 and confirm the user can run the same scenario with a profile on NFS without freezing.

Related observations not part of this ticket

Sidebar buildModel does main-thread NFS lookups. When a viewer opens, GWViewerSidebar's buildModel performs synchronous [fm fileExistsAtPath:] on ~/Videos (with ~/Movies fallback) and lazy [FSNode nodeWithPath:] per favorite folder icon (~7-8 NFS getattr RPCs per viewer open). This is not the trigger of the screen-share freeze, but is a real source of latency for users on NFS homes and would benefit from caching or async resolution. Worth a separate small ticket.
GNUstep apps (Workspace, WindowManager, Menu) sometimes fail to exit cleanly and remain as orphans (PPid=1) spinning in a tight userspace loop (epoll_wait + poll + recvmsg-EAGAIN at ~9000 iterations/sec, 75-87% CPU). Observed multiple times during this investigation. Orphans hold their NSMessagePort registrations under /tmp/GNUstepSecure<uid>/NSMessagePort/ports/ and can interfere with subsequent sessions' Distributed Objects lookups. Worth a separate ticket and mitigation in the GNUstep teardown path.
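
For the orphan observation, a quick filter over ps output can list candidates. This sketch assumes orphans are reparented to PID 1 as described above (under some session managers they may instead be reparented to a subreaper) and that the command names appear exactly as Workspace, WindowManager, and Menu; gnustep_orphans is a hypothetical helper name.

```shell
# gnustep_orphans: read "pid ppid pcpu comm" lines on stdin and print those
# whose parent is PID 1 and whose command is one of the GNUstep session apps.
gnustep_orphans() {
    awk '$2 == 1 && ($4 == "Workspace" || $4 == "WindowManager" || $4 == "Menu")'
}

# Possible use after a session teardown (not run here):
#   ps -eo pid=,ppid=,pcpu=,comm= | gnustep_orphans
```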

Investigation timeline (abbreviated)

Symptom reported: opening Workspace viewers + clicking sidebar during Chromium/Jitsi screen-share locks up the X session for long periods.
Initial bisect placed blame on feat/sidebar branch vs main. Multiple sidebar-related disables (debounce, redundant watcher removal, NSOutlineView swap, vertical split disable, compact path bar disable, GWDesktopManager refactor revert) all failed to stop the freeze.
Test as a local user (home on local disk) on the same branch — freeze did not reproduce. Strongly implicates NFS interaction.
Live process inspection during a freeze showed Workspace and other GNUstep apps idle in do_sys_poll with no NFS RPCs in flight, while NFS data-path stats were healthy. Pointed at locking rather than data path.
/proc/locks revealed Chromium holding 27+ POSIX locks on NFS-hosted SQLite DBs in classic SQLite locking byte ranges.
rpcinfo -p 192.168.1.242 revealed NLM is not registered on the server.
Combined with mount option local_lock=none, hard, this explains how Chromium's lock operations under screen-share load can stall and cascade into a system-wide UI freeze.