Companion to: Eliminating standalone hwregd (convergence) & in-kernel IOKit feasibility

NextBSD #168 — Unify Mach-event delivery on native EVFILT_MACHPORT (-16) & delete the pipe-bridge

The module-era pipe-bridge (task #39 Path B) fakes Mach-event delivery through a self-pipe and is the root of task #41 (“DISPATCH_SOURCE_TYPE_MACH_RECV does not reliably deliver”), which forced the affected daemons and libraries onto pthread + timed mach_msg polling. The kernel now has a real EVFILT_MACHPORT filter; this plan moves everything onto it, fixes the launchd attach panic that blocks the first reroute (PR #250), and retires the bridge and the polling workarounds for good.

2026-06-08 · Synthesized from a 7-agent workflow surveying nextbsd-kernel, nextbsd userland, and read-only freebsd-src (kernel native filter + panic, kernel pipe-bridge, libmach shim, libdispatch, consumers, tests/tickets). Design doc for maintainer review — nothing in stages 1+ is built yet.

Root cause: The PR #250 boot panic (page fault write 0x70 in filt_machportattach) is a use-after-free on the port set during knote attach. The attach drops the pset lock without holding a reference, then knlist_add writes into pset memory that a concurrent mach_port_move_member / ipc_pset_signal can free and reallocate. The race is amplified on libdispatch DispatchWorker threads, whose per-thread Mach state was never initialized (#148 class). The fix: take a pset reference across knlist_add and revalidate the entry, paired with lazy-init of ith_messages.

Contents

  1. Executive Summary
  2. Root Cause of the Panic
  3. Does Native Actually Fix Task #41?
  4. Component Change Map
  5. Staged Rollout
  6. Workarounds Retired
  7. Tests
  8. Ticket Updates

1. Executive Summary

The problem

In the module era, FreeBSD's kqueue had no Mach-aware filter, so NextBSD shipped task #39 Path B: a "pipe-bridge" that fakes Mach-event delivery through a regular pipe.

This works but is fragile. It is the root of task #41 ("DISPATCH_SOURCE_TYPE_MACH_RECV does not reliably deliver"), which forced consumers (SCNotify, IOKitNotify, mach_service.c, …) to fall back to raw pthread + timed mach_msg(MACH_RCV_TIMEOUT, 500ms) polling loops — high latency, more code, more threads.

The kernel now has a native EVFILT_MACHPORT filter (slot -16), reserved by patches/0003-kqueue-reserve-evfilt-machport.patch and implemented by filt_machport* in ipc_pset.c. The single-threaded probe (#249) proves it delivers. But the full reroute (PR #250) panics under launchd's concurrent DispatchWorker load: page fault write 0x70 in filt_machportattach.

The goal

Make all Mach-event delivery use the native EVFILT_MACHPORT filter, delete the pipe-bridge entirely, and retire the per-consumer polling workarounds — fixing task #41 reliability for good.

The end state

2. Root Cause of the Panic

Mechanism — UAF on the port set during knote attach

filt_machportattach() (/Users/jmaloney/Documents/nextbsd-kernel/src-overlay/sys/compat/mach/ipc/ipc_pset.c:553-577) has a race window between dropping the pset lock and adding the knote:

  1. Line ~567: ips_unlock(pset) releases the pset lock, but the attach path holds no reference that would keep the pset alive.
  2. Line ~575: knlist_add(note, kn, 0) reaches back into pset->ips_note and, via the knlist's sx callbacks, writes to the lock structure embedded in the pset.
  3. Concurrently, ipc_pset_move (ipc_pset.c:318-398) — driven by mach_port_move_member from a sibling thread — together with message delivery / ipc_pset_signal can drop the pset's last reference and free/reallocate it inside that window.
  4. knlist_add's sx_xlock then writes into freed/reallocated memory at offset ~0x70 (the rcd_io_lock_data mutex inside rpc_common_data) → page fault write 0x70.

This is amplified in launchd because libdispatch DispatchWorker threads are spawned by the kernel pthread workqueue, outside mach.ko's task_init_internal path. Their per-thread Mach state (struct thread::ith_messages) is uninitialized (the #148 class of bug), so any dereference during attach/detach on those threads is doubly unsafe. The single-threaded probe (#249) never opens the window because it does one attach, one message, and one read, then exits.
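A deliberately simplified, single-threaded model of steps 1 to 4 makes the window concrete. It is not kernel code: the pset is reduced to a refcount and a freed flag, and toy_pset, toy_release and replay_attach are hypothetical names. It replays the interleaving in a fixed order and reports whether knlist_add would run against freed memory, with and without the reference that Option C (next subsection) takes before ips_unlock.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for an ipc_pset: only the refcount and
 * a flag recording whether the object has been freed/reallocated. */
struct toy_pset {
    int  refs;
    bool freed;
};

/* Models ips_release(): dropping the last reference frees the object. */
static void toy_release(struct toy_pset *p)
{
    if (--p->refs == 0)
        p->freed = true;
}

/*
 * Replays the race in a fixed, single-threaded order.
 * Returns true if knlist_add would write into freed memory.
 */
static bool replay_attach(bool take_ref)
{
    struct toy_pset p = { .refs = 1, .freed = false }; /* the name's right */
    bool uaf;

    if (take_ref)
        p.refs++;          /* Option C: reference taken before ips_unlock */
    /* ips_unlock(pset): the race window opens here. */
    toy_release(&p);       /* sibling move_member/signal drops its reference */
    uaf = p.freed;         /* knlist_add(note, kn, 0) would run at this point */
    if (take_ref)
        toy_release(&p);   /* ips_release after knlist_add returns */
    return uaf;
}
```

replay_attach(false) reports a use-after-free and replay_attach(true) does not: with the extra reference held, the sibling's release can only bring the count down to 1 inside the window.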

The kernel fix (authoritative)

Adopt Option C from the kernel-native survey — take a reference across knlist_add and revalidate that the entry still points at the same pset after the lookup. It is the smallest change that closes both the lifetime hole and the reallocation hole, without holding the knlist sx lock across an allocation path (which Option B risks):

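static int
filt_machportattach(struct knote *kn)
{
    mach_port_name_t name = (mach_port_name_t)kn->kn_kevent.ident;
    ipc_pset_t       pset = IPS_NULL;
    ipc_entry_t      entry;
    kern_return_t    kr;
    struct knlist   *note;

    /* On success the pset comes back locked. */
    kr = ipc_object_translate(current_space(), name, MACH_PORT_RIGHT_PORT_SET,
                              (ipc_object_t *)&pset);
    if (kr != KERN_SUCCESS)
        return (kr == KERN_INVALID_NAME ? ENOENT : ENOTSUP);

    note = &pset->ips_note;
    ips_reference(pset);          /* keep pset alive across knlist_add */
    ips_unlock(pset);             /* window opens; the reference covers it */

    /*
     * Revalidate the name -> pset mapping. NB: XNU-derived ipc code expects
     * the space to be locked around ipc_entry_lookup(); confirm the overlay's
     * locking contract here.
     */
    if ((entry = ipc_entry_lookup(current_space(), name)) == NULL) {
        ips_release(pset);
        return (ENOENT);
    }
    if (entry->ie_object != (ipc_object_t)pset) {   /* name now maps elsewhere */
        ips_release(pset);
        return (ENOENT);
    }

    /*
     * If ipc_pset_destroy() can run between the check above and knlist_add(),
     * the knote lands on an already-cleared knlist; re-checking ips_active(pset)
     * after the add would close that window if Stage 0 shows it.
     */
    kn->kn_fp = entry->ie_fp;
    knlist_add(note, kn, 0);      /* islocked == 0: knlist takes its own lock */
    ips_release(pset);            /* drop the attach-time reference */
    return (0);
}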

Pair this with the ith_messages lazy-init fix (new ticket, §8) so worker threads that never went through task_init_internal cannot fault during attach/detach or on exit (#148). Both land in the same kernel stage because the panic is only fully closed when both are present.
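A minimal sketch of the lazy-init idea, assuming ith_messages is a TAILQ-style head (a first pointer plus a pointer to the last next-field) and that per-thread state on workqueue threads is zero-filled. Under those assumptions an uninitialized head has last == NULL, while an initialized empty head points last at its own first field (which is what TAILQ_INIT does), so the check is cheap and idempotent. All types and functions here are hypothetical stand-ins, not mach.ko's; if struct thread is not guaranteed to be zeroed, an explicit initialized flag would be needed instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued kmsg awaiting delayed destruction. */
struct toy_kmsg {
    struct toy_kmsg *next;
};

/* TAILQ-shaped head: `last` points at the final next-pointer, or at `first`
 * when the queue is empty. Zero-filled memory leaves `last` NULL. */
struct toy_kmsg_queue {
    struct toy_kmsg  *first;
    struct toy_kmsg **last;
};

/* Hypothetical slice of per-thread Mach state. */
struct toy_thread {
    struct toy_kmsg_queue ith_messages;
};

static bool queue_is_initialized(const struct toy_kmsg_queue *q)
{
    return q->last != NULL;
}

static void queue_init(struct toy_kmsg_queue *q)
{
    q->first = NULL;
    q->last = &q->first;
}

static void queue_append(struct toy_kmsg_queue *q, struct toy_kmsg *m)
{
    m->next = NULL;
    *q->last = m;
    q->last = &m->next;
}

/* Lazy-init: call before any dereference of ith_messages (attach/detach,
 * delayed destroy, thread exit). A no-op once the queue is set up. */
static void ith_messages_ensure_init(struct toy_thread *td)
{
    if (!queue_is_initialized(&td->ith_messages))
        queue_init(&td->ith_messages);
}

/* Models the delayed-destroy drain on a worker thread that never went
 * through task_init_internal. Returns the number of messages drained. */
static int delayed_destroy_drain(struct toy_thread *td)
{
    struct toy_kmsg_queue *q = &td->ith_messages;
    int n = 0;

    ith_messages_ensure_init(td);
    while (q->first != NULL) {
        q->first = q->first->next;   /* real code would destroy the kmsg */
        n++;
    }
    q->last = &q->first;             /* queue is empty again */
    return n;
}
```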

filt_machportdetach already guards on kn_knlist == NULL (ipc_pset.c:595-596); confirm that ipc_pset_destroy still does sx_xlock → knlist_clear → knlist_destroy (ipc_pset.c:438-478), so that detach-vs-destroy ordering is safe in either direction.
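Both orderings can be checked against a sequential model, assuming the knlist's sx lock serializes detach and destroy so that neither runs in the middle of the other. The model tracks a single knote and its kn_knlist back-pointer; toy_knlist_add, toy_detach and toy_destroy are hypothetical names standing in for knlist_add, filt_machportdetach and ipc_pset_destroy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-knote model of a knlist and a knote. */
struct toy_knlist;

struct toy_knote {
    struct toy_knlist *kn_knlist;   /* NULL once detached or cleared */
};

struct toy_knlist {
    struct toy_knote *head;         /* at most one knote in this model */
    bool destroyed;
};

static void toy_knlist_add(struct toy_knlist *kl, struct toy_knote *kn)
{
    kl->head = kn;
    kn->kn_knlist = kl;
}

/* Models filt_machportdetach: the kn_knlist == NULL guard turns a detach
 * that arrives after destroy into a no-op. */
static void toy_detach(struct toy_knote *kn)
{
    if (kn->kn_knlist == NULL)
        return;
    kn->kn_knlist->head = NULL;
    kn->kn_knlist = NULL;
}

/* Models ipc_pset_destroy: knlist_clear detaches every knote, then
 * knlist_destroy runs on a list that is now empty. */
static void toy_destroy(struct toy_knlist *kl)
{
    if (kl->head != NULL) {
        kl->head->kn_knlist = NULL;
        kl->head = NULL;
    }
    kl->destroyed = true;
}
```

In either order the knote ends detached with kn_knlist == NULL and the list is empty when destroyed; without the NULL guard, a detach arriving after destroy would dereference a cleared back-pointer.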

3. Does Native Actually Fix Task #41?

**Honest answer: yes, but be precise about *why*, because the surveys disagree on the blame.**

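Two distinct claims appeared: one blames the EV_DISPATCH auto-disarm/re-arm race on the libdispatch side, the other blames edge-coalescing of message bursts.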

Reconciliation: the native filter sidesteps both, which is what makes it a real fix rather than a transport swap.

  1. libmach registers the wrap pset with EV_ADD | EV_CLEAR (no EV_DISPATCH) at the kernel boundary (dispatch_kevent.c:409-413). The kernel filt_machport re-evaluates ips_active(pset) on every kevent syscall (ipc_pset.c:610-707), so readiness *survives* across polling cycles independent of knote re-arm state — the EV_DISPATCH auto-disarm race never reaches the kernel.
  2. Edge-coalescing is handled correctly by libmach's backlog (dispatch_kevent.c:527-557): bursts collapse to one pending readability event in notify mode, and message mode drains every queued message — no message is lost, only redundant wakeups are merged.
  3. Wakeup latency goes from "up to 100–500ms polling" to "immediate on message arrival."
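The coalescing behaviour in point 2 can be illustrated with a toy backlog. This is a simplified model, not dispatch_kevent.c's backlog code: toy_backlog and its functions are hypothetical, and in the real notify path the messages stay queued on the port for the client to receive.

```c
#include <stdbool.h>

/* Hypothetical model of libmach's per-registration backlog. */
struct toy_backlog {
    int  queued;          /* messages sitting on the port */
    bool notify_pending;  /* notify mode: one readability event outstanding */
};

/* A message arrives and the kernel knote fires (possibly coalesced). */
static void toy_arrive(struct toy_backlog *b)
{
    b->queued++;
    b->notify_pending = true;   /* repeated arrivals collapse into one flag */
}

/* Notify mode: hands at most one readability event to libdispatch. The
 * client receives the messages itself, so nothing is dequeued here. */
static int toy_deliver_notify(struct toy_backlog *b)
{
    int events = b->notify_pending ? 1 : 0;

    b->notify_pending = false;
    return events;
}

/* Message mode: every queued message becomes its own event. */
static int toy_deliver_messages(struct toy_backlog *b)
{
    int events = b->queued;

    b->queued = 0;
    b->notify_pending = false;
    return events;
}
```

Five arrivals produce one notify-mode event (the client then drains the port itself) but five message-mode events: redundant wakeups merge, messages do not.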

What this means for the plan: the existing libdispatch *delivery* code (_dispatch_source_type_mach_recv notify mode, _dispatch_mach_type_recv message mode at event_kevent.c:3314-3379) is already correct for the native filter. We do not rewrite libdispatch's merge logic. We only (a) land the native reroute in libmach, (b) retire the FreeBSD userland polling shim, and (c) verify _dispatch_unote_needs_rearm() does not block on EV_CLEAR-registered sources. The one risk to watch in QA: the EV_DISPATCH disarm path is bypassed *only because* libmach strips it at the kernel boundary — that translation must be preserved exactly.
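The boundary translation that must be preserved can be sketched as a pure function over a kevent-shaped struct. Flag values mirror FreeBSD's <sys/event.h> but are redefined with a TOY_ prefix so the sketch compiles on any host; toy_kevent, toy_translate and the wrap_pset parameter are hypothetical, and the sketch omits fflags, data and udata.

```c
#include <stdint.h>

/* Values mirror FreeBSD <sys/event.h>; prefixed to avoid clashing with it. */
#define TOY_EV_ADD       0x0001
#define TOY_EV_DELETE    0x0002
#define TOY_EV_CLEAR     0x0020
#define TOY_EV_DISPATCH  0x0080

#define TOY_EVFILT_MACHPORT_APPLE  (-22)  /* sentinel in dispatch_kevent.h */
#define TOY_EVFILT_MACHPORT_NATIVE (-16)  /* slot reserved by patch 0003 */

struct toy_kevent {
    uintptr_t ident;
    int16_t   filter;
    uint16_t  flags;
};

/*
 * An Apple-shape request on the -22 sentinel becomes a native -16
 * registration on the wrap pset: EV_DISPATCH is stripped and EV_CLEAR is
 * added on EV_ADD. Other filters pass through untouched. `wrap_pset` is
 * assumed to come from the registration lookup.
 */
static struct toy_kevent
toy_translate(struct toy_kevent in, uintptr_t wrap_pset)
{
    struct toy_kevent out = in;

    if (in.filter != TOY_EVFILT_MACHPORT_APPLE)
        return out;
    out.filter = TOY_EVFILT_MACHPORT_NATIVE;
    out.ident = wrap_pset;
    out.flags &= (uint16_t)~TOY_EV_DISPATCH;  /* auto-disarm never reaches the kernel */
    if (in.flags & TOY_EV_ADD)
        out.flags |= TOY_EV_CLEAR;            /* state resets per retrieval; knote stays enabled */
    return out;
}
```

A QA check along these lines (EV_ADD | EV_DISPATCH on -22 must come out as EV_ADD | EV_CLEAR on -16) is one way to pin the translation so a later refactor cannot silently reintroduce EV_DISPATCH.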

4. Component Change Map

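Legend: FIX = correctness change, MIGRATE = move to native path, DELETE = remove, KEEP = no change, VERIFY / "(verify)" = audit only, no code change expected, DEFER = out of scope for this plan (follow-up ticket).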

Kernel — /Users/jmaloney/Documents/nextbsd-kernel

File(s) | Current state | Action | Notes
src-overlay/sys/compat/mach/ipc/ipc_pset.c:553-577 | filt_machportattach drops pset lock w/o ref → UAF | FIX | Option C: ips_reference across knlist_add + revalidate entry->ie_object == pset.
src-overlay/sys/compat/mach/ipc/ipc_pset.c:595-596, 438-478 | detach NULL-guard + destroy knlist_clear/destroy | FIX (verify) | Confirm detach-vs-destroy safe both orders; no code change expected.
mach.ko thread init / ipc_kmsg_delayed_destroy (ith_messages) | worker threads have uninit ith_messages (#148) | FIX | Lazy-init empty queue before deref; lands same stage as attach fix.
src-overlay/sys/compat/mach/ipc/ipc_pset.c:498-501, 510-515 | weak mach_event_bridge_fire extern + call in ipc_pset_signal | DELETE | Remove last, after native proven.
src-overlay/sys/compat/mach/mach_event_bridge.c | entire 282-line bridge core | DELETE | Bridge-only, no reuse.
src-overlay/sys/compat/mach/mach_event_bridge.h | entire 39-line bridge API | DELETE | Bridge-only.
src-overlay/sys/compat/mach/mach_module.c:51, 150-159 | #include + mach_event_bridge_init() | DELETE | Keep all other module/syscall registration.
src-overlay/sys/compat/mach/mach_traps.c:30, 337-359 | sys_*_event_bell_trap handlers | DELETE | Keep other Mach traps.
src-overlay/sys/compat/mach/mach_syscall_wire.c:59-60, 97-100, 316-340, 471-488, 538-542, 626-634, 734-737, 748-753 | bell fwd-decls, guarded wrappers, sysents, offsets, sysctls, wire/unwire | DELETE | Keep all non-bell syscall machinery.
src-overlay/sys/sys/mach/_mach_sysproto.h:191-203 | register/unregister_event_bell_trap_args | DELETE | Keep other trap arg structs.
patches/0003-kqueue-reserve-evfilt-machport.patch | reserves slot -16, bumps EVFILT_SYSCOUNT 15→16 | KEEP | This is the native enabler.
src-overlay/sys/compat/mach/mach_module.c:144-149, ipc_pset.c:489, 504-550, 209-211 | native machport_filtops registration + signal/init/destroy | KEEP | This is the replacement path.
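Why patch 0003 has to bump EVFILT_SYSCOUNT as well as reserve -16: upstream FreeBSD's kqueue (kqueue_fo_find in kern_event.c) rejects any filter for which filt > 0 || filt + EVFILT_SYSCOUNT < 0, and only then indexes its system filter table with ~filt. A minimal model of that bounds check follows; the function names are hypothetical, and the patched tree should be checked to confirm it keeps the same test.

```c
#include <stdbool.h>

/* Mirrors the bounds check upstream kqueue applies to system filters. */
static bool toy_filter_in_range(int filt, int evfilt_syscount)
{
    return !(filt > 0 || filt + evfilt_syscount < 0);
}

/* System filters are negative; ~filt maps -1 to slot 0 ... -16 to slot 15. */
static int toy_filter_slot(int filt)
{
    return ~filt;
}
```

With EVFILT_SYSCOUNT at 15, -16 fails the bounds check; at 16 it passes and maps to table slot 15.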

libmach — /Users/jmaloney/Documents/nextbsd-work/src/libmach

File(s) | Current state | Action | Notes
dispatch_kevent.c:1-14, 85-102 | comments + EVFILT_MACHPORT_NATIVE (-16) define | MIGRATE | Reflect native arch (already on libmach-native-machport branch, commit 3c7f75a).
dispatch_kevent.c:148-165 (pipe_r, pipe_w) | pipe fds in struct mach_kev_reg | DELETE | Wrap pset is the knote target.
dispatch_kevent.c:181-189 reg_find_by_pipe_r_locked | lookup by pipe fd | MIGRATE | → reg_find_by_wrap_pset_locked.
dispatch_kevent.c:276-323 reg_create | pipe2() + mach_event_bell_register() | MIGRATE | Pset alloc only; native knote in kevent_qos.
dispatch_kevent.c:338-354 reg_destroy | bell_unregister + close() | MIGRATE | Just pset_deallocate; knote auto-detaches.
dispatch_kevent.c:362-431 machport_change_translate | emits EVFILT_READ on pipe_r | MIGRATE | Emit EVFILT_MACHPORT_NATIVE on wrap_pset.
dispatch_kevent.c:527-557 backlog_drain_locked | pipe read-drain loop | MIGRATE | Drop pipe drain; keep message/notify backlog logic verbatim.
dispatch_kevent.c:775-789 | event demux on filter==EVFILT_READ | MIGRATE | Demux on filter==EVFILT_MACHPORT_NATIVE.
dispatch_kevent.c:215-228 resolve_mach_syscall | resolves bell syscall nums | DELETE | Bell-only.
dispatch_kevent.c:230-244 mach_event_bell_register | arms kernel bell | DELETE | Native self-signals.
dispatch_kevent.c:252-266 mach_event_bell_unregister | disarms bell | DELETE | knote auto-detaches.
include/mach/dispatch_kevent.h:69-71 | EVFILT_MACHPORT (-22) sentinel | KEEP | Apple-shape; libmach translates to -16.
include/mach/dispatch_kevent.h:192-200 | bell protos | DELETE (or stub ENOSYS) | Delete if no external callers; grep first.
backlog/synth (backlog_pop_locked ~565-612, flag mask 116-117) | Apple-shape event synth, DISPATCH_EV_MSG_NEEDS_FREE, ext[0]/ext[1] | KEEP | Delivery contract must not change.
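After the migration, the demux at dispatch_kevent.c:775-789 keys on the native filter and looks registrations up by wrap pset instead of by pipe fd. A minimal sketch of that shape, with locking omitted; toy_reg, toy_find_by_wrap_pset and toy_demux are hypothetical stand-ins for struct mach_kev_reg, reg_find_by_wrap_pset_locked and the demux loop.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_EVFILT_MACHPORT_NATIVE (-16)

/* Hypothetical registration after the migration: no pipe fds, only the
 * wrap pset name the native knote is registered on. */
struct toy_reg {
    uint32_t wrap_pset;
    int      events_seen;
};

/* Stand-in for reg_find_by_wrap_pset_locked(): linear scan of a table. */
static struct toy_reg *
toy_find_by_wrap_pset(struct toy_reg *regs, size_t n, uint32_t pset)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i].wrap_pset == pset)
            return &regs[i];
    return NULL;
}

/* Demux one kernel event. Returns 1 if routed to a registration, 0 if it
 * is not a native Mach event, -1 if no registration matches the pset. */
static int toy_demux(struct toy_reg *regs, size_t n, int16_t filter,
                     uintptr_t ident)
{
    struct toy_reg *r;

    if (filter != TOY_EVFILT_MACHPORT_NATIVE)
        return 0;
    r = toy_find_by_wrap_pset(regs, n, (uint32_t)ident);
    if (r == NULL)
        return -1;
    r->events_seen++;       /* the real code would feed the backlog here */
    return 1;
}
```

A distinct code for an unmatched pset keeps the stale-event case (for example, an event racing reg_destroy) separate from the not-a-Mach-event path.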

libdispatch — /Users/jmaloney/Documents/nextbsd-work/src/libdispatch

File(s) | Current state | Action | Notes
src/event/event_kevent.c:3314-3328 _dispatch_source_type_mach_recv | notify mode, dst_fflags=0 | KEEP (verify) | Already correct for native.
src/event/event_kevent.c:3364-3379 _dispatch_mach_type_recv | message mode, dst_fflags=DISPATCH_MACH_RCV_OPTIONS | KEEP (verify) | Already correct.
src/event/event_kevent.c:467-475 _dispatch_kevent_merge_ev_flags | EV_DISPATCH auto-disarm | KEEP | Bypassed at kernel boundary by libmach's EV_CLEAR-only registration; do not edit.
src/event/event_mach_freebsd.c:99-230, 233-287 | per-source pthread polling shim (100ms) | DELETE | Replaced by native wakeup once #250 lands green.
source_internal.h _dispatch_unote_needs_rearm() | re-arm gating | VERIFY | Must not block on EV_CLEAR sources.

Consumers — /Users/jmaloney/Documents/nextbsd-work/src

File(s) | Current state | Action | Notes
libSystemConfiguration/SCNotify.c:254-292, 316-330, 340-350 | raw pthread + 500ms mach_msg (task #41 workaround, see SCInternal.h:87-88) | DELETE → MIGRATE | Replace with dispatch_source_create(DISPATCH_SOURCE_TYPE_MACH_RECV, notifyPort, …).
libIOKit/IOKitNotify.c:207-251, 292 | raw pthread + 500ms mach_msg (comments 18-21) | DELETE → MIGRATE | Dispatch source on recv_port; drop atomic_int stop + pthread_t from IONotificationPort.
IPConfiguration/mach_service.c:50-100, 109 | raw pthread + 500ms mach_msg RPC loop | DELETE → MIGRATE | Dispatch source (or keep if RPC stays trivial).
IPConfiguration/sc_link_watch.c:157-165, 221 | SCDynamicStoreSetDispatchQueue (rides shim) | KEEP | Auto-fixed when SCNotify migrates.
libxpc/xpc_connection.c:240-246 | DISPATCH_SOURCE_TYPE_MACH_RECV | KEEP | Already correct; auto-accelerates.
Libnotify/notify_client.c:943 | DISPATCH_SOURCE_TYPE_MACH_RECV | KEEP | Already correct.
launchd/src/runtime.c:227-232 (demand_port dispatch source) | dispatch source | KEEP | Correct.
launchd/src/runtime.c:243, 570-650 kqueue_demand_loop + select() race workaround | secondary safety thread | DELETE (deferred to #168 configd fold) | Keep as safety valve until configd unification; not on the critical path for this plan.
configd/configd.c:675-723 configd_serve raw mach_msg loop | raw 1s mach_msg | DEFER | Optional dispatch upgrade is part of the configd-unification follow-up, not this plan.
KernelEventMonitor/kernel_event_monitor.c:246-288 standalone PF_ROUTE daemon | separate daemon | DEFER | Fold into configd in follow-up ticket.

5. Staged Rollout

Cross-repo timing follows the kernel → continuous-release ingest → consumer ordering. Every NextBSD main change goes via PR; a PR never publishes — merging a green PR is the publish event. Each stage gates the next on its CI marker.

Stage 0 — Concurrency repro (no behavior change)

Stage 1 — Kernel fix (the unblocker)

Stage 2 — libmach native reroute (PR #250)

Stage 3 — Retire libdispatch polling shim

Stage 4 — Per-consumer workaround removal (task #41 payoff)

Stage 5 — Delete the pipe-bridge (point of no return)

6. Workarounds Retired

Workaround | Location | Lines | Win
SCNotify.__sc_notify_thread | libSystemConfiguration/SCNotify.c | 254-292 (+316-330, 340-350) | ~40 lines + a pthread gone; 500ms→sub-ms SCDynamicStore notifications.
IOKitNotify.receive_thread | libIOKit/IOKitNotify.c | 207-251 (+292) | ~45 lines + one pthread *per* IONotificationPort; immediate IOKit matching.
mach_service_thread | IPConfiguration/mach_service.c | 50-100 (+109) | ~50 lines; lower-latency ipconfigd RPC.
libdispatch FreeBSD polling shim | libdispatch/src/event/event_mach_freebsd.c | 99-287 | Removes per-source 100ms polling threads across *every* dispatch Mach source.
kqueue_demand_loop + select() race fix | launchd/src/runtime.c | 243, 570-650 | ~80 lines (deferred to configd-unification follow-up).
Pipe-bridge (kernel) | mach_event_bridge.{c,h} + scattered | 282 + 39 + ~150 | Two custom syscalls, a global pset→file map, and a weak wakeup hook gone.

Net: roughly 135+ lines of userland polling, multiple per-source threads, and the entire kernel bridge (~470+ lines + 2 syscalls) deleted. Latency for Mach-event consumers drops from 100–500ms polling to immediate kernel wakeup.
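The two totals are straight sums of the Win and Lines columns above; the libdispatch shim and the deferred launchd loop are not part of the 135:

```latex
\begin{aligned}
\text{userland polling} &\approx 40 + 45 + 50 = 135\ \text{lines} \quad (\text{SCNotify, IOKitNotify, mach\_service.c})\\
\text{kernel bridge} &\approx 282 + 39 + 150 = 471\ \text{lines} \quad (\text{.c, .h, scattered call sites})
\end{aligned}
```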

7. Tests

Existing (keep)

NEW — test_evfilt_machport_concurrent.c (the panic reproducer)

Location: src/mach_kmod/tests/test_evfilt_machport_concurrent.c. Marker: EVFILT-MACHPORT-CONCURRENT-OK / -FAIL.

  1. mach_port_move_member(port, pset)
  2. mach_port_move_member(port, MACH_PORT_NULL)
  3. mach_msg(SEND_TIMEOUT, 0) to a sibling's port (avoid recv-block).
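The harness shape for test_evfilt_machport_concurrent.c can be sketched with plain pthreads. The three Mach operations above are replaced by hypothetical stubs (toy_move_into_pset and friends) so the skeleton compiles anywhere; the thread and iteration counts are arbitrary placeholders, and the real test would presumably also attach EVFILT_MACHPORT knotes to the pset concurrently, since that is the path that panics.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_NTHREADS 8      /* placeholder worker count */
#define TOY_ITERS    1000   /* placeholder iterations per worker */

static atomic_int toy_ops_done;

/* Hypothetical stubs for the three steps above; each returns 0 on success.
 * The real test would call mach_port_move_member() and mach_msg() here. */
static int toy_move_into_pset(int worker)   { (void)worker; return 0; }
static int toy_move_out_of_pset(int worker) { (void)worker; return 0; }
static int toy_send_to_sibling(int worker)  { (void)worker; return 0; }

static void *toy_worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    for (int i = 0; i < TOY_ITERS; i++) {
        if (toy_move_into_pset(id) != 0 ||
            toy_move_out_of_pset(id) != 0 ||
            toy_send_to_sibling(id) != 0)
            return (void *)1;            /* any error fails the run */
        atomic_fetch_add(&toy_ops_done, 3);
    }
    return NULL;
}

/* Runs all workers, then prints the CI marker. Returns 0 on success. */
static int toy_run_stress(void)
{
    pthread_t th[TOY_NTHREADS];
    int created = 0, failed = 0;

    atomic_store(&toy_ops_done, 0);
    for (; created < TOY_NTHREADS; created++)
        if (pthread_create(&th[created], NULL, toy_worker,
                           (void *)(intptr_t)created) != 0)
            break;
    if (created != TOY_NTHREADS)
        failed = 1;
    for (int i = 0; i < created; i++) {
        void *ret = NULL;
        if (pthread_join(th[i], &ret) != 0 || ret != NULL)
            failed = 1;
    }
    puts(failed ? "EVFILT-MACHPORT-CONCURRENT-FAIL"
                : "EVFILT-MACHPORT-CONCURRENT-OK");
    return failed;
}
```

The marker is printed only after every worker has joined, so a kernel panic mid-run shows up as a missing -OK marker rather than a -FAIL line.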

CI integration

  EVFILT-MACHPORT-OK
  EVFILT-MACHPORT-CONCURRENT-OK

8. Ticket Updates

Update — #168 "configd: host KernelEventMonitor in-process"

Update — #249 "test: probe native EVFILT_MACHPORT delivery (Phase A)"

Update — #148 "mach.ko: page fault in ipc_kmsg_destroy on thread exit (libdispatch worker)"

New tickets to file

  1. "test: EVFILT_MACHPORT concurrency stress (#168 Phase B blocker)": create test_evfilt_machport_concurrent.c, gate as EVFILT-MACHPORT-CONCURRENT-OK in boot-test.sh; reproduces the PR #250 panic. *Blocks #168, PR #250.*
  2. "mach.ko: lazy-init ith_messages for non-task-init threads (#148 fix)": detect uninit per-thread Mach delayed-destroy queue and init empty before deref in attach/detach and ipc_kmsg_delayed_destroy. *Blocks PR #250; closes #148.*
  3. "mach.ko: fix UAF in filt_machportattach under concurrent move_member (#168 Stage 1)": Option C ref+revalidate fix in ipc_pset.c:553-577. *Blocks PR #250.*
  4. "libdispatch: retire FreeBSD event_mach_freebsd.c polling shim once native proven (#168 Stage 3)": delete per-source polling threads; verify _dispatch_unote_needs_rearm on EV_CLEAR. *Depends on PR #250 green.*
  5. "consumers: replace raw pthread+mach_msg loops with DISPATCH_SOURCE_TYPE_MACH_RECV (task #41 payoff, #168 Stage 4)": SCNotify / IOKitNotify / mach_service.c. *Depends on Stage 3; resolves task #41.*
  6. "kernel: delete task #39 Path B pipe-bridge (#168 Stage 5)": remove mach_event_bridge.{c,h} + all bell syscall infra + weak hook in ipc_pset.c. *Final stage; depends on Stages 1–4 stable in CI.*
  7. (Follow-up, out of critical path) "configd: fold KernelEventMonitor PF_ROUTE + retire launchd kqueue_demand_loop (#168 original body)": unify the event loop after delivery is on native filter.

*Execution note:* never edit or push freebsd-src; all kernel work is overlay/patch in nextbsd-kernel. Every NextBSD main change goes through a PR; merging a green PR is the publish step; respect kernel → continuous-release ingest → consumer ordering at each stage.

Generated from the evfilt-machport-native-migration-plan workflow (7 agents, ~448k tokens). Companion docs: hwregd convergence, in-kernel IOKit feasibility, Mach syscall slots.