Companion to Eliminating standalone hwregd (convergence) & in-kernel IOKit feasibility
EVFILT_MACHPORT (-16) & delete the pipe-bridge

The module-era pipe-bridge (task #39 Path B) fakes Mach-event delivery through a self-pipe, and is the root of task #41 (“DISPATCH_SOURCE_TYPE_MACH_RECV does not reliably deliver”), which forced every daemon onto pthread + timed mach_msg polling. The kernel now has a real EVFILT_MACHPORT filter; this plan moves everything onto it, fixes the launchd attach panic that blocks the first reroute (PR #250), and retires the bridge and the polling workarounds for good.

Root cause: The PR #250 boot panic (page fault write 0x70 in filt_machportattach) is a use-after-free on the port set during knote attach: the attach drops the pset lock without holding a reference, then knlist_add writes into pset memory that a concurrent mach_port_move_member / ipc_pset_signal can free and reallocate. The race is amplified on libdispatch DispatchWorker threads whose per-thread Mach state was never initialized (#148 class). The fix: take a pset reference across knlist_add and revalidate the entry, paired with lazy-init of ith_messages.
In the module era, FreeBSD's kqueue had no Mach-aware filter, so NextBSD shipped task #39 Path B: a "pipe-bridge" that fakes Mach-event delivery through a regular pipe.
- Two custom syscalls (register_event_bell / unregister_event_bell).
- A global (pset → struct file *) map; when ipc_pset_signal() fires, mach_event_bridge_fire() writes one byte into the pipe.
- Consumers wait on the pipe's read end with EVFILT_READ, then drain the pset via mach_msg(MACH_RCV_TIMEOUT, 0).

This works but is fragile. It is the root of task #41 ("DISPATCH_SOURCE_TYPE_MACH_RECV does not reliably deliver"), which forced consumers (SCNotify, IOKitNotify, mach_service.c, …) to fall back to raw pthread + timed mach_msg(MACH_RCV_TIMEOUT, 500ms) polling loops: higher latency, more code, more threads.
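The bell mechanism can be modeled in portable C to show why it is only a wakeup hint. This is a minimal sketch, not the bridge's code: `struct bell` and the `bell_*` helpers are hypothetical names, and `poll(2)` stands in for the kqueue `EVFILT_READ` registration so the example runs on any POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for one bridged port set: a pipe used as a doorbell. */
struct bell {
    int rfd;   /* read end, watched by the consumer */
    int wfd;   /* write end, rung by the producer */
};

static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);
    return fl == -1 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Create the pipe and make both ends non-blocking. Returns 0 or -1. */
static int bell_open(struct bell *b)
{
    int fds[2];
    assert(b != NULL);
    if (pipe(fds) == -1)
        return -1;
    if (set_nonblock(fds[0]) == -1 || set_nonblock(fds[1]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    b->rfd = fds[0];
    b->wfd = fds[1];
    return 0;
}

/* Producer side: one byte per event. A full pipe already counts as rung. */
static int bell_ring(struct bell *b)
{
    char c = 1;
    ssize_t n = write(b->wfd, &c, 1);
    return (n == 1 || (n == -1 && errno == EAGAIN)) ? 0 : -1;
}

/*
 * Consumer side: wait for readability, then drain every pending byte.
 * Returns the number of bytes drained, 0 on timeout, -1 on poll error.
 */
static int bell_wait_and_drain(struct bell *b, int timeout_ms)
{
    struct pollfd pfd = { .fd = b->rfd, .events = POLLIN };
    char buf[64];
    int total = 0;
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;
    for (;;) {
        ssize_t n = read(b->rfd, buf, sizeof buf);
        if (n > 0) {
            total += (int)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        break;  /* EAGAIN (empty) or EOF */
    }
    return total;
}

static void bell_close(struct bell *b)
{
    close(b->rfd);
    close(b->wfd);
}
```

Three rings produce a single wakeup that drains three bytes. The bytes carry no message content, so the consumer still has to drain the real queue separately, and any mismatch between the two is a missed or spurious wakeup.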
The kernel now has a native EVFILT_MACHPORT filter (slot -16), reserved by patches/0003-kqueue-reserve-evfilt-machport.patch and implemented by filt_machport* in ipc_pset.c. The single-threaded probe (#249) proves it delivers. But the full reroute (PR #250) panics under launchd's concurrent DispatchWorker load: page fault write 0x70 in filt_machportattach.
Make all Mach-event delivery use the native EVFILT_MACHPORT filter, delete the pipe-bridge entirely, and retire the per-consumer polling workarounds — fixing task #41 reliability for good.
- filt_machport* is the only Mach-event wakeup path; mach_event_bridge.{c,h} and the two *_event_bell syscalls are gone.
- dispatch_kevent.c translates the libdispatch sentinel EVFILT_MACHPORT (-22) to the native -16 filter on a wrap port set; no pipes, no bell syscalls.
- pthread+mach_msg loops replaced by dispatch_source_create(DISPATCH_SOURCE_TYPE_MACH_RECV, …); sub-ms latency.

filt_machportattach() (/Users/jmaloney/Documents/nextbsd-kernel/src-overlay/sys/compat/mach/ipc/ipc_pset.c:553-577) has a race window between dropping the pset lock and adding the knote:

1. ips_unlock(pset) releases the pset lock, but holds no reference to keep the pset alive.
2. knlist_add(note, kn, 0) reaches back into pset->ips_note and, via the knlist's sx callbacks, writes to the lock structure embedded in the pset.
3. ipc_pset_move (ipc_pset.c:318-398), driven by mach_port_move_member from a sibling thread, together with message delivery / ipc_pset_signal can drop the pset's last reference and free/reallocate it inside that window.
4. knlist_add's sx_xlock then writes into freed/reallocated memory at offset ~0x70 (the rcd_io_lock_data mutex inside rpc_common_data) → page fault write 0x70.

This is amplified in launchd because libdispatch DispatchWorker threads are spawned by the kernel pthread workqueue, outside mach.ko's task_init_internal path. Their per-thread Mach state (struct thread::ith_messages) is uninitialized (the #148 class of bug), so any dereference during attach/detach on those threads is doubly unsafe. The single-threaded probe (#249) never opens the window because it does one attach, one message, one read, exit.
Adopt Option C from the kernel-native survey — take a reference across knlist_add and revalidate that the entry still points at the same pset after the lookup. It is the smallest change that closes both the lifetime hole and the reallocation hole, without holding the knlist sx lock across an allocation path (which Option B risks):

    static int
    filt_machportattach(struct knote *kn)
    {
        mach_port_name_t name = (mach_port_name_t)kn->kn_kevent.ident;
        ipc_pset_t pset = IPS_NULL;
        ipc_entry_t entry;
        kern_return_t kr;
        struct knlist *note;

        /* Resolve the name to a port set in the caller's space. */
        kr = ipc_object_translate(current_space(), name, MACH_PORT_RIGHT_PORT_SET,
            (ipc_object_t *)&pset);
        if (kr != KERN_SUCCESS)
            return (kr == KERN_INVALID_NAME ? ENOENT : ENOTSUP);

        note = &pset->ips_note;
        ips_reference(pset);    /* keep pset alive across knlist_add */
        ips_unlock(pset);

        /* Revalidate: the name must still resolve to the same pset. */
        if ((entry = ipc_entry_lookup(current_space(), name)) == NULL) {
            ips_release(pset);
            return (ENOENT);
        }
        if (entry->ie_object != (ipc_object_t)pset) {   /* reallocated under us */
            ips_release(pset);
            return (ENOENT);
        }

        kn->kn_fp = entry->ie_fp;
        knlist_add(note, kn, 0);
        ips_release(pset);      /* drop the attach-time reference */
        return (0);
    }

Pair this with the ith_messages lazy-init fix (new ticket, §8) so worker threads that never went through task_init_internal cannot fault during attach/detach or on exit (#148). Both land in the same kernel stage because the panic is only fully closed when both are present.
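The lazy-init idea can be illustrated with a small userland model. This is a sketch under stated assumptions: `struct msg_queue`, `struct thread_model`, and the helper names are hypothetical, and the real `ith_messages` layout may differ. The model only assumes that a never-initialized queue is recognizable in zeroed thread memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct msg {
    struct msg *next;
    int id;
};

/* Tail queue: `last` points at the link to patch on append. */
struct msg_queue {
    struct msg  *first;
    struct msg **last;   /* NULL in zeroed memory means "never initialized" */
};

/* Hypothetical stand-in for the per-thread Mach state. */
struct thread_model {
    struct msg_queue messages;
};

static void queue_init(struct msg_queue *q)
{
    q->first = NULL;
    q->last = &q->first;
}

/* Lazy-init guard: every access goes through here before any dereference. */
static struct msg_queue *thread_messages(struct thread_model *td)
{
    struct msg_queue *q = &td->messages;
    if (q->last == NULL)
        queue_init(q);
    return q;
}

static void enqueue(struct thread_model *td, struct msg *m)
{
    struct msg_queue *q = thread_messages(td);
    assert(m != NULL);
    m->next = NULL;
    *q->last = m;        /* would be a NULL write without the guard */
    q->last = &m->next;
}

/* Unlink every queued message; returns how many were removed. */
static int drain(struct thread_model *td)
{
    struct msg_queue *q = thread_messages(td);
    int n = 0;
    while (q->first != NULL) {
        q->first = q->first->next;
        n++;
    }
    q->last = &q->first;
    return n;
}
```

A thread whose state is all zeroes can call `drain` or `enqueue` safely because the guard initializes the queue on first touch, which is the property the worker threads need on attach, detach, and exit.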
filt_machportdetach already guards on kn_knlist == NULL (ipc_pset.c:595-596); confirm ipc_pset_destroy still does sx_xlock → knlist_clear → knlist_destroy (ipc_pset.c:438-478) so detach vs destroy ordering is safe in either direction.
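The reference-plus-revalidate pattern can be checked in isolation with a deterministic userland model. This is a sketch, not kernel code: `pset_model`, `entry_model`, and the `race_hook` callback are hypothetical stand-ins, and the hook simulates a sibling thread running inside the window where the pset lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct pset_model {
    int refs;
    int knotes;                  /* stand-in for the knlist */
};

struct entry_model {
    struct pset_model *object;   /* stand-in for ie_object */
};

static struct pset_model *pset_alloc(void)
{
    struct pset_model *p = calloc(1, sizeof *p);
    if (p != NULL)
        p->refs = 1;             /* the name-table entry's reference */
    return p;
}

static void pset_ref(struct pset_model *p)
{
    p->refs++;
}

static void pset_release(struct pset_model *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0)
        free(p);
}

/* Simulates whatever a sibling thread does while the lock is dropped. */
typedef void (*race_hook)(struct entry_model *);

static int attach_model(struct entry_model *e, race_hook hook)
{
    struct pset_model *p = e->object;
    if (p == NULL)
        return ENOENT;
    pset_ref(p);                 /* keep p alive across the window */
    if (hook != NULL)
        hook(e);                 /* the lock would be dropped here */
    if (e->object != p) {        /* revalidate: name now maps elsewhere */
        pset_release(p);
        return ENOENT;
    }
    p->knotes++;                 /* safe: p cannot have been freed */
    pset_release(p);
    return 0;
}

/* Sibling: swap in a new pset and drop the table's reference to the old one. */
static void replace_pset(struct entry_model *e)
{
    struct pset_model *old = e->object;
    e->object = pset_alloc();
    pset_release(old);
}
```

Because the attach path holds its own reference, the old object cannot be freed and its address cannot be reused during the window, which is what makes the pointer comparison a sound revalidation.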
**Honest answer: yes, but be precise about *why*, because the surveys disagree on the blame.**
Two distinct claims appeared:
- DISPATCH_SOURCE_TYPE_MACH_RECV registers with EV_UDATA_SPECIFIC|EV_DISPATCH|EV_VANISHED, and _dispatch_kevent_merge_ev_flags (event_kevent.c:467-475) clears DU_STATE_ARMED on every EV_DISPATCH delivery, opening a lost-wakeup window between delivery and re-arm.

Reconciliation: the native filter sidesteps both, which is what makes it a real fix rather than a transport swap.

- libmach registers with EV_ADD | EV_CLEAR (no EV_DISPATCH) at the kernel boundary (dispatch_kevent.c:409-413). The kernel filt_machport re-evaluates ips_active(pset) on every kevent syscall (ipc_pset.c:610-707), so readiness *survives* across polling cycles independent of knote re-arm state; the EV_DISPATCH auto-disarm race never reaches the kernel.
- Backlog (backlog_drain_locked, dispatch_kevent.c:527-557): bursts collapse to one pending readability event in notify mode, and message mode drains every queued message. No message is lost; only redundant wakeups are merged.

What this means for the plan: the existing libdispatch *delivery* code (_dispatch_source_type_mach_recv notify mode, _dispatch_mach_type_recv message mode at event_kevent.c:3314-3379) is already correct for the native filter. We do not rewrite libdispatch's merge logic. We only (a) land the native reroute in libmach, (b) retire the FreeBSD userland polling shim, and (c) verify _dispatch_unote_needs_rearm() does not block on EV_CLEAR-registered sources. The one risk to watch in QA: the EV_DISPATCH disarm path is bypassed *only because* libmach strips it at the kernel boundary, so that translation must be preserved exactly.
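The difference between a coalesced wakeup and the messages behind it can be shown with a toy model. This is a sketch with hypothetical names (`pset_queue_model`, `post`, `take_wakeup`, `is_ready`, `drain_all`); it does not reproduce the libmach backlog, only the contract that wakeups may merge while messages may not be dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define QCAP 16

struct pset_queue_model {
    int  msgs[QCAP];
    int  head;
    int  count;
    bool pending;   /* one coalesced readability event */
};

/* Queue a message; a burst collapses into a single pending wakeup. */
static bool post(struct pset_queue_model *q, int id)
{
    if (q->count == QCAP)
        return false;
    q->msgs[(q->head + q->count) % QCAP] = id;
    q->count++;
    q->pending = true;
    return true;
}

/* Notify mode: consume the wakeup only. Returns true if one was pending. */
static bool take_wakeup(struct pset_queue_model *q)
{
    bool was = q->pending;
    q->pending = false;
    return was;
}

/* Readiness recomputed from object state, not from the arm/disarm state. */
static bool is_ready(const struct pset_queue_model *q)
{
    return q->count > 0;
}

/* Message mode: hand back every queued message in FIFO order. */
static int drain_all(struct pset_queue_model *q, int *out, int max)
{
    int n = 0;
    assert(max >= 0);
    while (q->count > 0 && n < max) {
        out[n++] = q->msgs[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
    return n;
}
```

After three posts there is exactly one wakeup to take, yet `is_ready` stays true until the drain returns all three messages. A consumer that misses the wakeup therefore still observes readiness on its next check.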
Legend: FIX = correctness change, MIGRATE = move to native path, DELETE = remove.
/Users/jmaloney/Documents/nextbsd-kernel

| File(s) | Current state | Action | Notes |
|---|---|---|---|
| src-overlay/sys/compat/mach/ipc/ipc_pset.c:553-577 | filt_machportattach drops pset lock w/o ref → UAF | FIX | Option C: ips_reference across knlist_add + revalidate entry->ie_object == pset. |
| src-overlay/sys/compat/mach/ipc/ipc_pset.c:595-596, 438-478 | detach NULL-guard + destroy knlist_clear/destroy | FIX (verify) | Confirm detach-vs-destroy safe both orders; no code change expected. |
| mach.ko thread init / ipc_kmsg_delayed_destroy (ith_messages) | worker threads have uninit ith_messages (#148) | FIX | Lazy-init empty queue before deref; lands same stage as attach fix. |
| src-overlay/sys/compat/mach/ipc/ipc_pset.c:498-501, 510-515 | weak mach_event_bridge_fire extern + call in ipc_pset_signal | DELETE | Remove last, after native proven. |
| src-overlay/sys/compat/mach/mach_event_bridge.c | entire 282-line bridge core | DELETE | Bridge-only, no reuse. |
| src-overlay/sys/compat/mach/mach_event_bridge.h | entire 39-line bridge API | DELETE | Bridge-only. |
| src-overlay/sys/compat/mach/mach_module.c:51, 150-159 | #include + mach_event_bridge_init() | DELETE | Keep all other module/syscall registration. |
| src-overlay/sys/compat/mach/mach_traps.c:30, 337-359 | sys_*_event_bell_trap handlers | DELETE | Keep other Mach traps. |
| src-overlay/sys/compat/mach/mach_syscall_wire.c:59-60, 97-100, 316-340, 471-488, 538-542, 626-634, 734-737, 748-753 | bell fwd-decls, guarded wrappers, sysents, offsets, sysctls, wire/unwire | DELETE | Keep all non-bell syscall machinery. |
| src-overlay/sys/sys/mach/_mach_sysproto.h:191-203 | register/unregister_event_bell_trap_args | DELETE | Keep other trap arg structs. |
| patches/0003-kqueue-reserve-evfilt-machport.patch | reserves slot -16, bumps EVFILT_SYSCOUNT 15→16 | KEEP | This is the native enabler. |
| src-overlay/sys/compat/mach/mach_module.c:144-149, ipc_pset.c:489, 504-550, 209-211 | native machport_filtops registration + signal/init/destroy | KEEP | This is the replacement path. |

/Users/jmaloney/Documents/nextbsd-work/src/libmach

| File(s) | Current state | Action | Notes |
|---|---|---|---|
| dispatch_kevent.c:1-14, 85-102 | comments + EVFILT_MACHPORT_NATIVE (-16) define | MIGRATE | Reflect native arch (already on libmach-native-machport branch, commit 3c7f75a). |
| dispatch_kevent.c:148-165 (pipe_r,pipe_w) | pipe fds in struct mach_kev_reg | DELETE | Wrap pset is the knote target. |
| dispatch_kevent.c:181-189 reg_find_by_pipe_r_locked | lookup by pipe fd | MIGRATE | → reg_find_by_wrap_pset_locked. |
| dispatch_kevent.c:276-323 reg_create | pipe2() + mach_event_bell_register() | MIGRATE | Pset alloc only; native knote in kevent_qos. |
| dispatch_kevent.c:338-354 reg_destroy | bell_unregister + close() | MIGRATE | Just pset_deallocate; knote auto-detaches. |
| dispatch_kevent.c:362-431 machport_change_translate | emits EVFILT_READ on pipe_r | MIGRATE | Emit EVFILT_MACHPORT_NATIVE on wrap_pset. |
| dispatch_kevent.c:527-557 backlog_drain_locked | pipe read-drain loop | MIGRATE | Drop pipe drain; keep message/notify backlog logic verbatim. |
| dispatch_kevent.c:775-789 | event demux on filter==EVFILT_READ | MIGRATE | Demux on filter==EVFILT_MACHPORT_NATIVE. |
| dispatch_kevent.c:215-228 resolve_mach_syscall | resolves bell syscall nums | DELETE | Bell-only. |
| dispatch_kevent.c:230-244 mach_event_bell_register | arms kernel bell | DELETE | Native self-signals. |
| dispatch_kevent.c:252-266 mach_event_bell_unregister | disarms bell | DELETE | knote auto-detaches. |
| include/mach/dispatch_kevent.h:69-71 | EVFILT_MACHPORT (-22) sentinel | KEEP | Apple-shape; libmach translates to -16. |
| include/mach/dispatch_kevent.h:192-200 | bell protos | DELETE (or stub ENOSYS) | Delete if no external callers; grep first. |
| backlog/synth (backlog_pop_locked ~565-612, flag mask 116-117) | Apple-shape event synth, DISPATCH_EV_MSG_NEEDS_FREE, ext[0]/ext[1] | KEEP | Delivery contract must not change. |

/Users/jmaloney/Documents/nextbsd-work/src/libdispatch

| File(s) | Current state | Action | Notes |
|---|---|---|---|
| src/event/event_kevent.c:3314-3328 _dispatch_source_type_mach_recv | notify mode, dst_fflags=0 | KEEP (verify) | Already correct for native. |
| src/event/event_kevent.c:3364-3379 _dispatch_mach_type_recv | message mode, dst_fflags=DISPATCH_MACH_RCV_OPTIONS | KEEP (verify) | Already correct. |
| src/event/event_kevent.c:467-475 _dispatch_kevent_merge_ev_flags | EV_DISPATCH auto-disarm | KEEP | Bypassed at kernel boundary by libmach's EV_CLEAR-only registration; do not edit. |
| src/event/event_mach_freebsd.c:99-230, 233-287 | per-source pthread polling shim (100ms) | DELETE | Replaced by native wakeup once #250 lands green. |
| source_internal.h _dispatch_unote_needs_rearm() | re-arm gating | VERIFY | Must not block on EV_CLEAR sources. |

/Users/jmaloney/Documents/nextbsd-work/src

| File(s) | Current state | Action | Notes |
|---|---|---|---|
| libSystemConfiguration/SCNotify.c:254-292, 316-330, 340-350 | raw pthread + 500ms mach_msg (task #41 workaround, see SCInternal.h:87-88) | DELETE → MIGRATE | Replace with dispatch_source_create(DISPATCH_SOURCE_TYPE_MACH_RECV, notifyPort, …). |
| libIOKit/IOKitNotify.c:207-251, 292 | raw pthread + 500ms mach_msg (comments 18-21) | DELETE → MIGRATE | Dispatch source on recv_port; drop atomic_int stop + pthread_t from IONotificationPort. |
| IPConfiguration/mach_service.c:50-100, 109 | raw pthread + 500ms mach_msg RPC loop | DELETE → MIGRATE | Dispatch source (or keep if RPC stays trivial). |
| IPConfiguration/sc_link_watch.c:157-165, 221 | SCDynamicStoreSetDispatchQueue (rides shim) | KEEP | Auto-fixed when SCNotify migrates. |
| libxpc/xpc_connection.c:240-246 | DISPATCH_SOURCE_TYPE_MACH_RECV | KEEP | Already correct; auto-accelerates. |
| Libnotify/notify_client.c:943 | DISPATCH_SOURCE_TYPE_MACH_RECV | KEEP | Already correct. |
| launchd/src/runtime.c:227-232 (demand_port dispatch source) | dispatch source | KEEP | Correct. |
| launchd/src/runtime.c:243, 570-650 kqueue_demand_loop + select() race workaround | secondary safety thread | DELETE (deferred to #168 configd fold) | Keep as safety valve until configd unification; not on the critical path for this plan. |
| configd/configd.c:675-723 configd_serve raw mach_msg loop | raw 1s mach_msg | DEFER | Optional dispatch upgrade is part of the configd-unification follow-up, not this plan. |
| KernelEventMonitor/kernel_event_monitor.c:246-288 standalone PF_ROUTE daemon | separate daemon | DEFER | Fold into configd in follow-up ticket. |

Cross-repo timing follows the kernel → continuous-release ingest → consumer ordering. Every NextBSD main change goes via PR; a PR never publishes — merging a green PR is the publish event. Each stage gates the next on its CI marker.

- Stage 0: add test_evfilt_machport_concurrent.c (see §7). Confirm it reproduces page fault write 0x70 against the *current* kernel.
  Gate: EVFILT-MACHPORT-CONCURRENT-FAIL is *expected* here; it proves the repro is real before we claim a fix.
- Stage 1: filt_machportattach + ith_messages lazy-init (#148 class).
  Gate: EVFILT-MACHPORT-OK (single-thread, #249) and flipped EVFILT-MACHPORT-CONCURRENT-OK. Boot test green (configd→ipconfigd→KEM→mDNS→launchctl).
- Stage 2: libmach-native-machport (commit 3c7f75a) onto the fixed kernel: struct/function migration + delete resolve_mach_syscall, mach_event_bell_register/unregister.
  Precondition: EVFILT-MACHPORT-CONCURRENT-OK passes on the ingested kernel.
  Gate: LIBDISPATCH-MACH-OK, LIBDISPATCH-OK, full boot chain, EVFILT-MACHPORT-CONCURRENT-OK under launchd load.
- Stage 3: delete the event_mach_freebsd.c:99-287 polling threads; verify _dispatch_unote_needs_rearm() on EV_CLEAR.
  Gate: LIBDISPATCH-MACH-OK with the shim gone.
- Stage 4: replace the mach_msg loops in SCNotify, IOKitNotify, mach_service.c with DISPATCH_SOURCE_TYPE_MACH_RECV. sc_link_watch, libxpc, Libnotify, launchd demand_port are already dispatch-based and auto-benefit.
- Stage 5: delete mach_event_bridge.{c,h}; strip bridge bits from mach_module.c, mach_traps.c, mach_syscall_wire.c, _mach_sysproto.h; remove the weak mach_event_bridge_fire extern + call in ipc_pset.c:498-501, 510-515 last.
  Gate: nm/build check that mach_event_bridge_* and *_event_bell symbols no longer exist.

| Workaround | Location | Lines | Win |
|---|---|---|---|
| SCNotify.__sc_notify_thread | libSystemConfiguration/SCNotify.c | 254-292 (+316-330, 340-350) | ~40 lines + a pthread gone; 500ms→sub-ms SCDynamicStore notifications. |
| IOKitNotify.receive_thread | libIOKit/IOKitNotify.c | 207-251 (+292) | ~45 lines + one pthread *per* IONotificationPort; immediate IOKit matching. |
| mach_service_thread | IPConfiguration/mach_service.c | 50-100 (+109) | ~50 lines; lower-latency ipconfigd RPC. |
| libdispatch FreeBSD polling shim | libdispatch/src/event/event_mach_freebsd.c | 99-287 | Removes per-source 100ms polling threads across *every* dispatch Mach source. |
| kqueue_demand_loop + select() race fix | launchd/src/runtime.c | 243, 570-650 | ~80 lines (deferred to configd-unification follow-up). |
| Pipe-bridge (kernel) | mach_event_bridge.{c,h} + scattered | 282 + 39 + ~150 | Two custom syscalls, a global pset→file map, and a weak wakeup hook gone. |

Net: roughly 135+ lines of userland polling, multiple per-source threads, and the entire kernel bridge (~470+ lines + 2 syscalls) deleted. Latency for Mach-event consumers drops from 100–500ms polling to immediate kernel wakeup.

- test_evfilt_machport (#249): single-thread native delivery → EVFILT-MACHPORT-OK. Necessary but insufficient (never opens the race window).
- LIBDISPATCH-MACH-OK, LIBDISPATCH-OK; full daemon boot chain.
- test_libmach, test_mach_port, test_bootstrap, test_busystate, test_waitquiet: all single-threaded/single-task; keep as regression baseline.

test_evfilt_machport_concurrent.c (the panic reproducer)

Location: src/mach_kmod/tests/test_evfilt_machport_concurrent.c. Marker: EVFILT-MACHPORT-CONCURRENT-OK / -FAIL.

- […] EVFILT_MACHPORT (-16) in notify mode; allocate N=4–8 receive ports.
- mach_port_move_member(port, pset)
- mach_port_move_member(port, MACH_PORT_NULL)
- mach_msg(SEND_TIMEOUT, 0) to a sibling's port (avoid recv-block).
- kevent(kq, NULL,0, ev,1, {0,0}) every ~10ms to drive filt_machport evaluation; SIGSEGV/page-fault handler flips to -FAIL; pthread_join with 30s timeout.
- move_member races pset->ips_lock; workqueue DispatchWorker threads with uninit ith_messages arrive mid-kqueue op; the tight kevent poll widens the attach window. This is exactly the PR #250 signature.

- test_evfilt_machport_concurrent.c in src/mach_kmod/tests/.
- tests/boot-test.sh after LIBDISPATCH-MACH-OK, before daemon startup.

    EVFILT-MACHPORT-OK
    EVFILT-MACHPORT-CONCURRENT-OK

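The overall shape of such a reproducer (worker threads hammering an operation, a main loop ticking every ~10ms, a bounded wait for workers, then a marker) can be sketched without any Mach calls. Everything here is an assumption for illustration: `stress_op`, `run_stress`, `stub_ok`, and `stub_flaky` are hypothetical names, the stubs stand in for the move_member and send steps, and the `sleep_ms` tick is where the real test would issue its zero-timeout `kevent()` call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef int (*stress_op)(unsigned iter);   /* hypothetical: 0 on success */

struct stress {
    stress_op   op;
    atomic_bool stop;
    atomic_int  failures;
    atomic_int  running;
};

static int stub_ok(unsigned iter)    { (void)iter; return 0; }
static int stub_flaky(unsigned iter) { return iter == 3 ? -1 : 0; }

static void *worker(void *arg)
{
    struct stress *s = arg;
    for (unsigned i = 0; !atomic_load(&s->stop); i++)
        if (s->op(i) != 0)
            atomic_fetch_add(&s->failures, 1);
    atomic_fetch_sub(&s->running, 1);   /* last touch of *s */
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 1 if all workers ran cleanly and stopped within join_ms, else 0. */
static int run_stress(stress_op op, int nthreads, long run_ms, long join_ms)
{
    enum { MAX_THREADS = 8 };
    pthread_t tids[MAX_THREADS];
    int started = 0, ok;
    long waited = 0;
    struct stress *s;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0;
    if ((s = malloc(sizeof *s)) == NULL)
        return 0;
    s->op = op;
    atomic_init(&s->stop, false);
    atomic_init(&s->failures, 0);
    atomic_init(&s->running, 0);

    for (; started < nthreads; started++) {
        atomic_fetch_add(&s->running, 1);
        if (pthread_create(&tids[started], NULL, worker, s) != 0) {
            atomic_fetch_sub(&s->running, 1);
            break;
        }
    }
    for (long t = 0; t < run_ms; t += 10)
        sleep_ms(10);               /* real test: zero-timeout kevent() here */
    atomic_store(&s->stop, true);

    while (atomic_load(&s->running) > 0 && waited < join_ms) {
        sleep_ms(10);
        waited += 10;
    }
    if (atomic_load(&s->running) > 0)
        return 0;                   /* hung worker: leak s, it is still in use */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    ok = (started == nthreads) && atomic_load(&s->failures) == 0;
    free(s);
    return ok;
}

static const char *marker_for(int ok)
{
    return ok ? "EVFILT-MACHPORT-CONCURRENT-OK"
              : "EVFILT-MACHPORT-CONCURRENT-FAIL";
}
```

The bounded wait replaces a blocking `pthread_join` so that a wedged worker yields a FAIL marker rather than a hung test. Note that a kernel panic cannot be caught from userland; in that case the marker is simply absent from the boot log.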
- […] EVFILT-MACHPORT-CONCURRENT-OK passes.
- […] filt_machportattach concurrency panic.
- […] kqueue_demand_loop) is intentionally deferred out of the delivery-unification critical path.
- […] ith_messages root class.
- […] filt_machportattach fix in Stage 1.

- […] test_evfilt_machport_concurrent.c, gate as EVFILT-MACHPORT-CONCURRENT-OK in boot-test.sh; reproduces the PR #250 panic. *Blocks #168, PR #250.*
- "[…] ith_messages for non-task-init threads (#148 fix)": detect uninit per-thread Mach delayed-destroy queue and init empty before deref in attach/detach and ipc_kmsg_delayed_destroy. *Blocks PR #250; closes #148.*
- "[…] filt_machportattach under concurrent move_member (#168 Stage 1)": Option C ref+revalidate fix in ipc_pset.c:553-577. *Blocks PR #250.*
- "[…] event_mach_freebsd.c polling shim once native proven (#168 Stage 3)": delete per-source polling threads; verify _dispatch_unote_needs_rearm on EV_CLEAR. *Depends on PR #250 green.*
- […] mach_event_bridge.{c,h} + all bell syscall infra + weak hook in ipc_pset.c. *Final stage; depends on Stages 1–4 stable in CI.*

*Execution note:* never edit or push freebsd-src; all kernel work is overlay/patch in nextbsd-kernel. Every NextBSD main change goes through a PR; merging a green PR is the publish step; respect kernel → continuous-release ingest → consumer ordering at each stage.

Generated from the evfilt-machport-native-migration-plan workflow (7 agents, ~448k tokens). Companion docs: hwregd convergence, in-kernel IOKit feasibility, Mach syscall slots.