Supersedes the earlier live-ISO, gunion-reroot, and squashfs notes · relates to nextbsd #70 and gershwin-on-freebsd #14/#15
NextBSD already ships a read-write UFS disk image, and it already has the one piece every live system needs: launchd (PID 1) owns root writability. The live ISO is a generalization of that single step. The recommended shape: the kernel stacks a tmpfs + unionfs overlay over a compressed read-only root before init runs — but only when the booted image declares it is live; launchd stays dumb and just reacts to whether / is writable; and installing is a plain cpdup clone to a rw-UFS target that never unions. No /rescue/init, no chroot, no /sysroot pivot — which, as a bonus, sidesteps the efibootmgr path-mangling bug entirely. This document is the design record.
Implemented & verified — 2026-06-10
A live ISO now boots end-to-end in qemu (UEFI) to a root shell on a writable union, and publishes to nextbsd’s continuous release alongside the disk image (nextbsd #274, kernel #41). The shipped shape is a hybrid of the options below, not the pure in-kernel Option B this document first recommended:
- rootfs.uzip (geom_uzip over a UFS image) sits on the cd9660 and is decompressed block-by-block on read; the loader preloads only a tiny mfsroot (~7.5 MB), never the multi-GB root. (The 2 GB-into-RAM md_image preload was tried and rejected as too slow / RAM-hungry.)
- /init (a #!/bin/sh script) does mdconfig -t vnode on the uzip → /rofs (RO lower), tmpfs → /cow (writable upper), mount_unionfs, then sysctl vfs.pivot=/rofs and exec /sbin/launchd (PID 1 preserved across exec).
- An earlier attempt used a hook in vfs_mountroot() that repointed the root with pwd_set_rootvnode(). That only repoints curproc: launchd’s forked services kept the old root and died on dead Mach ports. The fix (and what shipped, #41) is a dedicated vfs.pivot sysctl that adopts an already-mounted union as / using the kernel’s own mountcheckdirs(), the same audited walk over FOREACH_PROC_IN_SYSTEM that kern_reroot() uses, so every process’s root is repointed. This is the Linux pivot_root/switch_root analog FreeBSD lacked.
- The pivot also sets the union’s f_mntonname to /, so getfsstat shows <union> on /, the uzip on /rofs, tmpfs on /cow: no /sysroot prefix, so the efibootmgr.c:1047 bug still can’t fire.
The original design in five answers (superseded in part; see the verified box above)
1. tmpfs + unionfs over the RO image in vfs_mountroot() before PID 1, so / is already writable when launchd starts (§6).
2. loader.conf sets a vfs.root.overlay kenv; the kernel reads it (and falls back to “is the root provider RO?”). The installed system’s loader.conf omits it (§7).
3. Does mount -uw / break on the union? It would (unionfs rejects MNT_UPDATE), so launchd is gated on statfs("/").f_flags & MNT_RDONLY: writable → skip, read-only → remount. No live/installed knowledge needed (§7).
4. cpdup the live tree to a freshly newfs’d UFS at /mnt, write a plain (no-overlay) loader.conf, install bootcode + efibootmgr un-chrooted. The installed system is plain rw-UFS, never unions (§8).
5. Clean df, no /sysroot efibootmgr bug? The kernel mounts the layers MNT_IGNORE → one clean /; no pivot → the efibootmgr.c:1047 prefix bug can’t fire (§9).
The continuous artifact is a GPT disk image with a read-write UFS root. The build comment is explicit (nextbsd build.sh:2843–2848): “the kernel mounts the freebsd-ufs partition read-only; launchd PID 1 remounts it read-write before starting any daemon. No cd9660, no uzip, no unionfs, no ramdisk pivot.” So the writability handoff already lives in the right place: launchd owns making / writable. Today that is one mount -uw /, which works because the root is plain UFS on a writable partition. Issue #70 asks to also ship a .iso; your deeper goal is the live model underneath it. The two compose: the ISO is the carrier, the overlay is the mechanism, and the install is a clone back to today’s rw-UFS layout.
No .ko tree (and the modules to bake)
NextBSD ships no loadable module tree; everything must be baked, the rule that just drove the graphics work (nextbsd-kernel #37). The audit is good news: every filesystem this needs has a static config token, and none is module-only (sys/conf/options:260–279, the opt_dontuse.h gate).
| Capability | Token to bake | Note |
|---|---|---|
| compressed RO image (uzip) | device geom_uzip | zlib + LZMA free; no GEOM_UZIP_LZMA in 15.0 (files:3755) |
| tar RO image (modern alt) | options TARFS | first-class RO VFS; FreeBSD’s OCI build uses it (tarfs_vfsops.c:1246) |
| zstd in either | options ZSTDIO | else .tar.zst/zstd-uzip rejected at mount (g_uzip.c:766) |
| image carrier | device md + options MD_ROOT | tarfs root also needs MD_ROOT_FSTYPE="tarfs" (default ufs, md.c:175) |
| stable provider names | options GEOM_LABEL | reference by label, not unit |
| writable RAM upper | options TMPFS | also stops mdmfs auto from a doomed kldload("tmpfs") (mdmfs.c:313) |
| the union overlay | options UNIONFS | static option exists (options:278); maturity note in §5 |
| bind subtrees | options NULLFS | pass-through; not copy-up |
| inner / ISO fs | options FFS, CD9660 | UFS inside .uzip; cd9660 envelope |
Why mount -uw / can’t work on a compressed root
The read-only-ness is enforced at two independent layers, and the kernel adds a third constraint on top, so promoting the root in place is impossible and you must overlay: the GEOM provider returns EROFS on any write open (g_uzip.c:543); the fs forces MNT_RDONLY / rejects MNT_UPDATE (tarfs tarfs_vfsops.c:948,1030; cd9660 inherently RO); and the kernel mounts the first root RO regardless (vfs_mountroot.c:786–792). This is also why, on the live union, launchd must not blindly mount -uw / (§7).
| Option | Pros | Cons |
|---|---|---|
| geom_uzip of a UFS image | block-random-access decompress (suits a desktop root); zlib/LZMA/zstd; mature; matches the “uzip” model | two layers (GEOM + UFS); mkuzip build; LZMA CPU cost |
| tarfs (.tar.zst) | most modern; trivial tooling (tar); zstd; one layer; FreeBSD OCI uses it | sequential decompress, heavier on random reads; needs an overlay regardless |
| cd9660 (plain ISO) | simplest; what every FreeBSD ISO ships (mkisoimages.sh:85) | no fs-level compression |
| md + UFS (mfsBSD) | writable root, zero overlay | uncompressed in RAM; RAM-bound |
Lead: geom_uzip of a UFS image for the live desktop root; tarfs-zstd is the modern alternative, to be prototyped head-to-head on boot/launch latency.
| Option | Verdict |
|---|---|
| tmpfs upper + unionfs over / (whole / writable, copy-up, RAM) | primary, with care: the true live experience; gated by unionfs maturity |
| disk-backed upper + unionfs (persists across reboot) | persistence variant: “live USB with save” |
| tmpfs on /var+/tmp only (root stays RO) | safe fallback: what FreeBSD ships (rc.d/var:64; rc.d/tmp:43); no unionfs |
unionfs honesty: Rewritten in the 14.x era and much improved, but still the least-trusted overlay (“USE AT YOUR OWN RISK”; deletes/renames of lower objects need whiteout support on the upper, which tmpfs provides). Use tmpfs as the upper and test your workload. If it proves flaky, the guaranteed-stable design is RO root + tmpfs on /var+/tmp with a few symlinks, which is FreeBSD’s exact shipped model.
One reassurance: unionfs can be mounted over an already-active / — the “cannot union mount root” rejection only blocks unionfs from being the MNT_ROOTFS mount itself (union_vfsops.c:104). Open files of the running init stay valid (the lower stays reachable through the union), exactly as the kernel’s own vfs_mountroot_shuffle re-covers /dev with devfs vnodes live (vfs_mountroot.c:382–409).
The kernel mounts exactly one fs at /, RO, then execs PID 1. The overlay must therefore be stacked on top of that first mount. Recommended: the kernel does it, in a small hook in vfs_mountroot(), gated by a kenv the live image declares (§7), right after the RO lower is mounted and before set_rootvnode finalizes (around vfs_mountroot.c:1086): mount a tmpfs, stack unionfs over the root vnode, re-point rootvnode, reusing the vnode-repointing machinery the kernel already trusts in vfs_mountroot_shuffle (vfs_mountroot.c:303–423). The union must not carry MNT_ROOTFS (the lower keeps it).
Why kernel beats launchd here: / is writable before PID 1 is exec’d, so launchd and everything it runs see a writable root, with no userland race and no ordering hazards. And because the kernel creates the layers, it can mount the lower and the tmpfs upper MNT_IGNORE, leaving one clean / in every mount-enumerating tool (§9). One audited C path; strictly more modern than rc.initdiskless; no rescue.
Option A (launchd builds the overlay) reaches the same end-state with no kernel patch and is the natural fallback / first step; Option C (a /rescue/init shell) is rejected — unwanted and unnecessary. The launchd capability-probe (§7) is needed in all cases regardless, so it does double duty as Option A’s mechanism and Option B’s safety check. Cost of B: a carried patch in nextbsd-kernel (patches/ + src-overlay/conf/, never a freebsd-src edit) and validating unionfs at root.
Detecting /dev/iso9660/NEXTBSD by name is brittle (it misses uzip-on-USB and VirtualBox). Instead, the live image declares its nature in its own loader.conf, and the loader carries that declaration from whatever medium it booted:
# /boot/loader.conf — ships ON the live ISO only
vfs.root.mountfrom="cd9660:/dev/iso9660/NEXTBSD" # or ufs:/dev/md0.uzip
vfs.root.overlay="tmpfs" # ← the kernel hook's trigger
The hook is a one-liner gate: if (kern_getenv("vfs.root.overlay")) build_overlay(); (in real kernel code testenv() is the tidier test, since a string returned by kern_getenv() must be released with freeenv()). Fully deterministic, one kernel image, every carrier. Belt-and-suspenders: also auto-enable when the root provider is read-only by nature (cd9660/tarfs are VFCF_READONLY; a geom_uzip provider rejects write-opens), so a hand-rolled RO image with no flag still does the sane thing.
| Boot source | root provider | kernel decision |
|---|---|---|
| Live ISO — optical / USB / VirtualBox | cd9660 or uzip (RO) | overlay: tmpfs + union → writable live / |
| Installed system on ada0 | ufs/ffs (RW) | no overlay: plain root |
| GPT disk-image dd’d to USB | ufs (RW) | no overlay: an installed system on a stick |
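The decision in the table above can be sketched as a pure function. This is a minimal illustration, not the kernel hook itself: struct root_facts and want_root_overlay() are hypothetical names, and the sketch assumes the caller has already collected the kenv value and the two read-only facts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs the hook would gather before deciding. */
struct root_facts {
    const char *overlay_kenv;    /* vfs.root.overlay value, NULL if unset */
    bool fs_readonly_by_nature;  /* e.g. cd9660 or tarfs (VFCF_READONLY) */
    bool provider_rejects_write; /* e.g. a geom_uzip provider */
};

/* Returns true when the kernel should stack tmpfs + unionfs over root. */
bool
want_root_overlay(const struct root_facts *f)
{
    /* Primary trigger: the live image declared itself in loader.conf. */
    if (f->overlay_kenv != NULL && f->overlay_kenv[0] != '\0')
        return (true);
    /* Fallback: the root cannot become writable in place. */
    return (f->fs_readonly_by_nature || f->provider_rejects_write);
}
```

An installed rw-UFS root has no kenv and neither read-only property, so it gets no overlay; a hand-rolled uzip image with no flag still gets one through the fallback.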
launchd still has its root-writability step — but a blind mount -uw / on the live union would fail (unionfs rejects MNT_UPDATE, union_vfsops.c:112–114) and today that’s fatal. The fix needs zero live/installed knowledge — gate on the actual state:
struct statfs sb;
/* Act only when statfs() succeeds and reports a read-only root. */
if (statfs("/", &sb) == 0 && (sb.f_flags & MNT_RDONLY))
    mount_uw("/"); // installed UFS (ro-first) → remount rw, as today
// else: already writable (kernel built the union) → do nothing
- Live union: / writable → MNT_RDONLY clear → launchd skips (and so never hits the union with MNT_UPDATE).
- Installed UFS: MNT_RDONLY set → launchd remounts rw, unchanged.
For consumers that genuinely need identity (an installer offering to install; services that differ): read back the same declaration with kenv -q vfs.root.overlay (set ⇒ live, empty ⇒ installed). One source of truth: the kenv that triggers the kernel overlay is the userland live flag. Optionally the kernel mirrors it as a read-only kern.live_media sysctl set only when it actually builds the overlay, so userland reads kernel truth. The installed system reads empty because the installer writes a fresh target loader.conf (§8); it must not cpdup the live one’s overlay line.
The classic, robust live-CD install — and it composes perfectly with the above. The same ISO runs live and installs; the installed system diverges only by two facts the kernel reads at boot (root fstype + the absent overlay flag).
1. Boot the ISO to the live / (writable, full toolkit).
2. gpart the target; newfs a UFS on ada0p?; mount it at /mnt.
3. cpdup the system tree to /mnt, excluding /dev /mnt /tmp + the upper. Two flavours, your call: clone the merged / (captures live tweaks) or the pristine RO lower (factory-clean; preferred for installs).
4. Write /mnt/etc/fstab → ufs:/dev/ufs/ROOTFS rw, and /mnt/boot/loader.conf without vfs.root.overlay.
5. Install bootcode and run efibootmgr un-chrooted, against /mnt/... (§9) → clean, dedup’d EFI entries.
Consequence to plan for: The live ISO is a superset of the installed base. You slimmed UFS userland out of the installed system (newfs/fsck/mount_ufs, 2026-05-27), but the installer needs them in the live image: gpart, newfs_ffs, fsck_ffs, cpdup, efibootmgr. So build.sh’s ISO path carries an installer-toolkit layer the cloned target doesn’t keep.
The /sysroot efibootmgr bug, and why our design avoids it for free
Real-world evidence: gershwin-on-freebsd #14/#15, where the live ISO produces mangled EFI entries (ebsd\loader.efi = \EFI\freebsd\loader.efi with the first 8 chars, strlen("/sysroot"), stripped). The cause is efibootmgr.c:1047:
abspath[strlen(abspath) - strlen(relpath) - 1] = '\0';
A raw strlen()-offset prefix strip with no check that the mount is actually a prefix. It only misfires when the global mount table reports a mountpoint with a stale prefix (/sysroot/media/x) that userland’s view (/media/x) lacks — a property of a reroot/pivot-to-/sysroot model. NextBSD’s design has no /sysroot: the kernel mounts root at / and overlays in place (no pivot, no chroot), so rootvnode is / and every mountpoint name is rooted at /. The mount table is structurally incapable of the divergence — so the bug can’t fire, with no patch to efibootmgr (keeping the never-touch-freebsd-src rule clean).
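To make the failure mode concrete, here is a small sketch of an unchecked strlen()-offset strip next to a checked one. Both helpers (strip_unchecked, strip_checked) and the example paths are hypothetical illustrations, not the efibootmgr source; the only thing taken from the text above is the shape of the bug.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustration only: hypothetical helpers, not the efibootmgr code.
 * Unchecked: skip strlen(mntpoint) bytes, trusting it is a prefix.
 */
const char *
strip_unchecked(const char *path, const char *mntpoint)
{
    size_t skip = strlen(mntpoint);
    size_t len = strlen(path);

    /* Clamp so this demo never points past the end of the string. */
    return (path + (skip < len ? skip : len));
}

/* Checked: strip only when mntpoint is a whole-component prefix. */
const char *
strip_checked(const char *path, const char *mntpoint)
{
    size_t n = strlen(mntpoint);

    /* The root mountpoint is a prefix of every absolute path. */
    if (strcmp(mntpoint, "/") == 0)
        return (path[0] == '/' ? path : NULL);
    if (strncmp(path, mntpoint, n) != 0)
        return (NULL);
    /* Reject "/media/xy" matching the mountpoint "/media/x". */
    if (path[n] != '/' && path[n] != '\0')
        return (NULL);
    return (path + n);
}
```

With the path /media/x/EFI/freebsd/loader.efi and a stale reported mountpoint /sysroot/media/x, strip_unchecked() returns ebsd/loader.efi, the same eight-character loss seen in the mangled entries, while strip_checked() returns NULL and lets the caller fail loudly. With the true mountpoint /media/x, the checked version returns /EFI/freebsd/loader.efi.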
The only remaining variable is the installer’s own efibootmgr call: run it un-chrooted with explicit /mnt/... paths (not inside a target chroot whose view diverges from getfsstat). Going further — making getfsstat(2) report chroot-relative mountpoints — is possible but a deep, broad semantic change; rejected as overkill.
Clean df: use MNT_IGNORE
Because the kernel (Option B) creates the layers, it mounts the lower and upper with MNT_IGNORE, which df already honors by default (bin/df/df.c:258; the flag is literally documented “do not show entry in df”, mount.h:421). Result: every mount-enumerating tool (df, diskarbitrationd, the GUI disk layer) sees one clean /, while MNT_ROOTFS stays on the lower so root-introspection still finds the truth. No tool is patched; the topology is presented honestly-but-cleanly via a standard flag.
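The effect on a mount-enumerating tool can be sketched as a simple filter. Everything here is a stand-in: struct demo_mount, DEMO_MNT_IGNORE and visible_mounts() are hypothetical, and the flag value is arbitrary rather than the real MNT_IGNORE bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MNT_IGNORE; not the real flag value. */
#define DEMO_MNT_IGNORE 0x1u

/* Simplified mount-table entry for the sketch. */
struct demo_mount {
    const char *from;  /* what is mounted */
    const char *on;    /* where it is mounted */
    uint64_t flags;    /* mount flags */
};

/*
 * Copy pointers to the entries a df-like tool would list by default
 * (those without the ignore flag) into out[]; return how many.
 * out[] must have room for n pointers.
 */
size_t
visible_mounts(const struct demo_mount *m, size_t n,
    const struct demo_mount **out)
{
    size_t shown = 0;

    for (size_t i = 0; i < n; i++) {
        if ((m[i].flags & DEMO_MNT_IGNORE) == 0)
            out[shown++] = &m[i];
    }
    return (shown);
}
```

Given a union on / plus two ignored layers, the filter leaves a single entry, which is the one clean / the paragraph above describes.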
Each install adding a fresh BootXXXX is an installer dedup problem, not solved by the overlay or any efibootmgr path fix: before adding, check for an existing entry with the same device-path target (not the FreeBSD label) and, if one exists, skip the add; honor intentional parallel installs by matching the exact target. Track in the installer, not base.
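A minimal sketch of that dedup rule follows. The struct and function names are hypothetical, and the sketch assumes the installer has already rendered each entry’s device-path target to a comparable string; how it obtains those strings is out of scope here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one BootXXXX entry. */
struct boot_entry {
    const char *label;   /* human-readable description */
    const char *target;  /* device-path target rendered as text */
};

/* True if an existing entry already points at exactly this target. */
bool
boot_target_exists(const struct boot_entry *entries, size_t n,
    const char *target)
{
    for (size_t i = 0; i < n; i++) {
        /* Compare targets only; labels are deliberately ignored. */
        if (strcmp(entries[i].target, target) == 0)
            return (true);
    }
    return (false);
}

/* The installer adds an entry only when no entry has the same target. */
bool
should_add_boot_entry(const struct boot_entry *entries, size_t n,
    const char *target)
{
    return (!boot_target_exists(entries, n, target));
}
```

Because only the exact target is compared, a second install to a different partition still gets its own entry even when both carry the same label.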
| Decision | Recommended | Alternative |
|---|---|---|
| RO root primitive | geom_uzip of a UFS image | tarfs-zstd (prototype both) |
| Writable overlay | tmpfs upper + unionfs over / | tmpfs on /var+/tmp only (guaranteed-stable) |
| Where it’s built | kernel vfs_mountroot hook (Option B), MNT_IGNORE layers | launchd overlay (Option A) as no-patch first step |
| Live vs installed | vfs.root.overlay kenv declared by the live image + RO-provider auto-detect | — |
| launchd root step | gate on MNT_RDONLY (skip if writable) | — |
| Install | cpdup → plain rw-UFS, fresh no-overlay loader.conf, un-chrooted efibootmgr | — |
| Artifacts | both: live .iso + the existing rw-UFS GPT image | — |
- One rootfs/ → two envelopes: makefs -t ffs → GPT disk image (today); mkuzip + makefs -t cd9660 (or mkisoimages.sh) → the live .iso.
- Boot test under qemu -cdrom, asserting (a) mount shows unionfs on /, (b) a write to / succeeds, (c) df shows a single clean /, (d) the daemon stack starts. A cpdup-install smoke test (install to a scratch disk, reboot it, confirm plain rw-UFS / no union) is the strong follow-up.
| Repo | Change |
|---|---|
| nextbsd-kernel | bake the §2 FS tokens into config/NEXTBSD (as with #37); add the Option B vfs.root.overlay hook (with MNT_IGNORE layers) as a patches/ + src-overlay/conf/ fragment. |
| nextbsd (build.sh) | mkuzip+ISO the same rootfs; live loader.conf with vfs.root.overlay; the launchd MNT_RDONLY-gated root step; the cpdup installer + toolkit layer; publish + boot-test both. Closes #70 and lands the live model. |
| CI | add the qemu -cdrom live-ISO boot test (+ cpdup-install smoke) next to the disk-image test; both gate publish. |
- unionfs maturity: the tmpfs-only-/var+/tmp fallback is always there.
- Kernel hook: the union must not carry MNT_ROOTFS and must keep /dev visible; new code to write and test.
- Installer: write a fresh loader.conf; do not copy the live overlay line.
- efibootmgr: run it un-chrooted with explicit /mnt paths.
- Toolkit: keep newfs/fsck/gpart/cpdup/efibootmgr on the ISO even though the slim installed base dropped them.
freebsd-src @ releng/15.0. FS option gate: sys/conf/options:260–279; tokens :119,138,278. Source gating: sys/conf/files:3664–3678,3755–3759. RO enforcement: sys/geom/uzip/g_uzip.c:543,766; sys/fs/tarfs/tarfs_vfsops.c:948,1030,1246; tarfs_io.c:30. Root mount: sys/kern/vfs_mountroot.c:69–86,303–423,786–792,1078–1155. PID 1/init_path: sys/kern/init_main.c:716–797. md preload: sys/dev/md/md.c:175,2046–2129; stand/defaults/loader.conf:89–90. unionfs: sys/fs/unionfs/union_vfsops.c:104,112–114,270–272. tmpfs/MNT_IGNORE/df: sys/sys/mount.h:421; bin/df/df.c:249–258; sbin/mdmfs/mdmfs.c:310–319. Live media: release/amd64/mkisoimages.sh:78,85; libexec/rc/rc.d/var:64; rc.d/tmp:43–55; rc.d/root:21–30. efibootmgr bug: usr.sbin/efibootmgr/efibootmgr.c:1045–1047. NextBSD: build.sh:2843–2910. External: gershwin-on-freebsd #14/#15.
Draft design record, 2026-06-09 — reviewed locally, not pushed. The through-line that held across the whole investigation: launchd owns root writability, the kernel ships everything baked, and nothing pivots — which is what makes the writable live root, the clean df, the cpdup install, and the absence of the /sysroot efibootmgr bug all fall out of the same few decisions.