Supersedes the earlier live-ISO, gunion-reroot, and squashfs notes · relates to nextbsd #70 and gershwin-on-freebsd #14/#15

NextBSD live root: a compressed read-only image with a kernel-built writable overlay, and a cpdup install

NextBSD already ships a read-write UFS disk image, and it already has the one piece every live system needs: launchd (PID 1) owns root writability. The live ISO is a generalization of that single step. The recommended shape: the kernel stacks a tmpfs + unionfs overlay over a compressed read-only root before init runs — but only when the booted image declares it is live; launchd stays dumb and just reacts to whether / is writable; and installing is a plain cpdup clone to a rw-UFS target that never unions. No /rescue/init, no chroot, no /sysroot pivot — which, as a bonus, sidesteps the efibootmgr path-mangling bug entirely. This document is the design record.

2026-06-09. Synthesized from two freebsd-src research passes (read-only-root filesystems + boot orchestration) against releng/15.0, the current nextbsd build, and the live gershwin-on-freebsd #14/#15 efibootmgr bug. Citations are file:line in freebsd-src unless noted. Update 2026-06-10: built, booted, and verified — see the box below. The sections that follow are the original design record; where the implementation diverged, the box says how and why.

Implemented & verified — 2026-06-10

A live ISO now boots end-to-end in qemu (UEFI) to a root shell on a writable union, and publishes to nextbsd’s continuous release alongside the disk image (nextbsd #274, kernel #41). The shipped shape is a hybrid of the options below, not the pure in-kernel Option B this document first recommended:

Boot evidence (CI iso-test, qemu -cdrom UEFI): media mounted from /dev/iso9660/NEXTBSD → /dev/md1.uzip (geom_uzip taste) → vfs.pivot: / is now unionfs (mounted from /rofs) → login: → root shell.

Gotchas found & fixed along the way: the vmactions build VM ships an empty /rescue (so the mfsroot is built from the rootfs’s own dynamic tools + their readelf-NEEDED lib closure); mount_cd9660 hard-needs libkiconv.so.4 (absent from the curated base; supplied from the VM); and the next PID 1 needs a fresh devfs on the union’s /dev (the kernel’s auto-devfs is orphaned by the pivot).

The original design in five answers (superseded in part; see the verified box above)

Contents

  1. Where NextBSD is today
  2. The shaping constraint: no .ko tree (and the modules to bake)
  3. Why mount -uw / can’t work on a compressed root
  4. Axis 1 — the read-only root primitive
  5. Axis 2 — the writable overlay
  6. Axis 3 — where the overlay is established (kernel, Option B)
  7. Live vs installed: how the kernel decides, and why launchd stays dumb
  8. Installing from the live ISO (cpdup → plain rw-UFS)
  9. Tool & bootloader compatibility (efibootmgr, df)
  10. Recommendation & decision matrix
  11. CI for ISO + image, and what gets touched where
  12. Honest caveats & open questions
  13. References

1. Where NextBSD is today

The continuous artifact is a GPT disk image with a read-write UFS root. The build comment is explicit (nextbsd build.sh:2843–2848): “the kernel mounts the freebsd-ufs partition read-only; launchd PID 1 remounts it read-write before starting any daemon. No cd9660, no uzip, no unionfs, no ramdisk pivot.” So the writability handoff already lives in the right place: launchd owns making / writable. Today that is one mount -uw /, which works because the root is plain UFS on a writable partition. Issue #70 asks to also ship a .iso; your deeper goal is the live model underneath it. The two compose: the ISO is the carrier, the overlay is the mechanism, and the install is a clone back to today’s rw-UFS layout.

2. The shaping constraint: no .ko tree (and the modules to bake)

NextBSD ships no loadable module tree; everything must be baked — the rule that just drove the graphics work (nextbsd-kernel #37). The audit is good news: every filesystem this needs has a static config token — none is module-only (sys/conf/options:260–279, the opt_dontuse.h gate).

| Capability | Token to bake | Note |
| --- | --- | --- |
| compressed RO image (uzip) | device geom_uzip | zlib + LZMA free; no GEOM_UZIP_LZMA in 15.0 (files:3755) |
| tar RO image (modern alt) | options TARFS | first-class RO VFS; FreeBSD’s OCI build uses it (tarfs_vfsops.c:1246) |
| zstd in either | options ZSTDIO | else .tar.zst/zstd-uzip rejected at mount (g_uzip.c:766) |
| image carrier | device md + options MD_ROOT | tarfs root also needs MD_ROOT_FSTYPE="tarfs" (default ufs, md.c:175) |
| stable provider names | options GEOM_LABEL | reference by label, not unit |
| writable RAM upper | options TMPFS | also stops mdmfs auto from a doomed kldload("tmpfs") (mdmfs.c:313) |
| the union overlay | options UNIONFS | static option exists (options:278); maturity note in §5 |
| bind subtrees | options NULLFS | pass-through; not copy-up |
| inner / ISO fs | options FFS, CD9660 | UFS inside .uzip; cd9660 envelope |

3. Why mount -uw / can’t work on a compressed root

The read-only-ness is enforced at two independent layers, so promoting the root in place is impossible — you must overlay: the GEOM provider returns EROFS on any write open (g_uzip.c:543); the fs forces MNT_RDONLY / rejects MNT_UPDATE (tarfs tarfs_vfsops.c:948,1030; cd9660 inherently RO); and the kernel mounts the first root RO regardless (vfs_mountroot.c:786–792). This is also why, on the live union, launchd must not blindly mount -uw / (§7).

4. Axis 1 — the read-only root primitive

| Option | Pros | Cons |
| --- | --- | --- |
| geom_uzip of a UFS image | block-random-access decompress (suits a desktop root); zlib/LZMA/zstd; mature; matches the “uzip” model | two layers (GEOM + UFS); mkuzip build; LZMA CPU cost |
| tarfs (.tar.zst) | most modern; trivial tooling (tar); zstd; one layer; FreeBSD OCI uses it | sequential decompress, heavier on random reads; needs an overlay regardless |
| cd9660 (plain ISO) | simplest; what every FreeBSD ISO ships (mkisoimages.sh:85) | no fs-level compression |
| md + UFS (mfsBSD) | writable root, zero overlay | uncompressed in RAM; RAM-bound |

Lead: geom_uzip of a UFS image for the live desktop root; tarfs-zstd is the modern alternative to prototype head-to-head on boot/launch latency.

5. Axis 2 — the writable overlay

| Option | Verdict |
| --- | --- |
| tmpfs upper + unionfs over / (whole / writable, copy-up, RAM) | primary, with care: the true live experience; gated by unionfs maturity |
| disk-backed upper + unionfs (persists across reboot) | persistence variant: “live USB with save” |
| tmpfs on /var+/tmp only (root stays RO) | safe fallback: what FreeBSD ships (rc.d/var:64; rc.d/tmp:43); no unionfs |

unionfs honesty: Rewritten in the 14.x era and much improved, but still the least-trusted overlay (“USE AT YOUR OWN RISK”; deletes and renames of lower objects need whiteout support on the upper, which tmpfs provides). Use tmpfs as the upper and test your workload. If it proves flaky, the guaranteed-stable design is RO root + tmpfs on /var+/tmp with a few symlinks, which is FreeBSD’s exact shipped model.

One reassurance: unionfs can be mounted over an already-active / — the “cannot union mount root” rejection only blocks unionfs from being the MNT_ROOTFS mount itself (union_vfsops.c:104). Open files of the running init stay valid (the lower stays reachable through the union), exactly as the kernel’s own vfs_mountroot_shuffle re-covers /dev with devfs vnodes live (vfs_mountroot.c:382–409).
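The whiteout requirement named in this section can be illustrated with a toy model. The sketch below is not unionfs code: `struct layer`, `struct toy_union`, `union_lookup`, and `union_remove` are hypothetical names, and each layer is just a flat set of path strings. It only shows why deleting an object that exists in the read-only lower needs a marker in the writable upper.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 8

/* A toy layer: a flat set of names. Purely illustrative, not unionfs code. */
struct layer {
    const char *names[MAX_NAMES];
    int count;
};

static bool
layer_has(const struct layer *l, const char *name)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return true;
    return false;
}

static bool
layer_add(struct layer *l, const char *name)
{
    if (layer_has(l, name))
        return true;
    if (l->count >= MAX_NAMES)
        return false;               /* toy capacity limit reached */
    l->names[l->count++] = name;
    return true;
}

static void
layer_del(struct layer *l, const char *name)
{
    for (int i = 0; i < l->count; i++) {
        if (strcmp(l->names[i], name) == 0) {
            l->names[i] = l->names[--l->count];  /* swap with last */
            return;
        }
    }
}

struct toy_union {
    struct layer lower;      /* read-only image; never modified here */
    struct layer upper;      /* writable, tmpfs-like layer */
    struct layer whiteouts;  /* names in the lower that are hidden */
};

/* Lookup order: upper wins, then a whiteout hides the lower, then the lower. */
bool
union_lookup(const struct toy_union *u, const char *name)
{
    if (layer_has(&u->upper, name))
        return true;
    if (layer_has(&u->whiteouts, name))
        return false;
    return layer_has(&u->lower, name);
}

/* Delete: drop any upper copy; if the lower has the name, record a whiteout. */
bool
union_remove(struct toy_union *u, const char *name)
{
    if (!union_lookup(u, name))
        return false;               /* nothing visible to delete */
    layer_del(&u->upper, name);
    if (layer_has(&u->lower, name))
        return layer_add(&u->whiteouts, name);
    return true;
}
```

After `union_remove` on a lower-only name, the lower layer still contains it, but `union_lookup` reports it gone. An upper that cannot store the marker has no way to express that deletion.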

6. Axis 3 — where the overlay is established (kernel, Option B)

The kernel mounts exactly one fs at /, RO, and then execs PID 1, so the overlay has to be stacked after that root mount. Recommended: the kernel does it before PID 1 is exec’d, in a small hook in vfs_mountroot(), gated by a kenv the live image declares (§7), right after the RO lower is mounted and before set_rootvnode finalizes (around vfs_mountroot.c:1086): mount a tmpfs, stack unionfs over the root vnode, re-point rootvnode, reusing the vnode-repointing machinery the kernel already trusts in vfs_mountroot_shuffle (vfs_mountroot.c:303–423). The union must not carry MNT_ROOTFS (the lower keeps it).

Why kernel beats launchd here: / is writable before PID 1 is exec’d, so launchd and everything it runs see a writable root, with no userland race and no ordering hazards. And because the kernel creates the layers, it can mount the lower and the tmpfs upper MNT_IGNORE, leaving one clean / in every mount-enumerating tool (§9). One audited C path; strictly more modern than rc.initdiskless; no rescue.

Option A (launchd builds the overlay) reaches the same end-state with no kernel patch and is the natural fallback / first step; Option C (a /rescue/init shell) is rejected — unwanted and unnecessary. The launchd capability-probe (§7) is needed in all cases regardless, so it does double duty as Option A’s mechanism and Option B’s safety check. Cost of B: a carried patch in nextbsd-kernel (patches/ + src-overlay/conf/, never a freebsd-src edit) and validating unionfs at root.

7. Live vs installed: how the kernel decides, and why launchd stays dumb

The trigger is declared, not hardcoded

Detecting /dev/iso9660/NEXTBSD by name is brittle (it misses uzip-on-USB and VirtualBox). Instead, the live image declares its nature in its own loader.conf, and the loader carries that off whatever media it booted:

# /boot/loader.conf — ships ON the live ISO only
vfs.root.mountfrom="cd9660:/dev/iso9660/NEXTBSD"   # or  ufs:/dev/md0.uzip
vfs.root.overlay="tmpfs"                            # ← the kernel hook's trigger

The hook is a one-liner gate: if (testenv("vfs.root.overlay")) build_overlay(); (testenv() rather than kern_getenv(), whose returned copy would have to be released with freeenv()). Fully deterministic, one kernel image, every carrier. Belt-and-suspenders: also auto-enable when the root provider is read-only by nature (cd9660/tarfs are VFCF_READONLY; a geom_uzip provider rejects write-opens), so a hand-rolled RO image with no flag still does the sane thing.

The unifying rule

| Boot source | root provider | kernel decision |
| --- | --- | --- |
| Live ISO (optical / USB / VirtualBox) | cd9660 or uzip (RO) | overlay: tmpfs + union → writable live / |
| Installed system on ada0 | ufs/ffs (RW) | no overlay: plain root |
| GPT disk-image dd’d to USB | ufs (RW) | no overlay: an installed system on a stick |
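The unifying rule can be written as a small pure function. This is a minimal sketch of the decision only, not kernel code: `decide_root_plan`, `enum root_plan`, and the `provider_is_ro` parameter are hypothetical names, and how the kernel would actually learn that a provider is read-only by nature is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the gate described above; not kernel code. */
enum root_plan { ROOT_PLAIN, ROOT_OVERLAY };

/*
 * overlay_kenv:   value of vfs.root.overlay as the loader passed it, or NULL
 * provider_is_ro: true when the root provider is read-only by nature
 *                 (cd9660, tarfs, geom_uzip), false for a plain UFS partition
 */
enum root_plan
decide_root_plan(const char *overlay_kenv, bool provider_is_ro)
{
    if (overlay_kenv != NULL && overlay_kenv[0] != '\0')
        return ROOT_OVERLAY;    /* the live image declared itself */
    if (provider_is_ro)
        return ROOT_OVERLAY;    /* belt-and-suspenders auto-detect */
    return ROOT_PLAIN;          /* installed rw-UFS: no union, ever */
}
```

The three rows of the table map onto this directly: the live ISO takes the first branch, a hand-rolled RO image with no flag takes the second, and both rw-UFS cases fall through to the plain root.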

launchd reacts to state, not identity

launchd still has its root-writability step — but a blind mount -uw / on the live union would fail (unionfs rejects MNT_UPDATE, union_vfsops.c:112–114) and today that’s fatal. The fix needs zero live/installed knowledge — gate on the actual state:

#include <sys/param.h>
#include <sys/mount.h>

struct statfs sb;
if (statfs("/", &sb) == 0 && (sb.f_flags & MNT_RDONLY))
    mount_uw("/");     // installed UFS (ro-first) → remount rw, as today
// else: already writable (kernel built the union), or statfs failed → do nothing

The runtime “am I live?” signal

For consumers that genuinely need identity (an installer offering to install; services that differ): read back the same declaration — kenv -q vfs.root.overlay (set ⇒ live, empty ⇒ installed). One source of truth: the kenv that triggers the kernel overlay is the userland live flag. Optionally the kernel mirrors it as a read-only kern.live_media sysctl set only when it actually builds the overlay, so userland reads kernel truth. The installed system reads empty because the installer writes a fresh target loader.conf (§8) — it must not cpdup the live one’s overlay line.
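Since the installed system must never inherit the live image’s overlay line, an installer or a CI smoke test could check the target loader.conf text before reboot. The following is a sketch under assumptions: `loader_conf_declares_overlay` is a hypothetical helper, it takes the file contents as an in-memory string, and it handles only the simple `key=value` line form with `#` comments shown in the fragment above, not the full loader.conf grammar.

```c
#include <stdbool.h>
#include <string.h>

#define OVERLAY_KEY "vfs.root.overlay"

/*
 * Scan loader.conf text line by line and report whether any uncommented
 * line assigns vfs.root.overlay. A commented-out line starts with '#'
 * and therefore never matches the key comparison.
 */
bool
loader_conf_declares_overlay(const char *text)
{
    const size_t klen = strlen(OVERLAY_KEY);
    const char *p = text;

    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')
            p++;                        /* skip leading blanks */
        if (strncmp(p, OVERLAY_KEY, klen) == 0) {
            const char *q = p + klen;

            while (*q == ' ' || *q == '\t')
                q++;
            if (*q == '=')
                return true;            /* exact key followed by '=' */
        }
        p = strchr(p, '\n');            /* advance to the next line */
        if (p == NULL)
            break;
        p++;
    }
    return false;
}
```

A true result on the target’s /mnt/boot/loader.conf would mean the install copied the live declaration and the next boot would build a union over the installed root.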

8. Installing from the live ISO (cpdup → plain rw-UFS)

The classic, robust live-CD install — and it composes perfectly with the above. The same ISO runs live and installs; the installed system diverges only by two facts the kernel reads at boot (root fstype + the absent overlay flag).

  1. Live system is up with the union / (writable, full toolkit).
  2. gpart the target; newfs a UFS on ada0p?; mount it at /mnt.
  3. cpdup the system tree to /mnt, excluding /dev /mnt /tmp + the upper. Two flavours — your call: clone the merged / (captures live tweaks) or the pristine RO lower (factory-clean; preferred for installs).
  4. Write the target’s plain config: /mnt/etc/fstab → ufs:/dev/ufs/ROOTFS rw, and /mnt/boot/loader.conf without vfs.root.overlay.
  5. Bootcode + efibootmgr un-chrooted, against /mnt/... (§9) → clean, dedup’d EFI entries.
  6. Reboot → kernel mounts the UFS root, no overlay flag + RW provider → plain installed system, no union, ever.
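The exclusion rule in step 3 is easy to get subtly wrong if it is done as a plain string-prefix test. The sketch below shows the intended matching on path-component boundaries; `clone_should_skip` and `path_under` are hypothetical helpers, the list covers only /dev, /mnt, and /tmp (the upper’s location is not modeled), and how the exclusions are actually handed to cpdup is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Trees the clone skips, per the step list above. */
static const char *const skip_roots[] = { "/dev", "/mnt", "/tmp" };

/* True when path is root itself or lies beneath it on a component boundary. */
static bool
path_under(const char *path, const char *root)
{
    size_t n = strlen(root);

    if (strncmp(path, root, n) != 0)
        return false;
    return path[n] == '\0' || path[n] == '/';
}

/* True when the installer should leave this path out of the clone. */
bool
clone_should_skip(const char *path)
{
    size_t count = sizeof(skip_roots) / sizeof(skip_roots[0]);

    for (size_t i = 0; i < count; i++)
        if (path_under(path, skip_roots[i]))
            return true;
    return false;
}
```

The boundary check is what keeps a path such as /devel/notes in the clone while /dev itself is skipped.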

Consequence to plan for: The live ISO is a superset of the installed base. You slimmed UFS userland out of the installed system (newfs/fsck/mount_ufs, 2026-05-27), but the installer needs them in the live image: gpart, newfs_ffs, fsck_ffs, cpdup, efibootmgr. So build.sh’s ISO path carries an installer-toolkit layer the cloned target doesn’t keep.

9. Tool & bootloader compatibility (efibootmgr, df)

The /sysroot efibootmgr bug — and why our design avoids it for free

Real-world evidence: gershwin-on-freebsd #14/#15, where the live ISO produces mangled EFI entries (ebsd\loader.efi = \EFI\freebsd\loader.efi with the first 8 chars — strlen("/sysroot") — stripped). The cause is efibootmgr.c:1047:

abspath[strlen(abspath) - strlen(relpath) - 1] = '\0';

A raw strlen()-offset prefix strip with no check that the mount is actually a prefix. It only misfires when the global mount table reports a mountpoint with a stale prefix (/sysroot/media/x) that userland’s view (/media/x) lacks — a property of a reroot/pivot-to-/sysroot model. NextBSD’s design has no /sysroot: the kernel mounts root at / and overlays in place (no pivot, no chroot), so rootvnode is / and every mountpoint name is rooted at /. The mount table is structurally incapable of the divergence — so the bug can’t fire, with no patch to efibootmgr (keeping the never-touch-freebsd-src rule clean).
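The failure mode is easy to reproduce in isolation. The sketch below is a simplified model of an unchecked length-based strip, not a copy of efibootmgr’s code: `strip_unchecked` and `strip_checked` are hypothetical functions, the mountpoint names /sysroot/boot/efi and /boot/efi are illustrative assumptions, and paths use forward slashes rather than the backslashes of the final EFI path.

```c
#include <stddef.h>
#include <string.h>

/*
 * Unchecked: skips strlen(mountpoint) bytes of path whether or not
 * mountpoint is really a prefix of it. This models the class of bug.
 */
const char *
strip_unchecked(const char *path, const char *mountpoint)
{
    size_t n = strlen(mountpoint);

    if (n > strlen(path))
        return NULL;            /* guards only against overrunning the string */
    return path + n;
}

/*
 * Checked: returns NULL unless mountpoint is a real prefix of path that
 * ends on a component boundary.
 */
const char *
strip_checked(const char *path, const char *mountpoint)
{
    size_t n = strlen(mountpoint);

    if (strncmp(path, mountpoint, n) != 0)
        return NULL;
    if (path[n] != '/' && path[n] != '\0')
        return NULL;
    return path + n;
}
```

With the userland path /boot/efi/EFI/freebsd/loader.efi and a mount table that reports /sysroot/boot/efi, the unchecked version skips 17 bytes instead of 9 and yields ebsd/loader.efi, the same eight-character loss seen in the gershwin reports. The checked version refuses the mismatch, and with the consistent name /boot/efi it yields /EFI/freebsd/loader.efi.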

The only remaining variable is the installer’s own efibootmgr call: run it un-chrooted with explicit /mnt/... paths (not inside a target chroot whose view diverges from getfsstat). Going further — making getfsstat(2) report chroot-relative mountpoints — is possible but a deep, broad semantic change; rejected as overkill.

Don’t patch df — use MNT_IGNORE

Because the kernel (Option B) creates the layers, it mounts the lower and upper with MNT_IGNORE — which df already honors by default (bin/df/df.c:258; the flag is literally documented “do not show entry in df”, mount.h:421). Result: every mount-enumerating tool (df, diskarbitrationd, the GUI disk layer) sees one clean /, while MNT_ROOTFS stays on the lower so root-introspection still finds the truth. No tool is patched; the topology is presented honestly-but-cleanly via a standard flag.
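The effect of the flag on a mount-enumerating tool can be modeled in a few lines. This is a toy, not df’s implementation: `struct toy_mount`, `visible_mounts`, and `find_rootfs` are hypothetical, and the `TOY_MNT_*` bit values are made up for the example (the real flag values live in sys/mount.h).

```c
#include <stddef.h>

/* Illustrative flag bits only; the real values are defined in sys/mount.h. */
#define TOY_MNT_ROOTFS 0x1u
#define TOY_MNT_IGNORE 0x2u

struct toy_mount {
    const char *fstype;
    const char *mountpoint;
    unsigned flags;
};

/*
 * Collect the entries a df-like tool would list by default: every mount
 * that does not carry the IGNORE flag. Returns how many were stored.
 */
size_t
visible_mounts(const struct toy_mount *all, size_t n,
    const struct toy_mount **out, size_t outcap)
{
    size_t k = 0;

    for (size_t i = 0; i < n && k < outcap; i++)
        if ((all[i].flags & TOY_MNT_IGNORE) == 0)
            out[k++] = &all[i];
    return k;
}

/* Root introspection looks at ROOTFS and does not care about visibility. */
const struct toy_mount *
find_rootfs(const struct toy_mount *all, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (all[i].flags & TOY_MNT_ROOTFS)
            return &all[i];
    return NULL;
}
```

Given a lower flagged ROOTFS and IGNORE, a tmpfs upper flagged IGNORE, and an unflagged union, `visible_mounts` returns only the union while `find_rootfs` still returns the lower.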

Duplicate EFI entries (gershwin #14) — installer logic, separately

Each install adding a fresh BootXXXX is an installer dedup problem, not solved by the overlay or any efibootmgr path fix: before adding, check for an existing entry with the same device-path target (not the FreeBSD label) and skip it; honor intentional parallel installs by matching exact target. Track in the installer, not base.
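The dedup check amounts to comparing targets rather than labels. A minimal sketch, assuming the installer has already read the existing entries into memory: `struct boot_entry` and `boot_entry_exists` are hypothetical, and the target is treated as an opaque text rendering of the device path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a boot entry; not efibootmgr's types. */
struct boot_entry {
    const char *label;   /* human-readable description, e.g. "FreeBSD" */
    const char *target;  /* device path rendered as text (opaque here) */
};

/*
 * True if some existing entry already points at exactly this target.
 * Labels are deliberately ignored: two parallel installs may share a
 * label and must both be kept, while a repeat install to the same
 * target must not add a second entry.
 */
bool
boot_entry_exists(const struct boot_entry *entries, size_t n,
    const char *target)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i].target, target) == 0)
            return true;
    return false;
}
```

The installer would call this before adding a BootXXXX entry and skip the add when it returns true.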

10. Recommendation & decision matrix

| Decision | Recommended | Alternative |
| --- | --- | --- |
| RO root primitive | geom_uzip of a UFS image | tarfs-zstd (prototype both) |
| Writable overlay | tmpfs upper + unionfs over / | tmpfs on /var+/tmp only (guaranteed-stable) |
| Where it’s built | kernel vfs_mountroot hook (Option B), MNT_IGNORE layers | launchd overlay (Option A) as no-patch first step |
| Live vs installed | vfs.root.overlay kenv declared by the live image + RO-provider auto-detect | |
| launchd root step | gate on MNT_RDONLY (skip if writable) | |
| Install | cpdup → plain rw-UFS, fresh no-overlay loader.conf, un-chrooted efibootmgr | |
| Artifacts | both: live .iso + the existing rw-UFS GPT image | |

11. CI for ISO + image, and what gets touched where

| Repo | Change |
| --- | --- |
| nextbsd-kernel | bake the §2 FS tokens into config/NEXTBSD (as with #37); add the Option B vfs.root.overlay hook (with MNT_IGNORE layers) as a patches/ + src-overlay/conf/ fragment. |
| nextbsd (build.sh) | mkuzip+ISO the same rootfs; live loader.conf with vfs.root.overlay; the launchd MNT_RDONLY-gated root step; the cpdup installer + toolkit layer; publish + boot-test both. Closes #70 and lands the live model. |
| CI | add the qemu -cdrom live-ISO boot test (+ cpdup-install smoke) next to the disk-image test; both gate publish. |

12. Honest caveats & open questions

13. References

freebsd-src @ releng/15.0:

  FS option gate: sys/conf/options:260–279; tokens :119,138,278
  Source gating: sys/conf/files:3664–3678,3755–3759
  RO enforcement: sys/geom/uzip/g_uzip.c:543,766; sys/fs/tarfs/tarfs_vfsops.c:948,1030,1246; tarfs_io.c:30
  Root mount: sys/kern/vfs_mountroot.c:69–86,303–423,786–792,1078–1155
  PID 1/init_path: sys/kern/init_main.c:716–797
  md preload: sys/dev/md/md.c:175,2046–2129; stand/defaults/loader.conf:89–90
  unionfs: sys/fs/unionfs/union_vfsops.c:104,112–114,270–272
  tmpfs/MNT_IGNORE/df: sys/sys/mount.h:421; bin/df/df.c:249–258; sbin/mdmfs/mdmfs.c:310–319
  Live media: release/amd64/mkisoimages.sh:78,85; libexec/rc/rc.d/var:64; rc.d/tmp:43–55; rc.d/root:21–30
  efibootmgr bug: usr.sbin/efibootmgr/efibootmgr.c:1045–1047

NextBSD: build.sh:2843–2910.

External: gershwin-on-freebsd #14/#15.

Draft design record, 2026-06-09 — reviewed locally, not pushed. The through-line that held across the whole investigation: launchd owns root writability, the kernel ships everything baked, and nothing pivots — which is what makes the writable live root, the clean df, the cpdup install, and the absence of the /sysroot efibootmgr bug all fall out of the same few decisions.