init_chroot + unionfs/tmpfs full read-write architecture into Gershwin

Both branches were implemented, tested on real hardware, and merged
(~May 2026). The installed system now faithfully captures the running live
session: full metadata, hardlinks, in-session pkg installs. But getting there
surfaced three concrete defects, two of them in FreeBSD 15's in-kernel unionfs: the first
forced a change to decision D2, and the second shipped as a
known unfixed defect. They are documented here so the next person doesn't
rediscover them from scratch. The plan body below is left as the original (pre-implementation)
plan; this section records where reality diverged.

Reading a lower-layer (uzip) symlink through the union intermittently fails:
readlink() returns EINVAL; lstat() returns
EBADF or a garbage mode/size. It is a kernel bug — every userland tool
(tar, cpio, cp, pax, rsync,
cpdup) is handed the same garbage, because they all read via the same
lstat/readlink syscalls.
What differs is the reaction: tar/libarchive treats it as
fatal and aborts the whole archive — on the live ISO this consistently
truncated a 155k-file system to ~137 files, every run. cpio treats it as a
per-file warning: it skips that entry and continues. So D2's
bsdtar pipe was replaced with find -x / <prunes> | cpio -pdmu
(both base tools), plus a stage-2 symlink-repair sweep that re-walks
find -type l and recreates any symlink cpio dropped (a dedicated
readlink sweep with a short retry reliably reads links that are flaky under
cpio's own access pattern). cpio -pdmu was verified on hardware to preserve
hardlinks (/rescue stays 149-linked, ~19 MB not ~2.9 GB),
schg flags, setuid, ownership, perms and mtime. Status: fixed and merged.
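
The stage-2 symlink-repair sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code in installer.sh: the function name `repair_symlinks`, the retry count, and the one-second pause are assumptions, and the real sweep would also need the same prunes as the main copy. It does not handle file names containing newlines.

```shell
# Hypothetical helper: recreate under DST every symlink that exists under SRC
# but was dropped by the main copy. Retry count and sleep are assumptions.
repair_symlinks() {
    src=${1%/} dst=${2%/} tries=${3:-5}
    # prune the target itself so a copy from / into /mnt never re-walks /mnt
    find "${src:-/}" -path "$dst" -prune -o -type l -print |
    while IFS= read -r link; do
        rel=${link#"$src"}              # path relative to the source root
        [ -L "$dst$rel" ] && continue   # the main copy already carried it over
        target= n=0
        # readlink may fail transiently on the union, so retry a few times
        while [ "$n" -lt "$tries" ]; do
            target=$(readlink "$link") && [ -n "$target" ] && break
            target=
            n=$((n + 1))
            sleep 1
        done
        if [ -n "$target" ]; then
            mkdir -p "$(dirname "$dst$rel")"
            ln -s "$target" "$dst$rel"
        else
            echo "repair_symlinks: giving up on $link" >&2
        fi
    done
}
```

On the installer path this would be called as something like `repair_symlinks / /mnt` after the cpio pass has finished.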
avahi-daemon's startup touches /usr/local/etc/avahi, which triggers
a buggy unionfs copy-up: a corrupt node is left in the tmpfs
upper layer, shadowing the intact directory in the uzip
lower. The corrupt node's stat() reads back inconsistently over
time — a garbage multi-KB regular file, a 19-byte symlink, or a zero-size directory, all with
garbage size/mtime. The installer faithfully copies whatever the union shows it, so an
installed system ends up with a corrupt /usr/local/etc/avahi (a garbage ~34 KB
file instead of the config directory). avahi / mDNS is then misconfigured on installs;
everything else about the install is correct. Reproduced on a clean boot: avahi was a healthy
directory before the install and corrupt after.
The cpio pivot was the priority and avahi is one non-critical config directory, so
the branches were merged without the fix.

Root cause is understood and the fix is identified (not yet implemented):
the corruption is 100% in the upper layer — rm -rf /usr/local/etc/avahi on the
live union removes the corrupt upper node and the clean uzip lower shows through (verified
in-session). The intended fix is a pre-copy repair step in
installer.sh: before the find | cpio, detect a corrupt upper node
(e.g. a /usr/local/etc child that exists but is not a directory) and
rm -rf it to expose the clean lower, then copy. This keeps full live-session
capture and needs no separate uzip read.
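
A minimal sketch of that pre-copy repair step, under stated assumptions: the helper name `repair_upper_nodes` is hypothetical, and the caller passes the paths that are known to be directories in the uzip (checking every /usr/local/etc child would be wrong, since regular config files live there too). It only catches the "not a directory" states; the zero-size-directory state noted above would pass this check and needs a different test.

```shell
# Hypothetical helper: each argument is a path that must be a real directory.
# If something else sits there (regular file, symlink, ...), remove it so the
# intact lower-layer directory can show through the union again.
repair_upper_nodes() {
    for p in "$@"; do
        # nothing there at all: nothing to repair
        [ -e "$p" ] || [ -L "$p" ] || continue
        # a real directory (not a symlink to one) is treated as healthy
        if [ -d "$p" ] && [ ! -L "$p" ]; then
            continue
        fi
        echo "repair: removing non-directory node $p" >&2
        rm -rf -- "$p"
    done
}
```

Before the `find | cpio` step this might be invoked as `repair_upper_nodes /usr/local/etc/avahi`.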

/usr/local/sbin tmpfs shadow (fixed)

Not a unionfs bug, but it surfaced in the same work and is tool-agnostic.
installer.sh mounts an empty tmpfs over /usr/local/sbin at startup to
suppress devd automounting during partitioning. The normal-install copy walks the live
/, so find walked that empty tmpfs and the installed system
was missing all ~73 binaries there (cupsd, avahi-daemon,
blkid, automount, …). Fixed by unmounting the workaround tmpfs
immediately before the copy — partitioning is done and the target is mounted by then, so devd
is no longer a concern. Status: fixed and merged.

/dev/md0.uzip is
already mounted as the union's lower layer; a second mount fails with FFS
Device busy, and re-mounting the cd9660 to fresh-mdconfig the
backing file was also unreliable. The only readable view of the live session is the union
itself, so the copy must go through it, bugs and all.

rsync and cpdup
read via the same lstat/readlink syscalls and get the same kernel
garbage; they behave like cpio (tolerate & continue), not like
tar (abort), which is exactly where the cpio pivot already landed. Neither
is in base, so they would add a dependency for no benefit.

The unionfs symlink
and copy-up behaviours are FreeBSD kernel defects. Failing that, an architecture that
doesn't layer a writable tmpfs over the uzip via unionfs would sidestep both.

The live ISO boots via resources/overlays/boot/init_script, which nullfs-mounts
each top-level dir from the uzip over the cd9660 root and layers tmpfs/unionfs piecemeal.
Because that hand-built layout fights the normal rc boot, the script
overwrites /etc/rc.d/{cleanvar,var,cleartmp,tmp,hostname} with neutered
stubs.
The installer (gershwin-system/Library/Scripts/installer.sh) copies /
wholesale with a broken cp -a fallback, so those neutered stubs — plus stale
/var/run cruft — land on installed disks. Diagnosed downstream effect: a stale
/var/run/dshelper.pid is never cleared (the real cleanvar is gone), so
dshelper never starts, and admin (a Directory Services user) can't log in.
The fix: replace the piecemeal init_script with the
init_chroot + in-kernel-unionfs pivot from freebsd-livecd-unionfs — the
whole root becomes one writable union, the real rc.d scripts run normally, and
nothing is neutered. Port the /boot/firmware symlink; split
loader.conf into a live-only tier and an "everywhere" tier; and change the installer
to copy the running live session with bsdtar from the live union
/ (explicit mount-point excludes, no --one-file-system) instead of the
broken cp -a fallback.
Keep the one live-only setting (root_rw_mount="NO") out of both the uzip and the installer copy. No new live-only
rc.d script; no SystemPrepare.sh hook.

init_chroot kenv).

init_script runs as a child of
/sbin/init, sets kenv init_chroot=/sysroot, exits; init chroots and
continues. Single-user (boot -s) works natively: init keeps its own argv and honors
-s after the chroot. The freebsd-launchd-mach "$@"-forwarding
fix is only needed for the dual-mode PID-1 design, which we are not using.

bsdtar of the live union root, with explicit excludes (not cp -a, not rsync, no SystemPrepare.sh).

Hard requirement: the installer copies the
running live session — in-session pkg installs / config changes must land on the installed
disk — so copying the pristine uzip is out (it drops all of that). cp -a is unfit (no
hardlink preservation → /rescue explodes; no exclude mechanism); rsync isn't
in base. The installer bsdtars straight from /, the live in-kernel unionfs.
It does not use --one-file-system — on a unionfs the st_dev
detection isn't trustworthy; an earlier attempt relied on it, the create-side tar wandered into
/mnt and read the target the extract side was writing, truncating the archive. Instead
every mount point + bit of live-only cruft is named explicitly: ./mnt (the target —
critical), ./dev ./proc ./tmp ./media, the linprocfs/linsysfs/devfs
submounts under ./compat/linux, ./var/{run,tmp,cache}, and
./etc/rc.conf.local (the live-only root_rw_mount override).
bsdtar --acls --xattrs --fflags carries the full metadata set (hardlinks, ownership,
perms, timestamps, setuid/sticky); excluded dirs are recreated empty. Walking the union is a proven
technique (NomadBSD / GhostBSD / mfsBSD do it). Rejected along the way: dd of
the uzip (maintainer wants a file-level tool); bsdtar from a second mount of
/dev/md0.uzip (impossible — FFS Device busy); and bsdtar from
a fresh mdconfig of the uzip (works, but copies the pristine image — drops
live-session changes).

gershwin_live rc.d script

The old "Gershwin specific" tweaks go into rc.conf /
sysctl.conf / loader.conf / /etc/rc.local — all identical on live and installed.
Debug aids are dropped or made universal, never is_live()-gated. The one genuinely
live-only setting, root_rw_mount="NO", is written by init_script into the
ephemeral tmpfs upper, so the uzip stays byte-identical.
The old init_script quietly set the live
ISO's hostname from the SMBIOS product name — that's dropped; both live and installed use
hostname="gershwin". conscontrol mute on becomes universal via
kern.consmute=1 in the Tier-2 loader.conf.d/gershwin.conf — full quiet
boot on live and installed, no "1 line of text" on either. Result: no
is_live()-gated behavior anywhere. The only live-only setting at all,
root_rw_mount="NO", is written to the tmpfs upper by init_script and
never touches the uzip.

CD_ROOT by
prepare_boot_env(): init_script, the Tier-1 loader.conf,
the /sbin/init + /boot/firmware symlinks, /sysroot,
/upper, /rescue. Kernel's real root, hidden from the post-pivot
chroot, not part of the uzip, so it never reaches an install.

RELEASE_DIR: the actual system, the
lower layer of the live union. Everything that should reach installs lives here (the installer
copies the live union /, whose lower layer this is; see D2).

init_script writes
root_rw_mount="NO" into /etc/rc.conf.local here; the installer excludes
that one file so it never reaches an install.

Tier 1 (live-only, stays in cd9660 loader.conf):
init_script, init_shell, unionfs_load,
geom_uzip_load, geom_ventoy_load, vfs.root.mountfrom=cd9660:…,
vfs.mountroot.timeout, vm.kmem_size* (LiveCD low-RAM ARC cap — harmful on
installs), kern.geom.label.debug, loader_conf_dirs="/boot/loader.conf.d".
Tier 2 — everywhere, new loader.conf.d/gershwin.conf (baked into
RELEASE_DIR/boot/loader.conf.d/ → uzip → installs, read via the base-default
loader_conf_dirs; also staged onto cd9660 for the live loader): quiet boot
(boot_mute, kern.vt.color.*, splash=""),
beastie_disable, autoboot_delay, screen.font,
hw.vga.textmode, screen.textmode, the hw.usb.quirk.*
SD-card-reader block, hw.psm.* touchpad tap, hint.pcm.*.eq,
hw.syscons.bell=0, hw.usb.no_shutdown_wait, dev.bge.0.msi=0,
firewire_load, aio_load, kern.consmute=1
(full console mute — replaces the old runtime conscontrol mute on; falls back to
/etc/sysctl.conf if not loader-tunable). The separate loader.mute.d/ dir
is merged in here and removed.
All configure_system sysrc settings stay everywhere (services, networking,
kld_list, sendmail_*="NO"), plus allscreens_kbdflags="-b quiet.off"
(replaces the old kbdcontrol call). kern.panic_reboot_wait_time →
/etc/sysctl.conf if kept.
root_rw_mount="NO" is NOT baked — live-only, written by
init_script to the tmpfs upper (an installed UFS root needs the rw remount).
| Item | Decision | Mechanism |
|---|---|---|
| rc.d neutering (cleanvar/var/cleartmp/tmp/hostname) | DELETED | — |
| LoginWindow.plist | everywhere | configure_system copies overlay → RELEASE_DIR/Local/… |
| rc.d # REQUIRE: reordering (ldconfig/dbus/initgfx/slim) | DROPPED | replacing ldconfig's stock # REQUIRE: … cleanvar lets it run before the now-real cleanvar, which purges /var/run and wipes ld-elf.so.hints → every /usr/local/lib lib "not found". FreeBSD's default order is correct. |
| initgfx __wait 3→1 | everywhere | configure_system build-time sed of RELEASE_DIR/usr/local/etc/rc.d/initgfx (non-ordering speedup) |
| sysrc service / sendmail-off writes; allscreens_kbdflags | everywhere | configure_system heredoc |
| kbdcontrol -b quiet.off | everywhere | folded into allscreens_kbdflags; no script |
| kern.panic_reboot_wait_time=30 | everywhere, or drop | /etc/sysctl.conf baked into uzip |
| VirtualBox kmod load + sysrc | everywhere | /etc/rc.local (base; runs on live + installed; wanted on both) |
| conscontrol mute on | everywhere | kern.consmute=1 in Tier-2 loader.conf.d/gershwin.conf |
| SMBIOS hostname override | DROPPED | both use hostname="gershwin" |
| start-hello verbose logging sed | drop (dev aid) | — |
| monkey-patch | dropped (user-chosen) | — |
Other: fstab.extra / devfs.rules.extra
→ everywhere (baked into the uzip). local.lua boot menu → live-only (cd9660 only,
inherent). resources/config/Gershwin.conf → build-time pkg-repo config, not
live-system state.
Minimal pre-pivot script (modeled on freebsd-livecd-unionfs/ramdisk/init.sh). It
lives only on the cd9660 boot layer, so it is not copied to installed systems:
PATH=/rescue; export PATH
exec >/dev/null 2>&1   # silence stdout/stderr
kldload geom_uzip; kldload unionfs
mdconfig -a -t vnode -o readonly -f /boot/rootfs.uzip -u 0
n=0   # wait for /dev/md0.uzip to appear; halt after ~30s
while [ ! -e /dev/md0.uzip ]; do
  n=$((n + 1)); [ "$n" -gt 30 ] && halt
  sleep 1
done
mount -t ufs -o ro /dev/md0.uzip /sysroot
mount -t tmpfs tmpfs /upper
mount -t unionfs /upper /sysroot
mount -t devfs devfs /sysroot/dev
echo 'root_rw_mount="NO"' >> /sysroot/etc/rc.conf.local   # live-only, tmpfs upper
kenv init_chroot=/sysroot; kenv -u init_script; kenv -u init_shell; exit 0

DELETE. Dropped vs. the old script: all per-directory
nullfs mounts, the /etc & /var tmpfs+cp dance, the rc.d
neutering, the whole "Gershwin specific" block (redistributed — see audit), the
ps -o command 1 single-user shell-out (single-user works natively now), and the
monkey-patch.

loader.conf reduced to the live-only set.
init_script="/boot/init_script" unchanged; add init_shell="/rescue/sh"
and unionfs_load="YES"; keep loader_conf_dirs="/boot/loader.conf.d"
(drop the loader.mute.d entry).

resources/overlays/boot/loader.conf.d/gershwin.conf: quiet boot + hardware
quirks/tuning, including everything merged in from loader.mute.d.

resources/overlays/boot/loader.mute.d/: deleted (merged into Tier 2).

prepare_boot_env() (~lines 325–395)

tar -cf - boot | tar … -C CD_ROOT, COPYRIGHT
copy, rm -rf boot/modules/*, the .ko prune, kernel + module gzip,
cp …/login.conf CD_ROOT/etc/, the /rescue tar-pipe, fdupes,
the splash and tmpfs_load seds, rm tmpfs.ko*.

Replace the mkdir -p block with
mkdir -p CD_ROOT/{sysroot,upper,dev,etc,sbin}.

cp -R OVERLAYS_DIR/boot CD_ROOT still merges
loader.conf/lua and the rewritten init_script into
CD_ROOT/boot. Keep chmod 755 CD_ROOT/boot/init_script.

cp RELEASE_DIR/boot/loader.conf.d/gershwin.conf CD_ROOT/boot/loader.conf.d/.

ln -sf /rescue/init CD_ROOT/sbin/init

ln -sf /sysroot/boot/firmware CD_ROOT/boot/firmware
(kernel-context firmware loads resolve via the cd9660 namespace; Repo A ships
drm-kmod / gpu-firmware-kmod). Rock Ridge preserves both symlinks.

generate_iso()

No change. (The dd-era note about trimming the makefs -b 75% slack no
longer applies: bsdtar copies file content, not the UFS free space, so the slack
doesn't affect the installed-system size.)
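
The new CD_ROOT staging steps listed for prepare_boot_env() could be gathered into one helper along these lines. This is a sketch only: the function name `stage_cd_root` and its two arguments are assumptions, and it covers just the new directory, copy, and symlink steps, not the tar/prune/gzip work that stays as it is. The directories are spelled out rather than written as a brace list, since brace expansion is not part of POSIX sh.

```shell
# Hypothetical helper covering only the new staging steps.
# $1 = CD_ROOT (cd9660 boot layer), $2 = RELEASE_DIR (future uzip contents)
stage_cd_root() {
    cd_root=$1 release_dir=$2
    # mount points and skeleton directories needed before the pivot
    for d in sysroot upper dev etc sbin boot/loader.conf.d; do
        mkdir -p "$cd_root/$d"
    done
    # stage the Tier-2 loader config so the live loader reads it too
    cp "$release_dir/boot/loader.conf.d/gershwin.conf" \
       "$cd_root/boot/loader.conf.d/"
    # both links dangle at build time and only resolve on the booted ISO
    ln -sf /rescue/init "$cd_root/sbin/init"
    ln -sf /sysroot/boot/firmware "$cd_root/boot/firmware"
}
```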

configure_system() (~lines 226–283)

Add to the existing sysrc heredoc: sendmail_*="NO" (×4),
allscreens_kbdflags="-b quiet.off",
kld_list+="cuse ig4 iicbus iichid utouch asmc if_urndis if_cdce if_ipheth".

Do not add root_rw_mount="NO" here: it would bake into the uzip and reach installs, where a
UFS root needs the rw remount. It's live-only, written by init_script into
the unionfs tmpfs upper instead.

After dscli init:
mkdir -p RELEASE_DIR/Local/Library/Preferences +
copy in the existing resources/overlays/Local/Library/Preferences/LoginWindow.plist.

mkdir -p RELEASE_DIR/nvidia.

Apply the initgfx __wait 3→1 sed (a
non-ordering speedup). The old init_script's rc.d # REQUIRE: reordering
(ldconfig/dbus/initgfx/slim) is deliberately not ported: it was only safe when
cleanvar was a neutered stub; with the real cleanvar running, rewriting
ldconfig's stock # REQUIRE: … cleanvar lets ldconfig run before cleanvar purges
/var/run, wiping ld-elf.so.hints. FreeBSD's default rc order is correct.

mkdir -p RELEASE_DIR/boot/loader.conf.d + write
the Tier-2 gershwin.conf into it.

Add kern.panic_reboot_wait_time=30 to
RELEASE_DIR/etc/sysctl.conf (if kept).

Add the VirtualBox block to RELEASE_DIR/etc/rc.local (kenv product check → kldload vboxguest + sysrc enables);
create rc.local if absent.

No gershwin_live rc.d script is created.
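
The rc.local VirtualBox block mentioned above might look something like the following. Treat it as a sketch under assumptions: the plan only says "kenv product check → kldload vboxguest + sysrc enables", so the kenv variable `smbios.system.product`, the match string `VirtualBox`, and the rc.conf variable names `vboxguest_enable` / `vboxservice_enable` are all guesses, as are the two function names.

```shell
# Hypothetical rc.local fragment. The product string and the rc.conf
# variable names below are assumptions, not taken from the plan.
is_virtualbox() {
    # $1: SMBIOS product name, e.g. from: kenv -q smbios.system.product
    [ "$1" = "VirtualBox" ]
}

enable_vbox_guest() {
    kldload -n vboxguest      # -n: do nothing if the module is already loaded
    sysrc vboxguest_enable=YES vboxservice_enable=YES
}

# In rc.local the two would be wired together as:
#   is_virtualbox "$(kenv -q smbios.system.product)" && enable_vbox_guest
```

Keeping the check as a pure function of the product string means the same block behaves identically on live and installed systems, which matches the "no is_live()-gated behavior" rule.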

bsdtar of the live union

Replace the tree-copy logic (the rsync path and the broken cp -a
fallback) for the normal "install this live system" path:

gpart partitioning (EFI + freebsd-ufs, or
freebsd-boot + freebsd-ufs).

newfs -U ${ROOT_PART}, then mount the target (+ EFI).

bsdtar pipe straight from /, the
live in-kernel union (this is what captures the live session). No
--one-file-system (unionfs st_dev isn't trustworthy); every
mount point + live-only cruft named explicitly. tar is bsdtar on FreeBSD:

( cd / && tar --acls --xattrs --fflags --exclude=./mnt --exclude=./dev --exclude=./proc --exclude=./tmp --exclude=./media --exclude=./compat/linux/proc --exclude=./compat/linux/sys --exclude=./compat/linux/dev --exclude=./var/run --exclude=./var/tmp --exclude=./var/cache --exclude=./etc/rc.conf.local -cf - . ) | ( cd "$MNT" && tar --acls --xattrs --fflags -xpf - )

./mnt is the critical exclude: never read the target the extract side is writing.
Hardlinks (/rescue), ACLs, xattrs, BSD flags, ownership, perms,
timestamps, setuid/sticky all preserved by bsdtar.

Recreate the excluded dirs empty (mnt dev proc tmp media compat/linux/{proc,sys,dev/shm} var/{run,tmp,cache});
chmod 1777 $MNT/tmp.

chroot "$MNT" …/dscli init;
bootcode / EFI loader staging; the loader.conf append
(vfs.root.mountfrom="ufs:..." / nvme_load).

/etc/fstab: on this path, prepend
the per-install root + EFI entries to the copied fstab (which carries the
fstab.extra mounts) rather than overwriting.

Detect /dev/md0.uzip; if absent (installer run outside a live boot), error clearly.

The separate image-based install path (copying from a mounted da0) keeps
rsync with a hardlink-safe tar-pipe fallback — the cp -a
fallback is removed either way.
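
The fstab step above (prepend rather than overwrite) can be illustrated with a short helper. This is a sketch under assumptions: the function name `prepend_fstab`, the /boot/efi mount point, and the option and pass fields are illustrative choices, not values taken from installer.sh.

```shell
# Hypothetical helper: put the per-install root and EFI entries in front of
# whatever the copied fstab already carries (e.g. the fstab.extra mounts).
# $1 = fstab path, $2 = root device, $3 = EFI device
prepend_fstab() {
    fstab=$1 root_dev=$2 efi_dev=$3
    tmp=$(mktemp) || return 1
    {
        printf '%s\t/\t\tufs\trw\t1\t1\n' "$root_dev"
        printf '%s\t/boot/efi\tmsdosfs\trw\t2\t2\n' "$efi_dev"
        if [ -f "$fstab" ]; then
            cat "$fstab"
        fi
    } > "$tmp"
    # write back through the original path so its mode and owner are kept
    cat "$tmp" > "$fstab"
    rm -f "$tmp"
}
```

A call such as `prepend_fstab "$MNT/etc/fstab" /dev/ada0p2 /dev/ada0p1` would leave the copied entries in place below the two new lines; the device names here are placeholders.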

Copier.framework + CLI as a fully-native, cross-platform file-level copier, scoped
in its own pkgdemon.github.io plan doc. The current bsdtar step is the interim;
both are file-level walkers, so the swap is clean.

dshelper.rc's dshelper_start() checks [ -f pidfile ]
rather than process liveness, a latent weakness in the DirectoryServices component
(gershwin-developer), not these two repos. With the architecture + bsdtar installer
the stale pidfile is never present on an install, so it won't be triggered. Flag it for a
separate follow-up.

build.sh on a FreeBSD 15 host produces the ISO. Inspect
the cdroot: boot/init_script rewritten, sbin/init +
boot/firmware are symlinks, sysroot/upper present,
boot/loader.conf.d/gershwin.conf present, boot/rootfs.uzip present.

admin
logs in. mount shows the unionfs overlay;
ls -la /etc/rc.d/cleanvar is the real FreeBSD script;
sysrc -n root_rw_mount is NO; quiet boot active; in VirtualBox
kldstat | grep vbox.

dmesg | grep -i firmware
shows drm / iwlwifi firmware loaded.

pkg install
a marker package (e.g. nano) so there's a known live-session change to
verify. Then clone gershwin-system @ unionfs-full-readwrite, run its
installer.sh against a target disk; reboot the installed system. Verify:
the marker package (nano) is installed — the live session was
captured; /etc/rc.d/cleanvar is the real script; /var/run has no stale
dshelper.pid; service dshelper status running; admin
logs in; sysrc -n root_rw_mount is empty/YES and root is rw;
loader.conf has the installer's ufs: mountfrom and no
unionfs_load/init_script; loader.conf.d/gershwin.conf present.

/rescue
is still hardlinked (stat -f %l /rescue/sh shows a high link count, not ~200
copies) and setuid bits are intact (ls -l /usr/bin/su → -r-sr-xr-x);
/etc/rc.conf.local is absent; /var/run has no stale
pidfiles. Confirm quiet boot on the installed system (kern.consmute set)
and hostname = gershwin on both — no is_live()-gated
behavior should exist.

| Repo | Path | Action |
|---|---|---|
| gershwin-on-freebsd | build.sh — prepare_boot_env(), configure_system() | CHANGE |
| gershwin-on-freebsd | resources/overlays/boot/init_script | CHANGE rewrite (name/location unchanged) |
| gershwin-on-freebsd | resources/overlays/boot/loader.conf | CHANGE reduce to Tier 1 |
| gershwin-on-freebsd | resources/overlays/boot/loader.conf.d/gershwin.conf | NEW Tier 2 |
| gershwin-on-freebsd | resources/overlays/boot/loader.mute.d/ | DELETE merged into Tier 2 |
| gershwin-system | Library/Scripts/installer.sh | CHANGE bsdtar-based install |
| reference | freebsd-livecd-unionfs/ramdisk/init.sh, .../build.sh | read-only |
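
Several of the post-install checks in the verification steps (hardlinked /rescue, setuid su, no rc.conf.local, no stale dshelper.pid) lend themselves to a small script. The following is a sketch only: `link_count` and `verify_install` are hypothetical names, the minimum link count is an assumed parameter, and `ls -ld` is used as a portable stand-in for the BSD-specific `stat -f %l`.

```shell
# Hypothetical spot checks against an installed root.
link_count() {
    # second column of `ls -ld` is the hard-link count
    ls -ld -- "$1" 2>/dev/null | awk '{print $2}'
}

# $1 = root of the installed system, $2 = minimum link count for /rescue/sh
verify_install() {
    root=${1%/} min_links=${2:-2} rc=0
    n=$(link_count "$root/rescue/sh")
    [ "${n:-0}" -ge "$min_links" ] ||
        { echo "FAIL: /rescue/sh is not hardlinked" >&2; rc=1; }
    [ -u "$root/usr/bin/su" ] ||
        { echo "FAIL: /usr/bin/su lost its setuid bit" >&2; rc=1; }
    [ ! -e "$root/etc/rc.conf.local" ] ||
        { echo "FAIL: live-only rc.conf.local was copied" >&2; rc=1; }
    [ ! -e "$root/var/run/dshelper.pid" ] ||
        { echo "FAIL: stale dshelper.pid present" >&2; rc=1; }
    return "$rc"
}
```

Run on the rebooted install it would be `verify_install / 100`, or against a still-mounted target as `verify_install /mnt 100`; the threshold of 100 is an arbitrary example.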