NextBSD unionfs inode-collision fix: design plan (Design / proposal)

Why distinct headers silently vanish during builds on the NextBSD live root (unknown type name 'GS_EXPORT', 'cups_lang_t', 'OBJC_PUBLIC', encode_NSInteger), why it's a unionfs (device,inode) collision and not a code bug, and a permanent kernel-side fix modelled on Linux overlayfs's xino. Companion to issue #332.

TL;DR

1. The bug, as observed

During make install of the GNUstep stack on the live root, clang fails on a different file at each layer of the build, always with the same shape. In every case a type, macro, or function that is defined in an included header is reported as undefined:

Build unit | Error | Defining header that got skipped
libs-back gpbs.m | unknown type name 'GS_EXPORT' | GNUstepBase/GSVersionMacros.h
libs-gui GSCUPSPrintOperation.m | unknown type name 'cups_lang_t' | /usr/local/include/cups/language.h
libobjc2 arc.mm | unknown type name 'OBJC_PUBLIC' | objc/objc-visibility.h
libs-gui NSActionCell.m | call to undeclared function 'encode_NSInteger' | GSGuiPrivate.h

Two tells point away from "broken code": (1) the #include/#import line itself raises no error — the failure surfaces far downstream at first use of the missing symbol; and (2) the identical sources build cleanly on stock FreeBSD. So the variable is the environment — specifically the filesystem — not the code.

1.1 Evidence from the box

$ mount | grep -iE 'union|cow'
<above>:/cow on / (unionfs, local)
tmpfs on /cow (tmpfs, local)

$ stat -f '%d  %N' /System/Library/Headers /usr/include /usr/local/include
1090584323  /System/Library/Headers
1090584323  /usr/include
1090584323  /usr/local/include          # one device id across all three trees

$ stat -f '%d:%i  %N' /usr/local/include/cups/language.h
1090584323:25678  /usr/local/include/cups/language.h   # low inode number → collision-prone

The live root is a unionfs: a writable tmpfs upper layer (/cow) over a read-only uzip/cd9660 lower layer, all presented under a single device id (1090584323). Inode numbers in each layer are allocated independently from low values, so collisions across layers are not just possible — they're likely for the densely-numbered early inodes that system headers occupy.

2. Root cause

Three facts combine into the bug.

2.1 clang identifies files by (device, inode), not path

clang's FileManager uniques files by inode so that two names for one file (symlinks, hardlinks) are treated as a single file. The identity token is llvm::sys::fs::UniqueID — a pair (Device, File) populated on POSIX directly from st_dev and st_ino. Distinct paths that stat to the same pair collapse onto one FileEntry:

// clang/lib/Basic/FileManager.cpp
// "See if we have already opened a file with the same inode.
//  This occurs when one dir is symlinked to another, for example."
FileEntry *&UFE = UniqueRealFiles[Status.getUniqueID()];

All three re-inclusion-skipping mechanisms — the multiple-include optimization (include guards), #pragma once, and Objective-C #import — record their state in a HeaderFileInfo keyed on that FileEntry identity, and consult it in HeaderSearch::ShouldEnterIncludeFile(). None of them is path-based.
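
As an illustration only (this is not clang's code), the following C sketch mimics an include-once set keyed on the (st_dev, st_ino) pair. The names file_id, seen_set, and should_enter are hypothetical, and the set is a fixed-size array rather than a real hash map. The point it shows is that the path never takes part in the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for llvm::sys::fs::UniqueID: (st_dev, st_ino). */
struct file_id {
    uint64_t dev;
    uint64_t ino;
};

/* Hypothetical include-once set: a small fixed-capacity list of identities. */
struct seen_set {
    struct file_id ids[16];
    size_t count;
};

/*
 * Returns true if the file should be entered (first time this identity is
 * seen) and records it; returns false if the identity is already recorded.
 * Only the (dev, ino) pair is compared, never a path.
 */
bool
should_enter(struct seen_set *set, struct file_id id)
{
    assert(set != NULL);
    for (size_t i = 0; i < set->count; i++) {
        if (set->ids[i].dev == id.dev && set->ids[i].ino == id.ino)
            return false;   /* treated as "already included" */
    }
    if (set->count < sizeof(set->ids) / sizeof(set->ids[0]))
        set->ids[set->count++] = id;
    return true;
}
```

Feeding it two different headers that both stat to (1090584323, 25678) makes the second call return false, which is the silent skip described in this section.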

2.2 unionfs collapses st_dev and passes inode numbers through

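In FreeBSD/NextBSD unionfs (files sys/fs/unionfs/union_subr.c, union_vnops.c), unionfs_getattr() copies the underlying layer's attributes and rewrites only va_fsid to the union mount's fsid, while va_fileid passes through untouched; unionfs_readdir() delegates to the underlying VOP_READDIR, so d_fileno is likewise the raw per-layer number (see the table in §5.1).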

Result: two genuinely different files — one upper, one lower — can carry the same (st_dev, st_ino).

2.3 The collision → silent skip

#include <cups/cups.h>
 └─ #include "language.h"      ← stats to (1090584323 : 25678)
        but (…:25678) was ALREADY seen earlier in the TU
        (a different header on the other layer)
          → FileManager returns the SAME FileEntry
          → ShouldEnterIncludeFile() sees "already included / #pragma once"
          → returns Skip: NO lexer pushed, NO diagnostic
 …
 cups_lang_t *lang;            ← error: unknown type name 'cups_lang_t' (300 lines later)

From clang's point of view this is the normal, desired behaviour that makes #pragma once work — it has no way to know the OS handed it two different files under one identity. That's why there is no warning at the include site.

3. Why there is no toolchain-side fix

This exact scenario (a union/FUSE overlay presenting colliding inodes to clang) was raised upstream and adjudicated: clang "determines if two different names refer to the same file based on the inode returned by stat()," and the resolution was that this is a filesystem bug, not a clang bug (the real-world fix there was the FUSE -o use_ino mount option). There is no -fno-… switch that reroutes #pragma once / #import / the multiple-include optimization off UniqueID and onto path strings — the identity key is structural in FileManager/HeaderSearch/Preprocessor, not policy-gated.

Conclusion: the only correct, durable fix is for unionfs to present a unique (st_dev, st_ino) per distinct file. Everything else (the inode-bust sweep, avoiding the overlay) is mitigation.

4. Prior art — Linux overlayfs xino

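overlayfs hit this identical bug (it broke tar's hardlink detector, du -x, yum, and the same cpp include-once class) and fixed it in Linux 4.17 with the xino feature. The design is the template for our fix: keep one st_dev for the whole overlay and encode the layer's index in the high bits of the 64-bit inode number, which the underlying filesystems normally leave unused.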

NextBSD's unionfs is simpler than overlayfs — exactly two layers (one upper, one lower), not an arbitrary stack — so a single tag bit suffices and the bit-budget arithmetic collapses to "set bit 63 for upper."

5. The fix — layer-tagged fileid in unionfs

Approach: keep unionfs's single st_dev, and in unionfs_getattr() OR the top bit of the 64-bit va_fileid for upper-layer files (clear it for lower). Mirror the identical remap onto d_fileno in unionfs_readdir(). Two layers ⇒ two disjoint inode sub-spaces ⇒ (st_dev, st_ino) is unique by construction. clang's identity key becomes unique again and headers stop vanishing.

5.1 Source layout & touch points

NextBSD builds against the FreeBSD tree (fork nextbsd-redux/freebsd-src) plus a thin patch series in nextbsd-kernel/patches/; no unionfs inode logic is currently patched out-of-tree. Note the FreeBSD file names are union.h / union_subr.c / union_vnops.c (functions are unionfs_*):

Location | Today | Change
union.h, struct unionfs_node | tracks un_uppervp / un_lowervp; layer predicate is un_uppervp != NULLVP | add the tag macro + helper
union_vnops.c, unionfs_getattr() | copies underlying attrs, rewrites only va_fsid; va_fileid passes through | OR/clear bit 63 of va_fileid per layer
union_vnops.c, unionfs_readdir() | delegates to underlying VOP_READDIR straight into the user uio; d_fileno raw | bounce-buffer, tag each d_fileno to match getattr

5.2 The helper

/* union.h */
#define UNIONFS_FILEID_UPPER_BIT  (1ULL << 63)

static __inline uint64_t
unionfs_remap_fileid(bool upper, uint64_t id)
{
    return (upper ? (id | UNIONFS_FILEID_UPPER_BIT)
                  : (id & ~UNIONFS_FILEID_UPPER_BIT));
}
/* upper = (unp->un_uppervp != NULLVP) — the existing layer predicate */
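
The helper's arithmetic can be checked in userspace. This is a minimal sketch that assumes the macro and function above are copied verbatim outside the kernel; fileids_collide() is a hypothetical wrapper added only for the demonstration, and 25678 is the inode number from §1.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNIONFS_FILEID_UPPER_BIT  (1ULL << 63)

/* Userspace copy of the helper from union.h, for experimentation only. */
uint64_t
unionfs_remap_fileid(bool upper, uint64_t id)
{
    return (upper ? (id | UNIONFS_FILEID_UPPER_BIT)
                  : (id & ~UNIONFS_FILEID_UPPER_BIT));
}

/*
 * Hypothetical check: after remapping, do an upper-layer file and a
 * lower-layer file still present the same st_ino?
 */
bool
fileids_collide(uint64_t upper_raw, uint64_t lower_raw)
{
    /* R4 precondition: the lower filesystem never uses bit 63 itself. */
    assert((lower_raw & UNIONFS_FILEID_UPPER_BIT) == 0);
    return unionfs_remap_fileid(true, upper_raw) ==
        unionfs_remap_fileid(false, lower_raw);
}
```

With both layers reporting raw inode 25678, the upper file is presented as 0x800000000000644E and the lower file stays 25678, so fileids_collide(25678, 25678) is false. Applying the upper tag twice gives the same value as applying it once.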

5.3 unionfs_getattr()

Adjacent to the two existing va_fsid assignments (upper and lower return paths), after the underlying VOP_GETATTR succeeds:

/* upper path */
ap->a_vap->va_fsid   = ap->a_vp->v_mount->mnt_stat.f_fsid.val[0];
ap->a_vap->va_fileid = unionfs_remap_fileid(true, ap->a_vap->va_fileid);   /* NEW */

/* lower path */
ap->a_vap->va_fsid   = ap->a_vp->v_mount->mnt_stat.f_fsid.val[0];
ap->a_vap->va_fileid = unionfs_remap_fileid(false, ap->a_vap->va_fileid);  /* NEW (defensive) */

5.4 unionfs_readdir() — the hard half

Because VOP_READDIR writes struct dirent records (with d_fileno) directly into the caller's uio, the tag has to be applied after the underlying readdir but before the data reaches userspace. Route the three call sites through a helper that reads into a kernel bounce buffer, walks the dirents, tags each d_fileno, then uiomoves the rewritten block:

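/* kbuf is the char * bounce buffer; nbytes is what the underlying readdir returned */
char *p = kbuf, *end = kbuf + nbytes;
while (p < end) {
    struct dirent *dp = (struct dirent *)p;
    /* stop on a zero-length or overrunning record instead of walking off the buffer */
    if (dp->d_reclen == 0 || dp->d_reclen > (size_t)(end - p))
        break;
    dp->d_fileno = unionfs_remap_fileid(upper, dp->d_fileno);
    p += dp->d_reclen;
}
/* then uiomove(kbuf, nbytes, uio); preserve d_off / NFS cookie merge */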

The upper flag is per source directory, not per entry: in the merged case the upper pass tags with upper=true and the lower pass with upper=false. That is correct because each entry's fileid belongs to the layer whose directory produced it. The result must agree with getattr; otherwise stat(2) and readdir(3) disagree, which breaks find, fts(3), and tar.
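
The walk can be exercised outside the kernel with a simplified record format. This is a sketch under stated assumptions: the record layout (an 8-byte file number at offset 0, a 2-byte record length at offset 8) and the names DEMO_* and tag_dirent_block are hypothetical stand-ins, not the real struct dirent, and the function reports malformed input instead of silently stopping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIONFS_FILEID_UPPER_BIT  (1ULL << 63)

/* Hypothetical record layout, loosely modelled on a dirent header. */
#define DEMO_FILENO_OFF  0    /* uint64_t file number */
#define DEMO_RECLEN_OFF  8    /* uint16_t total record length in bytes */
#define DEMO_HDR_LEN     10   /* bytes covered by the two fields above */

/*
 * Walk nbytes of packed records in buf and tag every file number for the
 * layer that produced the block. Returns the number of records tagged, or
 * -1 if a record is shorter than the header or overruns the buffer.
 * Fields are read and written with memcpy so alignment does not matter.
 */
long
tag_dirent_block(unsigned char *buf, size_t nbytes, bool upper)
{
    size_t off = 0;
    long tagged = 0;

    assert(buf != NULL || nbytes == 0);
    while (off < nbytes) {
        uint64_t fileno;
        uint16_t reclen;

        if (nbytes - off < DEMO_HDR_LEN)
            return -1;
        memcpy(&fileno, buf + off + DEMO_FILENO_OFF, sizeof(fileno));
        memcpy(&reclen, buf + off + DEMO_RECLEN_OFF, sizeof(reclen));
        if (reclen < DEMO_HDR_LEN || reclen > nbytes - off)
            return -1;
        fileno = upper ? (fileno | UNIONFS_FILEID_UPPER_BIT)
                       : (fileno & ~UNIONFS_FILEID_UPPER_BIT);
        memcpy(buf + off + DEMO_FILENO_OFF, &fileno, sizeof(fileno));
        off += reclen;
        tagged++;
    }
    return tagged;
}
```

Only the file number is rewritten; record lengths and everything after the header are left byte-for-byte intact, which is the property the kernel version needs so that d_off and the cookie merge are undisturbed.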

6. Alternatives considered

Option | Idea | Verdict
A. Layer-tagged fileid (this plan) | Single st_dev, high-bit layer tag on va_fileid + d_fileno. | Chosen: overlayfs-proven; collision-free by construction for 2 layers; keeps one-device semantics that find -xdev/du -x/tar expect.
B. Per-layer real st_dev | Report each underlying layer's true st_dev instead of one mount fsid; different devices ⇒ no (dev,ino) collision. | Rejected: less code, but readdir carries no device so d_fileno stays ambiguous, and a single tree now spans multiple device ids, which breaks find -xdev / du -x / tar --one-file-system. overlayfs explicitly avoided this.
C. Separate filesystem for the build root | Mount /Developer, /usr/local, the headers on a dedicated real FS (own device id), or build on a disk install rather than the live union. | Good interim / no-kernel: sidesteps unionfs entirely (distinct device ⇒ no collision). Doesn't fix the union root itself; best as the stopgap while A lands.
D. Inode-bust sweep | Rewrite headers in place (cp+mv) to force fresh, higher inode numbers before each build. | Band-aid only: what we ran to unblock GNUstep. Non-deterministic and recurs on every copy-up / new port. Scaffolding, not a fix.

7. Risks & implications of the fix

The change is small in lines but touches core VFS paths (getattr, readdir) and changes a user-visible value (st_ino). The risks, with mitigations:

R1 — Larger 64-bit inode numbers → EOVERFLOW for 32-bit ino_t consumers. Setting bit 63 produces very large st_ino values for every upper-layer file. Any caller using a 32-bit ino_t (a compat32/freebsd11 ABI stat shim, an old binary, code built without 64-bit ino_t) will get EOVERFLOW rather than a truncated number.
Mitigation: FreeBSD has had 64-bit ino_t since the ino64 work (FreeBSD 12), so modern NextBSD userland is fine. Audit the freebsd11_stat/compat32 translation paths and decide deliberately to fail with EOVERFLOW (correct) rather than truncate (silently reintroduces collisions). Confirm the toolchain — clang, make, install — is 64-bit-ino_t-clean, since it is the consumer this whole effort exists to satisfy.
R2 — Per-readdir bounce-buffer cost. Tagging d_fileno means copying every directory read through a kernel buffer and walking the dirents, instead of letting the underlying FS write straight to the user uio. That's an extra allocation + copy + scan per readdir call.
Mitigation: directory reads are already syscall-bounded and chunked; size the bounce buffer to the existing read granularity and reuse it across the merged upper/lower passes. The cost is proportional to entries returned, not to tree size. Build workloads do a lot of readdir, so benchmark a find /usr and a full GNUstep rebuild before/after; expect low-single-digit percentage, not a step change.
R3 — st_ino changes across copy-up. When a lower file is modified, unionfs creates an upper shadow; afterward the file reports the upper tmpfs inode (now with bit 63 set) instead of the lower cd9660 inode. The inode number changes mid-life. This is already true today (raw tmpfs ino ≠ raw cd9660 ino); the tag doesn't worsen it, but it remains a property.
Mitigation: acceptable for a live/build root. Unlike overlayfs we are not proposing an origin-xattr identity-stability layer (overkill for an ephemeral root). Document it; if a future consumer needs stable identity across writes, that's a separate index-style feature.
R4 — Bit-63 exhaustion on large-inode lower filesystems. The OR-tag is collision-free only if the underlying FS never legitimately uses bit 63. tmpfs, cd9660/uzip, and UFS (32-bit inodes) never do. ZFS does not bound object IDs to 63 bits, so a ZFS-backed layer could in principle carry a fileid with bit 63 set, and OR-ing would alias it onto an upper file.
Mitigation: NextBSD's live lower layer is uzip/cd9660 and the upper is tmpfs, both of which are safe, so ship the simple tag now. Add a mount-time assertion/guard: if either backing FS can produce inode numbers ≥ 2^63, refuse the tag mode (or switch to an overlayfs-style per-inode fallback). Don't silently OR on a ZFS layer.
R5 — stat/readdir divergence if the two remaps drift. If a code path tags va_fileid but a readdir path is missed (or vice-versa), stat(2) and d_fileno disagree for the same file — which breaks find, fts(3), tar, and NFS readdirplus in new, subtler ways than the original bug.
Mitigation: a single shared unionfs_remap_fileid() helper used by both paths (never inline the bit-twiddle twice), plus a regression test asserting stat(dir/entry).st_ino == the entry's d_fileno for every entry, including . and .. across shadowed/lower-only parents.
R6 — NFS export of the union root. If the union root is NFS-exported, the tagged getattr fileid must also flow through VOP_VPTOFH/VFS_FHTOVP, or the file-handle-derived inode won't match the stat inode. unionfs's NFS-export path is already fragile (the readdir cookie-merge).
Mitigation: the live build root is not NFS-exported, so this is out of scope — but document that the union root is not NFS-exportable with this patch, and audit the vptofh/fhtovp path before anyone tries.
R7 — One-time inode-number churn. Every upper-layer file's st_ino changes the day this lands. Anything that recorded inode numbers before (a ccache/precompiled-header cache keyed on identity, a backup tool's hardlink index, an mtree manifest) sees a one-time discontinuity.
Mitigation: low impact on an ephemeral live root; flush build/identity caches once after deploying the patched kernel. Note it in the changelog.
R8 — Hardlinks must stay layer-deterministic. Two hardlinks to one upper file must still map to one inode. They do, because the tag is layer-constant and the underlying fileid is shared — but the invariant must be preserved (don't make the tag depend on path or dirent). Cross-layer copy-up breaking a hardlink is a pre-existing unionfs property, unchanged here.
R9 — Maintenance / upstreamability. This is carried as an out-of-tree kernel patch (patches/0007-unionfs-layer-tagged-fileid.patch), so it must be re-validated on every FreeBSD rebase and risks divergence.
Mitigation: the change is self-contained and overlayfs-justified; propose it upstream to FreeBSD (a mount_unionfs -o xino-style option) so it eventually lands in-tree and stops being a carry.
Net: no risk here is a blocker. R1 (64-bit ino_t) and R4 (ZFS bit-63) are the two to verify up front; both are satisfied by the current live-root composition (64-bit userland, tmpfs+cd9660 layers). The rest are documentation, a shared helper, and a benchmark.

8. Testing & validation

8.1 Collision detector (run before & after)

FreeBSD stat -f: %d = st_dev, %i = st_ino. Distinct paths sharing a dev:ino are exactly what clang mis-dedups:

#!/bin/sh
# collide.sh — report DISTINCT regular-file paths sharing (st_dev, st_ino).
# Usage: ./collide.sh /usr/include /usr/local/include /System/Library/Headers /Developer
set -eu
tmp=$(mktemp /tmp/collide.XXXXXX); trap 'rm -f "$tmp"' EXIT
find "$@" -type f -print0 | xargs -0 stat -f '%d:%i	%N' | sort > "$tmp"
awk -F'\t' '{ if ($1==pk && $2!=pp){ if(!s[$1]++) print "COLLISION "$1"\n  "pp; print "  "$2 }
              pk=$1; pp=$2 }' "$tmp"
awk -F'\t' 'k[$1]++{c=1} END{exit !c}' "$tmp" && { echo "FAIL: dup (dev,ino)"; exit 1; }
echo "OK: every path has a unique (st_dev, st_ino)"

(True hardlinks legitimately share a pair; on the union root you expect zero shared keys between distinct logical files, so any hit is suspect — optionally cmp flagged pairs and only fail when contents differ.)

8.2 Kernel regression test (deliberate overlap)

Construct the pathological condition on purpose with two fresh backing FSes whose first files share a low inode number, union-mount, then assert every presented file has a distinct identity:

mkdir -p /mnt/lower /mnt/upper   # mount points must exist first
mdconfig -a -t swap -s 16m -u 0; newfs -U /dev/md0; mount /dev/md0 /mnt/lower
mdconfig -a -t swap -s 16m -u 1; newfs -U /dev/md1; mount /dev/md1 /mnt/upper
: > /mnt/lower/a.h        # first file on md0 → ino X
: > /mnt/upper/b.h        # first file on md1 → same ino X, different backing dev
mount_unionfs /mnt/upper /mnt/lower
n=$(find /mnt/lower -type f | wc -l)
u=$(find /mnt/lower -type f -print0 | xargs -0 stat -f '%d:%i' | sort -u | wc -l)
[ "$n" -eq "$u" ] || { echo "FAIL: union presents colliding (dev,ino)"; exit 1; }

A correct fix makes n == u. Wire it in as a kyua/ATF test so it runs on every kernel build. Add a companion test asserting stat ↔ d_fileno agreement (R5), including the . and .. entries.

8.3 End-to-end

Clean-rebuild the real workload in dependency order on the union root and grep the logs for the collision signatures:

for p in libs-base libs-gui libs-back; do ( cd /Developer/Library/Sources/$p && gmake clean && gmake 2>&1 | tee /tmp/$p.log ); done
! grep -E 'unknown type name|undeclared|undefined macro|implicit declaration' /tmp/libs-*.log
./collide.sh /usr/include /usr/local/include /System/Library/Headers /Developer

Pass = all three link, zero collision-signature errors, and the detector reports a clean sweep.

9. Implementation checklist & rollout

  1. Add UNIONFS_FILEID_UPPER_BIT + unionfs_remap_fileid() to union.h.
  2. Tag va_fileid in both unionfs_getattr() return paths.
  3. Bounce-buffer + tag d_fileno in unionfs_readdir()'s three call sites; preserve d_off/cookie merge.
  4. Mount-time guard (R4): refuse/fallback if a backing FS can emit inode numbers ≥ 2^63.
  5. kyua/ATF tests: deliberate-overlap uniqueness, stat ↔ readdir agreement, copy-up inode change.
  6. Carry as nextbsd-kernel/patches/0007-unionfs-layer-tagged-fileid.patch + series entry; gate behind mount_unionfs -o xino if you want it opt-in initially.
  7. Benchmark readdir-heavy paths (R2) before/after.
  8. Retire the inode-bust sweep from the build scripts once CI is green on the patched kernel.
  9. Propose upstream to FreeBSD (R9).

10. Open questions

Q1. Opt-in or always-on? Ship as a mount_unionfs -o xino option (safe, opt-in, matches overlayfs) or make it the default for the live root? Leaning: default-on for the live root, option-gated in-tree.
Q2. Guard vs. fallback for large-inode layers (R4). For now the live layers are safe; do we add the full overlayfs-style per-inode fallback, or just assert and refuse the mode on unsafe backing FSes? Leaning: assert+refuse now, fallback only if a ZFS-backed layer becomes real.
Q3. Do we ever need copy-up identity stability (R3)? Only if a build tool caches (dev,ino) across writes and breaks. No evidence yet; defer the origin-xattr equivalent unless one shows up.

Appendix A — inodes, df, and the "2.1G ifree" question

Background for reading the live root's storage state, and a units gotcha worth pinning down because it looks alarming and isn't.

A.1 What an inode is

An inode ("index node") is the structure that stores everything about one file except its name: type, permissions, owner, timestamps, size, link count, and pointers to the data blocks. The name lives separately, in the directory entry that points at the inode. Every file, directory, and symlink consumes exactly one inode — and the (st_dev, st_ino) pair from that inode is the very identity clang keys include-once on (§2.1), which is what this whole plan is about.

A filesystem therefore has two independent capacity limits, data space (bytes) and inode count (number of files), and either can be exhausted alone:

A million tiny files can use every inode while gigabytes of bytes sit free; one huge file can fill the bytes with inodes to spare.

A.2 The inode columns — and the units gotcha

Column | Meaning
iused | inodes in use (files that exist)
ifree | inodes still available
%iused | inodes used, as a percentage

The gotcha: in the inode columns, G means giga as a count (≈ 2.1 billion inodes), not 2.1 gigabytes of memory. ifree 2.1G is room for ~2.1 billion files; it has nothing to do with how much RAM the machine has. So "only 2.1G free inodes on an 8 GB box" is a category error — it's a file count, not a byte size.

A.3 Sample from the live root

$ df -hi
Filesystem            Size   Used  Avail Capacity iused  ifree %iused  Mounted on
<above>:/cow           13G   6.9G   6.1G    53%    143k   2.1G    0%   /
/dev/md0             7.1M   3.5M   3.0M    54%      31    991    3%   /
/dev/md1.uzip        6.3G   5.6G   173M    98%    115k   9.7k   92%   /rofs
tmpfs                6.7G   606M   6.1G     9%     18k   2.1G    0%   /cow

A.4 Why ~2.1G, and why it isn't the RAM number

2.1G is right around 2^31 (2,147,483,648), the top of the signed 32-bit range. That's tmpfs's default way of saying "don't meaningfully cap by inode count." It is a default ceiling, not anything derived from physical memory.

The limit that is memory-derived is the byte size: tmpfs is RAM-backed, so FreeBSD defaults its Size to roughly available memory (≈ 6.7 G of an 8 GB box, the rest reserved for the kernel). The 6.1 G Avail on /cow is the real build budget, and it's the ceiling that actually binds — every file also costs memory, so you'd exhaust those 6.1 GB (file data + inode structures) long before creating 2 billion files. Inode count: effectively infinite. Byte size: the one to watch.

A.5 Reading the rest of the rows

Mount | Reading
tmpfs → /cow | Writable upper layer. Inode cap ≈ 2.1 G (≈ unlimited), byte cap 6.7 G (RAM-bound). All build output lands here.
<above>:/cow → / | The unionfs. iused 143k ≈ lower (115k) + upper (18k); ifree 2.1G comes from the writable tmpfs, the only layer you can create files on.
/dev/md1.uzip → /rofs | Read-only compressed lower layer. %iused 92% looks scary but is fine: a packed read-only image with exactly the files it needs; it never grows.
/dev/md0 → / | Tiny mfsroot boot scaffold, a traditional FS with inodes pre-allocated at newfs time (1022 total, 991 free), unlike tmpfs's dynamic allocation.
iso9660, devfs | Report 0 inodes; they don't expose a POSIX inode count to df.

Bottom line: 0% inodes used and 6.1 G of byte-space free — the live root is neither inode- nor space-constrained for builds.

A.6 Tie-back to the collision bug

The huge ifree count and the inode numbering are unrelated. tmpfs hands out inode numbers sequentially from low values as files are created (hence language.h = 25678, GSVersionMacros.h = 14323), which is exactly why they collide with the low-numbered system headers on the lower layer (§2.2). Having billions of free inodes does nothing to spread the numbers apart — only the high-bit layer tag in §5 does that.

11. References