NextBSD unionfs inode-collision fix

Why distinct headers silently vanish during builds on the NextBSD live root (unknown type name 'GS_EXPORT', 'cups_lang_t', 'OBJC_PUBLIC', encode_NSInteger), why it's a unionfs (device,inode) collision and not a code bug, and a permanent kernel-side fix modelled on Linux overlayfs's xino. Companion to issue #332.

TL;DR

Symptom: on the live unionfs root, clang randomly reports a type/macro as undefined (GS_EXPORT, cups_lang_t, OBJC_PUBLIC, encode_NSInteger) even though the defining header is #included and the identical source builds fine on stock FreeBSD.
Cause: clang's include-once logic (#pragma once, #import, include-guard optimization) identifies files by UniqueID = (st_dev, st_ino), never by path. NextBSD's unionfs gives every layer the same st_dev and passes each underlying layer's inode number through unchanged, so an upper-layer header and an unrelated lower-layer header can land on the same (st_dev, st_ino). clang then treats the second as a re-include of the first and silently skips it — no diagnostic at the #include line. Confirmed on thinkpad-t460s: /usr/include, /usr/local/include, and /System/Library/Headers all report device 1090584323.
No toolchain escape hatch. There is no clang flag to make include-once path-based; upstream LLVM has adjudicated this exact scenario as a filesystem bug. The fix must live in the kernel's unionfs vnode layer.
The fix: make unionfs present a globally-unique (st_dev, st_ino) for every distinct file, modelled on Linux overlayfs's xino feature. Keep one st_dev for the mount, but reserve the high bit of the 64-bit fileid as a layer tag (upper vs lower) in unionfs_getattr(), and mirror the identical remap onto d_fileno in unionfs_readdir() so stat(2) and readdir(3) agree. NextBSD's union is exactly two layers, so one tag bit is provably collision-free.
Interim (already in use): an inode-bust sweep (rewrite headers in place to force fresh inodes) across /System/Library/Headers, /usr/include, /usr/local/include, /Developer. It unblocked the GNUstep build, but it is whack-a-mole — every copy-up or new port re-collides.
Risks are real but bounded: larger 64-bit inode numbers (EOVERFLOW for any 32-bit-ino_t consumer), a per-readdir bounce-buffer cost, copy-up identity churn, NFS-export caveats, and a one-time inode-number change for anything caching them. All are manageable; see §7.

1. The bug, as observed

During make install of the GNUstep stack on the live root, clang fails on different files at each layer of the build, always with the same shape. A type, macro, or function that is defined in an included header is reported as undefined or undeclared:

Build unit	Error	Defining header that got skipped
libs-back `gpbs.m`	`unknown type name 'GS_EXPORT'`	`GNUstepBase/GSVersionMacros.h`
libs-gui `GSCUPSPrintOperation.m`	`unknown type name 'cups_lang_t'`	`/usr/local/include/cups/language.h`
libobjc2 `arc.mm`	`unknown type name 'OBJC_PUBLIC'`	`objc/objc-visibility.h`
libs-gui `NSActionCell.m`	`call to undeclared function 'encode_NSInteger'`	`GSGuiPrivate.h`

Two tells point away from "broken code": (1) the #include/#import line itself raises no error — the failure surfaces far downstream at first use of the missing symbol; and (2) the identical sources build cleanly on stock FreeBSD. So the variable is the environment — specifically the filesystem — not the code.

1.1 Evidence from the box

The live root is a unionfs: a writable tmpfs upper layer (/cow) over a read-only uzip/cd9660 lower layer, all presented under a single device id (1090584323). Inode numbers in each layer are allocated independently from low values, so collisions across layers are not just possible — they're likely for the densely-numbered early inodes that system headers occupy.

2. Root cause

2.1 clang identifies files by (device, inode), not path

clang's FileManager uniques files by inode so that two names for one file (symlinks, hardlinks) are treated as a single file. The identity token is llvm::sys::fs::UniqueID — a pair (Device, File) populated on POSIX directly from st_dev and st_ino. Distinct paths that stat to the same pair collapse onto one FileEntry:

All three re-inclusion-skipping mechanisms — the multiple-include optimization (include guards), #pragma once, and Objective-C #import — record their state in a HeaderFileInfo keyed on that FileEntry identity, and consult it in HeaderSearch::ShouldEnterIncludeFile(). None of them is path-based.

2.2 unionfs collapses st_dev and passes inode numbers through

In FreeBSD/NextBSD unionfs (files sys/fs/unionfs/union_subr.c, union_vnops.c):

Result: two genuinely different files — one upper, one lower — can carry the same (st_dev, st_ino).

2.3 The collision → silent skip

From clang's point of view this is the normal, desired behaviour that makes #pragma once work — it has no way to know the OS handed it two different files under one identity. That's why there is no warning at the include site.
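The silent skip can be modelled with a toy include-once table keyed on the identity pair. This is a sketch for illustration, not clang's implementation; once_table and once_should_enter are hypothetical names, and the fixed-size linear table is an assumption made to keep the example short:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_MAX 64

struct seen_id {
    uint64_t dev;
    uint64_t ino;
};

struct once_table {
    struct seen_id ids[SEEN_MAX];
    size_t count;
};

/*
 * Returns true if a header with this identity should be entered (first
 * sighting) and false if it is treated as a re-include and skipped.
 * The header's path plays no part in the decision.
 */
static bool once_should_enter(struct once_table *t, uint64_t dev, uint64_t ino)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i].dev == dev && t->ids[i].ino == ino)
            return false;
    }
    if (t->count < SEEN_MAX) {
        t->ids[t->count].dev = dev;
        t->ids[t->count].ino = ino;
        t->count++;
    }
    return true;
}
```

Feeding the table (1090584323, 25678) for an upper-layer header and then the same pair for a different lower-layer header makes the second call return false, with nothing to report at the include site.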

3. Why there is no toolchain-side fix

This exact scenario (a union/FUSE overlay presenting colliding inodes to clang) was raised upstream and adjudicated: clang "determines if two different names refer to the same file based on the inode returned by stat()," and the resolution was that this is a filesystem bug, not a clang bug (the real-world fix there was the FUSE -o use_ino mount option). There is no -fno-… switch that reroutes #pragma once / #import / the multiple-include optimization off UniqueID and onto path strings — the identity key is structural in FileManager/HeaderSearch/Preprocessor, not policy-gated.

4. Prior art — Linux overlayfs xino

overlayfs hit this identical bug (it broke tar's hardlink detector, du -x, yum, and the same cpp include-once class) and fixed it in Linux 4.17 with the xino feature. The design is the template for our fix:

NextBSD's unionfs is simpler than overlayfs — exactly two layers (one upper, one lower), not an arbitrary stack — so a single tag bit suffices and the bit-budget arithmetic collapses to "set bit 63 for upper."

5. The fix — layer-tagged fileid in unionfs

5.1 Source layout & touch points

NextBSD builds against the FreeBSD tree (fork nextbsd-redux/freebsd-src) plus a thin patch series in nextbsd-kernel/patches/; no unionfs inode logic is currently patched out-of-tree. Note the FreeBSD file names are union.h / union_subr.c / union_vnops.c (functions are unionfs_*):

Location	Today	Change
`union.h`: `struct unionfs_node`	tracks `un_uppervp` / `un_lowervp`; layer predicate is `un_uppervp != NULLVP`	add the tag macro + helper
`union_vnops.c`: `unionfs_getattr()`	copies underlying attrs, rewrites only `va_fsid`; `va_fileid` passes through	OR/clear bit 63 of `va_fileid` per layer
`union_vnops.c`: `unionfs_readdir()`	delegates to underlying `VOP_READDIR` straight into the user `uio`; `d_fileno` raw	bounce-buffer, tag each `d_fileno` to match getattr

5.2 The helper
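A minimal sketch of the tag macro and helper, written as plain C so it compiles outside the kernel. The macro name UNIONFS_FILEID_UPPER_TAG is hypothetical, and using uint64_t for the fileid is an assumption; the in-tree version would use whatever type va_fileid and d_fileno carry:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro name: bit 63 of the 64-bit fileid marks the upper layer. */
#define UNIONFS_FILEID_UPPER_TAG (UINT64_C(1) << 63)

/*
 * Set bit 63 for upper-layer files and clear it for lower-layer files,
 * so the same underlying inode number maps to two distinct presented values.
 */
static inline uint64_t
unionfs_remap_fileid(uint64_t fileid, bool upper)
{
    if (upper)
        return fileid | UNIONFS_FILEID_UPPER_TAG;
    return fileid & ~UNIONFS_FILEID_UPPER_TAG;
}
```

With this mapping, an upper file and a lower file that both carry underlying inode 25678 are presented as 25678 with bit 63 set and plain 25678 respectively. Applying the helper twice with the same flag gives the same result as applying it once.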

5.3 unionfs_getattr()

Adjacent to the two existing va_fsid assignments (upper and lower return paths), after the underlying VOP_GETATTR succeeds:

5.4 unionfs_readdir() — the hard half

Because VOP_READDIR writes struct dirent records (with d_fileno) directly into the caller's uio, the tag has to be applied after the underlying readdir but before the data reaches userspace. Route the three call sites through a helper that reads into a kernel bounce buffer, walks the dirents, tags each d_fileno, then uiomoves the rewritten block:
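The walk-and-tag step can be sketched in userland C against a simplified record layout. Everything named here is an assumption for illustration: model_dirent is a cut-down stand-in for the kernel's struct dirent, tag_dirents and remap_fileid are hypothetical names, and the bounce-buffer allocation and final uiomove are left out because they only make sense in the kernel:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UPPER_TAG (UINT64_C(1) << 63)

/* Hypothetical, simplified record header; the real struct dirent has more fields. */
struct model_dirent {
    uint64_t d_fileno;  /* inode number reported by the underlying layer */
    uint16_t d_reclen;  /* total record length: header plus name bytes */
};

static uint64_t remap_fileid(uint64_t fileid, bool upper)
{
    return upper ? (fileid | UPPER_TAG) : (fileid & ~UPPER_TAG);
}

/*
 * Walk `len` bytes of packed records in `buf` and tag each d_fileno in place.
 * Returns the number of records rewritten, or -1 on a malformed record.
 */
static int tag_dirents(unsigned char *buf, size_t len, bool upper)
{
    size_t off = 0;
    int n = 0;

    while (off < len) {
        unsigned char *rec = buf + off;
        uint64_t fileno;
        uint16_t reclen;

        if (len - off < sizeof(struct model_dirent))
            return -1;
        /* memcpy field by field: records may not be suitably aligned. */
        memcpy(&reclen, rec + offsetof(struct model_dirent, d_reclen), sizeof(reclen));
        if (reclen < sizeof(struct model_dirent) || reclen > len - off)
            return -1;
        memcpy(&fileno, rec + offsetof(struct model_dirent, d_fileno), sizeof(fileno));
        fileno = remap_fileid(fileno, upper);
        memcpy(rec + offsetof(struct model_dirent, d_fileno), &fileno, sizeof(fileno));
        off += reclen;
        n++;
    }
    return n;
}
```

Validating d_reclen before advancing matters: a zero or oversized length from the underlying filesystem would otherwise loop forever or read past the buffer.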

The upper flag is per source directory, not per entry: in the merged case the upper pass tags upper=true, the lower pass upper=false — correct, because each entry's fileid belongs to the layer whose directory produced it. This must agree with getattr or stat(2) and readdir(3) disagree, which breaks find, fts(3), and tar.

6. Alternatives considered

Option	Idea	Verdict
A. Layer-tagged fileid (this plan)	Single `st_dev`, high-bit layer tag on `va_fileid` + `d_fileno`.	Chosen. Overlayfs-proven; collision-free by construction for 2 layers; keeps one-device semantics that `find -xdev`/`du -x`/`tar` expect.
B. Per-layer real `st_dev`	Report each underlying layer's true `st_dev` instead of one mount fsid; different devices ⇒ no `(dev,ino)` collision.	Rejected. Less code, but `readdir` carries no device so `d_fileno` stays ambiguous, and a single tree now spans multiple device ids, which breaks `find -xdev` / `du -x` / `tar --one-file-system`. overlayfs explicitly avoided this.
C. Separate filesystem for the build root	Mount `/Developer`, `/usr/local`, the headers on a dedicated real FS (own device id), or build on a disk install rather than the live union.	Good interim / no-kernel. Sidesteps unionfs entirely (distinct device ⇒ no collision). Doesn't fix the union root itself; best as the stopgap while A lands.
D. Inode-bust sweep	Rewrite headers in place (`cp`+`mv`) to force fresh, higher inode numbers before each build.	Band-aid only. What we ran to unblock GNUstep. Non-deterministic and recurs on every copy-up / new port. Scaffolding, not a fix.

7. Risks & implications of the fix

The change is small in lines but touches core VFS paths (getattr, readdir) and changes a user-visible value (st_ino). The risks, with mitigations:

R1 — Larger 64-bit inode numbers → EOVERFLOW for 32-bit ino_t consumers. Setting bit 63 produces very large st_ino values for every upper-layer file. Any caller using a 32-bit ino_t (a compat32/freebsd11 ABI stat shim, an old binary, code built without 64-bit ino_t) will get EOVERFLOW rather than a truncated number.
Mitigation: FreeBSD has had 64-bit ino_t since the ino64 work (FreeBSD 12), so modern NextBSD userland is fine. Audit the freebsd11_stat/compat32 translation paths and decide deliberately to fail with EOVERFLOW (correct) rather than truncate (silently reintroduces collisions). Confirm the toolchain — clang, make, install — is 64-bit-ino_t-clean, since it is the consumer this whole effort exists to satisfy.
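Why truncation is the wrong choice can be shown with a small narrowing routine. This is a sketch of the policy only; narrow_ino32 is a hypothetical name and does not correspond to any existing compat shim:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Narrow a 64-bit inode number into a 32-bit field.
 * Fails with EOVERFLOW instead of truncating, because dropping bit 63
 * would map a tagged upper-layer inode back onto the untagged lower one.
 * On failure *out is left unchanged.
 */
static int narrow_ino32(uint64_t ino, uint32_t *out)
{
    if (ino > UINT32_MAX)
        return EOVERFLOW;
    *out = (uint32_t)ino;
    return 0;
}
```

For example, a tagged upper-layer value of 25678 with bit 63 set truncates to 25678, exactly the value a lower-layer file with inode 25678 reports, so a truncating shim would bring the original collision back for every 32-bit consumer.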

R2 — Per-readdir bounce-buffer cost. Tagging d_fileno means copying every directory read through a kernel buffer and walking the dirents, instead of letting the underlying FS write straight to the user uio. That's an extra allocation + copy + scan per readdir call.
Mitigation: directory reads are already syscall-bounded and chunked; size the bounce buffer to the existing read granularity and reuse it across the merged upper/lower passes. The cost is proportional to entries returned, not to tree size. Build workloads do a lot of readdir, so benchmark a find /usr and a full GNUstep rebuild before/after; expect low-single-digit percentage, not a step change.

R4 — Bit-63 exhaustion on large-inode lower filesystems. The OR-tag is collision-free only if the underlying FS never legitimately uses bit 63. tmpfs, cd9660/uzip, and UFS (32-bit inodes) never do. ZFS does not bound object IDs to 63 bits, so a ZFS-backed layer could in principle carry a fileid with bit 63 set, and OR-ing would alias it onto an upper file.
Mitigation: NextBSD's live lower layer is uzip/cd9660 and the upper is tmpfs — both safe, so ship the simple tag now. Add a mount-time assertion/guard: if either backing FS can produce inode numbers ≥ 2⁶³, refuse the tag mode (or switch to an overlayfs-style per-inode fallback). Don't silently OR on a ZFS layer.
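One way to avoid silently OR-ing onto an already-set bit is a checked variant of the remap. Note the assumptions: this sketch checks each fileid as it is seen rather than at mount time, because how to query a backing filesystem's inode range at mount time is not settled here, and both unionfs_remap_fileid_checked and UNIONFS_FILEID_UPPER_TAG are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

#define UNIONFS_FILEID_UPPER_TAG (UINT64_C(1) << 63)

/*
 * Remap only when the backing filesystem left bit 63 clear.
 * Returns false, leaving *out untouched, when the raw fileid is >= 2^63
 * and tagging could alias two distinct files.
 */
static bool
unionfs_remap_fileid_checked(uint64_t fileid, bool upper, uint64_t *out)
{
    if ((fileid & UNIONFS_FILEID_UPPER_TAG) != 0)
        return false;
    *out = upper ? (fileid | UNIONFS_FILEID_UPPER_TAG) : fileid;
    return true;
}
```

What the caller does on a false return (fail the operation, log once and fall back to another scheme) is a policy decision this sketch does not make.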

R5 — stat/readdir divergence if the two remaps drift. If a code path tags va_fileid but a readdir path is missed (or vice-versa), stat(2) and d_fileno disagree for the same file — which breaks find, fts(3), tar, and NFS readdirplus in new, subtler ways than the original bug.
Mitigation: a single shared unionfs_remap_fileid() helper used by both paths (never inline the bit-twiddle twice), plus a regression test asserting stat(dir/entry).st_ino == the entry's d_fileno for every entry, including . and .. across shadowed/lower-only parents.

8. Testing & validation

8.1 Collision detector (run before & after)

FreeBSD stat -f: %d = st_dev, %i = st_ino. Distinct paths sharing a dev:ino are exactly what clang mis-dedups:

(True hardlinks legitimately share a pair; on the union root you expect zero shared keys between distinct logical files, so any hit is suspect — optionally cmp flagged pairs and only fail when contents differ.)
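The core of such a detector, independent of how the paths are gathered, can be sketched in C as a sort-and-group over (dev, ino, path) records. The id_rec type and count_shared_ids are hypothetical names; filling the array (for instance from stat(2) on each path of a tree walk) is assumed to happen elsewhere:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct id_rec {
    uint64_t dev;       /* st_dev of the path */
    uint64_t ino;       /* st_ino of the path */
    const char *path;   /* kept only for reporting */
};

/* Order records by (dev, ino); the path does not take part in the comparison. */
static int id_rec_cmp(const void *a, const void *b)
{
    const struct id_rec *x = a;
    const struct id_rec *y = b;

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return 0;
}

/*
 * Sort `recs` in place and return how many distinct (dev, ino) pairs
 * are shared by two or more records.
 */
static size_t count_shared_ids(struct id_rec *recs, size_t n)
{
    size_t shared = 0;

    qsort(recs, n, sizeof(recs[0]), id_rec_cmp);
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;

        while (j < n && id_rec_cmp(&recs[i], &recs[j]) == 0)
            j++;
        if (j - i > 1)
            shared++;
        i = j;
    }
    return shared;
}
```

After the sort, records sharing a pair sit next to each other, so the paths in each group can be printed or compared byte for byte to separate true hardlinks from collisions.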

8.2 Kernel regression test (deliberate overlap)

Construct the pathological condition on purpose with two fresh backing FSes whose first files share a low inode number, union-mount, then assert every presented file has a distinct identity:

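A correct fix makes n == u, that is, the number of files presented equals the number of distinct (st_dev, st_ino) pairs. Wire it in as a kyua/ATF test so it runs on every kernel build. Add a companion test asserting stat ↔ d_fileno agreement (R5), including the . and .. entries.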

8.3 End-to-end

Clean-rebuild the real workload in dependency order on the union root and grep the logs for the collision signatures:

Pass = all three link, zero collision-signature errors, and the detector reports a clean sweep.

9. Implementation checklist & rollout

10. Open questions

Appendix A — inodes, df, and the "2.1G ifree" question

Background for reading the live root's storage state, and a units gotcha worth pinning down because it looks alarming and isn't.

A.1 What an inode is

An inode ("index node") is the structure that stores everything about one file except its name: type, permissions, owner, timestamps, size, link count, and pointers to the data blocks. The name lives separately, in the directory entry that points at the inode. Every file, directory, and symlink consumes exactly one inode — and the (st_dev, st_ino) pair from that inode is the very identity clang keys include-once on (§2.1), which is what this whole plan is about.

A filesystem therefore has two independent capacity limits, and either can be exhausted alone:

A million tiny files can use every inode while gigabytes of bytes sit free; one huge file can fill the bytes with inodes to spare.

A.2 The inode columns — and the units gotcha

Column	Meaning
`iused`	inodes in use (files that exist)
`ifree`	inodes still available
`%iused`	inodes used, as a percentage

A.3 Sample from the live root

A.4 Why ~2.1G, and why it isn't the RAM number

2.1G is right around 2³¹ (2,147,483,648), essentially the signed-32-bit maximum (2³¹ − 1). That's tmpfs's default way of saying "don't meaningfully cap by inode count." It is a default ceiling, not anything derived from physical memory.

The limit that is memory-derived is the byte size: tmpfs is RAM-backed, so FreeBSD defaults its Size to roughly available memory (≈ 6.7 G of an 8 GB box, the rest reserved for the kernel). The 6.1 G Avail on /cow is the real build budget, and it's the ceiling that actually binds — every file also costs memory, so you'd exhaust those 6.1 GB (file data + inode structures) long before creating 2 billion files. Inode count: effectively infinite. Byte size: the one to watch.

A.5 Reading the rest of the rows

Mount	Reading
`tmpfs → /cow`	Writable upper layer. Inode cap ≈ 2.1 G (≈ unlimited), byte cap 6.7 G (RAM-bound). All build output lands here.
`<above>:/cow → /`	The unionfs. `iused 143k` ≈ lower (115k) + upper (18k); `ifree 2.1G` comes from the writable tmpfs — the only layer you can create files on.
`/dev/md1.uzip → /rofs`	Read-only compressed lower layer. `%iused 92%` looks scary but is fine — a packed read-only image with exactly the files it needs; it never grows.
`/dev/md0 → /`	Tiny mfsroot boot scaffold — a traditional FS with inodes pre-allocated at `newfs` time (1022 total, 991 free), unlike tmpfs's dynamic allocation.
`iso9660`, `devfs`	Report `0` inodes — they don't expose a POSIX inode count to `df`.

Bottom line: 0% inodes used and 6.1 G of byte-space free — the live root is neither inode- nor space-constrained for builds.

A.6 Tie-back to the collision bug

The huge ifree count and the inode numbering are unrelated. tmpfs hands out inode numbers sequentially from low values as files are created (hence language.h = 25678, GSVersionMacros.h = 14323), which is exactly why they collide with the low-numbered system headers on the lower layer (§2.2). Having billions of free inodes does nothing to spread the numbers apart — only the high-bit layer tag in §5 does that.
