Silences 2 warning messages that appear when using the systemd initrd:
1. "System tainted (var-run-bad)": occurs because `/var/run` isn't a
symlink to `/run`. Fixed by making /run and linking /var/run to it.
2. "Failed to make /usr a mountpoint": occurs because ProtectSystem
defaults to true in the initrd, which makes systemd try to remount
`/usr` as read-only, but `/usr` doesn't exist in the initrd. Fixed by
linking `/usr/bin` and `/usr/sbin` to the initrd bin directories.
Also moves the `/tmp` creation from the initrd module to make-initrd-ng,
to avoid making an unnecessary `/tmp/.keep`, saving a store path and a
few bytes in the initrd image.
When a disposition is not set in a user record, systemd determines user
disposition depending on the range the user's UID falls in. For system
users with UIDs above 1000, this will cause them to be incorrectly
identified as "regular" users.
This will cause `userdbctl` to report the user as a regular user, and more
importantly, `systemd-homed` will not run the first boot user creation
flow, as regular users are already present on the machine (when they are
really system users).
The most common source of high UID system users will undoubtedly be Nix
build users, so the warning provides additional guidance on how to
remove them or adjust their IDs to be within the system range.
The warning is shown only when userdbd/homed is enabled, and the option
to hide the warning is deliberately hidden, to ensure users will have to
read and acknowledge the warning before proceeding, as otherwise users
could end up deploying an OS with no users and no way of creating one
due to the first boot flow being skipped.
Allow building a systemd initrd with a kernel that does not have
modules support enabled (`CONFIG_MODULES=n`), by removing the
assertion and only including the modulesClosure, kmod and support files
if MODULES is enabled or unset in the kernel.
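
The "enabled or unset" condition can be sketched in plain Nix as below.
This is only an illustration: `kernelConfig`, `moduleSupport` and
`initrdContents` are hypothetical stand-ins, not the names used in the
module.

```nix
let
  # Hypothetical stand-in for a parsed kernel configuration.
  kernelConfig = { CONFIG_MODULES = "n"; };

  # An unset option is treated like an enabled one; only an explicit "n"
  # turns module support off.
  modulesEnabled = (kernelConfig.CONFIG_MODULES or "y") != "n";

  # Placeholder names for the module-related initrd contents.
  moduleSupport = [ "modulesClosure" "kmod" "modprobe-config" ];
in
{
  # Module-related paths are only appended when modules are supported.
  initrdContents = [ "systemd" ] ++ (if modulesEnabled then moduleSupport else [ ]);
}
```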
This ensures the tmpfiles resetup service has permissions
to create suid/sgid files, even if `DefaultRestrictSUIDSGID`
is set in system.conf. This is required, as tmpfiles
are used to e.g. set file permissions on the journal
directory. `DefaultRestrictSUIDSGID` is a new feature
coming in systemd 258 [1].
[1] https://github.com/systemd/systemd/pull/38126
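
A minimal sketch of what such a per-unit override could look like in a
NixOS configuration. It assumes the unit is called
`systemd-tmpfiles-resetup.service` and that the override is expressed
through `serviceConfig`; the actual change may be written differently.

```nix
{
  # Assumption: the NixOS unit is named systemd-tmpfiles-resetup.service.
  systemd.services.systemd-tmpfiles-resetup.serviceConfig = {
    # Opt this one unit out of a manager-wide DefaultRestrictSUIDSGID=
    # so it can still create setuid/setgid files.
    RestrictSUIDSGID = false;
  };
}
```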
Before this change, systemd-oomd startup was flaky at least with
either systemd-sysusers or userborn enabled. It would restart several
times until users were provisioned, so that it finally succeeded.
An alternative would be to use a DynamicUser, which was my first
approach before I discovered that upstream added the After= statement
in Dec 2024 [1]. A DynamicUser could have further
implications (sandboxing, etc.), so we follow upstream here.
It's not clear to me why upstream's "After=systemd-sysusers.service"
doesn't show up on nixos-unstable systems (systemd v257.6).
Userborn is covered, as its unit is aliased to systemd-sysusers.service.
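
For illustration, the ordering described above could be expressed in a
NixOS configuration roughly like this. It is a sketch using the generic
`systemd.services.<name>.after` option, not necessarily the exact form
used in the module.

```nix
{
  # Start systemd-oomd only after users have been provisioned.
  # Userborn is covered through its alias to systemd-sysusers.service.
  systemd.services.systemd-oomd.after = [ "systemd-sysusers.service" ];
}
```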
The following test succeeded after this change on x86_64-linux:
nix-build -A nixosTests.systemd-oomd
[1]: 36dd429680
When running with an XFS root partition and using systemd for stage 1
initrd, I noticed in journalctl that fsck.xfs always failed to execute.
The issue is that it is trying to use the below sh interpreter:
`#!/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/sh -f`
but the file does not exist in the initrd image.
/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/**bash**
exists since it gets pulled in by some package, but the rest of the
directory is not being pulled in.
boot/systemd/initrd.nix mentions that xfs_progs references the sh
interpreter and seems to explicitly try to address this by adding
${pkgs.bash}/bin to storePaths, but that's the wrong bash package.
Update the `storePaths` value to pull in `pkgs.bashNonInteractive`
rather than `pkgs.bash`.
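
A sketch of the resulting setting, assuming it is written as a plain
entry in `boot.initrd.systemd.storePaths` (the real module may build the
list conditionally):

```nix
{ pkgs, ... }:
{
  # Pull in the whole bin directory (including bin/sh) of the bash
  # package that the fsck.xfs shebang actually points at.
  boot.initrd.systemd.storePaths = [ "${pkgs.bashNonInteractive}/bin" ];
}
```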
`user-.slice` does not seem to exist, and the config we generate for it is
rejected by systemd (see `systemctl status user-.slice`).
I suppose that what was really intended here was to configure
`user.slice`, which is the one that is documented in `man systemd.special`.
Reported-by: Ian Sollars <Ian.Sollars@brussels.msf.org>
The enable attribute of `boot.initrd.systemd.contents.<name>` was
ignored for building initrd storePaths. This resulted in building
derivations for the initrd even if it was disabled.
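
As an illustration, an entry like the following should no longer cause
anything to be built for the initrd. The file name and text are made up
for the example.

```nix
{
  boot.initrd.systemd.contents."/etc/example.conf" = {
    # With this fix, a disabled entry is skipped when the initrd
    # storePaths are computed.
    enable = false;
    text = "example";
  };
}
```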
Found while testing building a NixOS system with a kernel without
loadable modules [0].
[0]: https://github.com/NixOS/nixpkgs/pull/411792
We currently bypass systemd's switch-root logic by premounting
/sysroot/run. Make sure to propagate its sub-mounts with the recursive
flag, in accordance with the default switch-root logic.
This is required for creds at /run/credentials to survive the transition
from initrd -> host.
I was confused why I could not get an emergency access console despite setting systemd.emergencyMode=true.
Turns out there is another similar option `boot.initrd.systemd.emergencyAccess` that I should have used.
This is confusing; this change should make the distinction clearer via the docs of both options.
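
For readers hitting the same problem, a minimal sketch of the initrd-side setting, assuming the systemd stage 1 initrd is in use:

```nix
{
  boot.initrd.systemd.enable = true;
  # Allow an unauthenticated emergency shell in the initrd.
  # A hashed password can be given instead of `true`.
  boot.initrd.systemd.emergencyAccess = true;
}
```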
Containers did not have *systemd-journald-audit.socket* in *additionalUpstreamSystemUnits*, which meant that the unit was not provided.
However the *wantedBy* was added without any additional check, therefore creating an empty unit with just the *WantedBy* on *boot.isContainer* machines.
This caused `systemd-analyze verify` to fail:
```text
systemd-journald-audit.socket: Unit has no Listen setting (ListenStream=, ListenDatagram=, ListenFIFO=, ...). Refusing.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
```
The upstream unit already contains the following, which should make it safe to include regardless:
```ini
[Unit]
ConditionSecurity=audit
ConditionCapability=CAP_AUDIT_READ
```
For reference, this popped up in the context of #[360426](https://redirect.github.com/NixOS/nixpkgs/issues/360426) as well as #[407696](https://redirect.github.com/NixOS/nixpkgs/pull/407696).
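
A sketch of how the unit can be provided using the existing *additionalUpstreamSystemUnits* option. Whether the change adds it unconditionally or behind another condition is not shown here and is an assumption.

```nix
{
  # Ship the upstream socket unit so the WantedBy symlink no longer
  # points at an otherwise empty unit on boot.isContainer machines.
  systemd.additionalUpstreamSystemUnits = [ "systemd-journald-audit.socket" ];
}
```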
Co-authored-by: Bruce Toll <4109762+tollb@users.noreply.github.com>
Signed-off-by: benaryorg <binary@benary.org>
systemd-repart can be configured to not automatically issue BLKDISCARD commands
to the underlying hardware.
This PR exposes this option in the repart module.
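
A sketch of how this might be used. The option name `discard` under
`boot.initrd.systemd.repart` is an assumption for illustration; check the
module documentation for the actual name. It corresponds to
systemd-repart's `--discard=` switch.

```nix
{
  boot.initrd.systemd.repart = {
    enable = true;
    # Assumed option name: do not issue BLKDISCARD on newly added
    # partitions.
    discard = false;
  };
}
```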