Silences 2 warning messages that appear when using the systemd initrd:
1. "System tainted (var-run-bad)": occurs because `/var/run` isn't a
symlink to `/run`. Fixed by making /run and linking /var/run to it.
2. "Failed to make /usr a mountpoint": occurs because ProtectSystem
defaults to true in the initrd, which makes systemd try to remount
`/usr` as read-only, which doesn't exist in the initrd. Fixed by
linking `/usr/bin` and `/usr/sbin` to the initrd bin directories.
Also moves the `/tmp` creation from the initrd module to make-initrd-ng,
to avoid making an unnecessary `/tmp/.keep`, saving a store path and a
few bytes in the initrd image.
When a disposition is not set in a user record, systemd determines user
disposition depending on the range the user's UID falls in. For system
users with UIDs above 1000, this will cause them to be incorrectly
identified as "regular" users.
This will cause `userctl` to report the user as a regular user, and more
importantly, `systemd-homed` will not run the first boot user creation
flow, as regular users are already present on the machine (when they are
really system users).
The most common source of high UID system users will undoubtedly be Nix
build users, so the warning provides additional guidance on how to
remove them or adjust their IDs to be within the system range.
The warning is shown only when userdbd/homed is enabled, and the option
to hide the warning is deliberately hidden, to ensure users will have to
read and acknowledge the warning before proceeding, as otherwise users
could end up deploying an OS with no users and no way of creating one
due to the first boot flow being skipped.
Allow building a systemd initrd with a kernel that does not have
modules support enabled (`CONFIG_MODULES=n`), by removing the
assertion and only include the modulesClosure, kmod and support files
if MODULES is enabled or unset in the kernel.
PR #431115 changed extraStructuredConfig to structuredExtraConfig to
follow the deprecation warning about `extraConfig`. However,
`extraStructuredConfig` was mentioned in several places in the docs that
weren't addressed. Also, using this would silently fail since the code
in question would still accept the old key.
This patch updates the docs accordingly and throws an error if the
code-path is reached and `extraStructuredConfig` is being used.
This option is made uncondiotional in systemd 258 [1].
Earlier, it defaulted to true on kernels newer than 4.15,
which applies to all supported nixos kernels.
This means removing the option does not change behavior.
[1] 29da53dde3
This brings our `run0` in line with the upstream defaults:
bcc73cafdb/src/run/systemd-run0.in
While working on `auditd`, i noticed differences in how `run0` behaves
in regard to `/proc/$pid/sessionid` and `/proc/$pid/loginuid`. Particularly,
both files were set to `4294967295`, the magic value denoting `unset`.
While the manual page says elevators such as sudo should not set the loginuid,
run0 is a bit of a special case: The unit spawned by it is not child of
the running user session, and as such there is no id to inherit.
`systemd` upstream uses `pam_loginuid`, and for consistency we should too.
Especially because it prevents a whole lot of pain when working with `auditd`.
As to pam mounts:
On nixos we enable those if they are globally enabled. Upstream does not.
Considering the password entered into polkit is usually not the user password
of the account which will own the unit, pam mount will fail for any partition
which requires a password. Thus it makes sense to also disable pam mounts
for our run0, it prevents unnecessary unexpected pain.
This ensures the tmpfiles resetup service has permissions
to create suid/sgid files, even if `DefaultRestrictSUIDSGID`
is set in system.conf. This is required, as tmpfiles
are used to e.g. set file permissions on the journal
directory.`DefaultRestrictSUIDSGID` is a new feature
coming in systemd 258 [1].
[1] https://github.com/systemd/systemd/pull/38126
Before this change, systemd-oomd startup was flaky at least with
either systemd-sysusers or userborn enabled. It would restart several
times until users were provisioned, so that it finally succeeded.
An alternative would be to use a DynamicUser which was my first
approach, before I discovered that upstream added the after statement
in Dec 2024[1]. DynamicUsers could have further
implications (sandboxing, etc), so we follow upstream here.
It's not clear to me we why Upstreams "After=systemd-sysusers.service"
doesn't show up on nixos-unstable systems (systemd v257.6).
Userborn is covered, as its unit is aliased to systemd-sysusers.service.
The following test succeeded after this change on x86_64-linux:
nix-build -A nixosTests.systemd-oomd
[1]: 36dd429680
When running with a xfs root partition and using systemd for stage 1
initrd, I noticed in journalctl that fsck.xfs always failed to execute.
The issue is that it is trying to use the below sh interpreter:
`#!/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/sh -f`
but the file does not exist in the initrd image.
/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/**bash**
exists since it gets pulled in by some package, but the rest of the
directory is not being pulled in.
boot/systemd/initrd.nix mentions that xfs_progs references the sh
interpreter and seems to explicitly try to address this by adding
${pkgs.bash}/bin to storePaths, but that's the wrong bash package.
Update the `storePaths` value to pull in `pkgs.bashNonInteractive`
rather than `pkgs.bash`.
Fixes#361592.
I was able to test this change by doing the following:
1. Create a file named “test-systemd-run0.nix” that contains this Nix
expression:
let
nixpkgs = /path/to/nixpkgs;
pkgs = import nixpkgs { };
in
pkgs.testers.runNixOSTest {
name = "test-systemd-run0";
nodes.machine = {
security.polkit.enable = true;
};
testScript = ''
start_all()
machine.succeed("run0 env")
'';
}
2. Replace “/path/to/nixpkgs” with the actual path to an actual copy of
Nixpkgs.
3. Run the integration test by running this command:
nix-build <path to test-systemd-run0.nix>
`user-.slice` does not seem to exist, and the config we generate for it is
rejected by systemd (see `systemctl status user-.slice`).
I suppose that what was really intended here, was to configure
`user.slice`, which is the one that is documented in `man systemd.special`.
Reported-by: Ian Sollars <Ian.Sollars@brussels.msf.org>
We should not remount all filesystem types since not all filesystems
are safe to remount and some (nfs) return errors if remounted with
certain mount options.
The enable attribute of `boot.initrd.systemd.contents.<name>` was
ignored for building initrd storePaths. This resulted in building
derivations for the initrd even if it was disabled.
Found while testing a to build a nixos system with a kernel without
lodable modules[0]
[0]: https://github.com/NixOS/nixpkgs/pull/411792
We currently bypass systemd's switch-root logic by premounting
/sysroot/run. Make sure to propagate its sub-mounts with the recursive
flag, in accordance with the default switch-root logic.
This is required for creds at /run/credentials to survive the transition
from initrd -> host.
I was confused why I could not get an emergency access console despite setting systemd.emergencyMode=true.
Turns out there is another similar option `boot.initrd.systemd.emergencyAccess` that I should have used.
This is confusing and this change should make it more clear vie the docs of both these options.
Containers did not have *systemd-journald-audit.socket* in *additionalUpstreamSystemUnits*, which meant that the unit was not provided.
However the *wantedBy* was added without any additional check, therefore creating an empty unit with just the *WantedBy* on *boot.isContainer* machines.
This caused `systemd-analyze verify` to fail:
```text
systemd-journald-audit.socket: Unit has no Listen setting (ListenStream=, ListenDatagram=, ListenFIFO=, ...). Refusing.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
```
The upstream unit already contains the following, which should make it safe to include regardless:
```ini
[Unit]
ConditionSecurity=audit
ConditionCapability=CAP_AUDIT_READ
```
For reference, this popped up in the context of #[360426](https://redirect.github.com/NixOS/nixpkgs/issues/360426) as well as #[407696](https://redirect.github.com/NixOS/nixpkgs/pull/407696).
Co-authored-by: Bruce Toll <4109762+tollb@users.noreply.github.com>
Signed-off-by: benaryorg <binary@benary.org>
This is part of security-in-depth.
No suid binaries or devices should ever be in the nix store.
If they are, something is seriously wrong.
Disallowing this from a file system level should be non-breaking.
Thank you for making this change.
Unfortunately, and I take blame for this, this change to the module
system was not reviewed and approved by the module system maintainers.
I'm supportive of this change, but extending it on the staging-next
branch is not the right place.
This commit is also here to make sure that we don't run into conflicts
or other git trouble with the staging workflow.
Review:
It looks alright, but it didn't have tests yet, and it should be
considered in a broader context where the existence of this type
creates an incentive to be used in cases where the `<attr> = false;`
case is undesirable. I'd like to complement this with an type that
has `<attr> = {};` only.
My apologies for the lack of a timely and clear review. Often we
recommend to define the type outside the module system until
approved. This commit puts us back in that state.
attrNamesToTrue was introduced in 98652f9a90
After RFC-0125 implementation, Determinate Systems was pinged multiple
times to transfer the repository ownership of the tooling to a
vendor-neutral repository.
Unfortunately, this never manifested. Additionally, the leadership of
the NixOS project was too dysfunctional to deal with this sort of
problem. It might even still be the case up to this day.
Nonetheless, nixpkgs is about enabling end users to enact their own
policies. It would be better to live in a world where there is one
obvious choice of bootspec tooling, in the meantime, we can live in a
world where people can choose their bootspec tooling.
The Lix forge possess one fork of the Bootspec tooling:
https://git.lix.systems/lix-community/bootspec which will live its own
life from now on.
Change-Id: I00c4dd64e00b4c24f6641472902e7df60ed13b55
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
systemd-repart can be configured to not automatically issue BLKDISCARD commands
to the underlying hardware.
This PR exposes this option in the repart module.