This ensures the tmpfiles resetup service has permissions
to create suid/sgid files, even if `DefaultRestrictSUIDSGID`
is set in system.conf. This is required, as tmpfiles
are used to e.g. set file permissions on the journal
directory.`DefaultRestrictSUIDSGID` is a new feature
coming in systemd 258 [1].
[1] https://github.com/systemd/systemd/pull/38126
Before this change, systemd-oomd startup was flaky at least with
either systemd-sysusers or userborn enabled. It would restart several
times until users were provisioned, so that it finally succeeded.
An alternative would be to use a DynamicUser which was my first
approach, before I discovered that upstream added the after statement
in Dec 2024[1]. DynamicUsers could have further
implications (sandboxing, etc), so we follow upstream here.
It's not clear to me we why Upstreams "After=systemd-sysusers.service"
doesn't show up on nixos-unstable systems (systemd v257.6).
Userborn is covered, as its unit is aliased to systemd-sysusers.service.
The following test succeeded after this change on x86_64-linux:
nix-build -A nixosTests.systemd-oomd
[1]: 36dd429680
When running with a xfs root partition and using systemd for stage 1
initrd, I noticed in journalctl that fsck.xfs always failed to execute.
The issue is that it is trying to use the below sh interpreter:
`#!/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/sh -f`
but the file does not exist in the initrd image.
/nix/store/xy4jjgw87sbgwylm5kn047d9gkbhsr9x-bash-5.2p37/bin/**bash**
exists since it gets pulled in by some package, but the rest of the
directory is not being pulled in.
boot/systemd/initrd.nix mentions that xfs_progs references the sh
interpreter and seems to explicitly try to address this by adding
${pkgs.bash}/bin to storePaths, but that's the wrong bash package.
Update the `storePaths` value to pull in `pkgs.bashNonInteractive`
rather than `pkgs.bash`.
`user-.slice` does not seem to exist, and the config we generate for it is
rejected by systemd (see `systemctl status user-.slice`).
I suppose that what was really intended here, was to configure
`user.slice`, which is the one that is documented in `man systemd.special`.
Reported-by: Ian Sollars <Ian.Sollars@brussels.msf.org>
The enable attribute of `boot.initrd.systemd.contents.<name>` was
ignored for building initrd storePaths. This resulted in building
derivations for the initrd even if it was disabled.
Found while testing a to build a nixos system with a kernel without
lodable modules[0]
[0]: https://github.com/NixOS/nixpkgs/pull/411792
We currently bypass systemd's switch-root logic by premounting
/sysroot/run. Make sure to propagate its sub-mounts with the recursive
flag, in accordance with the default switch-root logic.
This is required for creds at /run/credentials to survive the transition
from initrd -> host.
I was confused why I could not get an emergency access console despite setting systemd.emergencyMode=true.
Turns out there is another similar option `boot.initrd.systemd.emergencyAccess` that I should have used.
This is confusing and this change should make it more clear vie the docs of both these options.
Containers did not have *systemd-journald-audit.socket* in *additionalUpstreamSystemUnits*, which meant that the unit was not provided.
However the *wantedBy* was added without any additional check, therefore creating an empty unit with just the *WantedBy* on *boot.isContainer* machines.
This caused `systemd-analyze verify` to fail:
```text
systemd-journald-audit.socket: Unit has no Listen setting (ListenStream=, ListenDatagram=, ListenFIFO=, ...). Refusing.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket has a bad unit file setting.
```
The upstream unit already contains the following, which should make it safe to include regardless:
```ini
[Unit]
ConditionSecurity=audit
ConditionCapability=CAP_AUDIT_READ
```
For reference, this popped up in the context of #[360426](https://redirect.github.com/NixOS/nixpkgs/issues/360426) as well as #[407696](https://redirect.github.com/NixOS/nixpkgs/pull/407696).
Co-authored-by: Bruce Toll <4109762+tollb@users.noreply.github.com>
Signed-off-by: benaryorg <binary@benary.org>
systemd-repart can be configured to not automatically issue BLKDISCARD commands
to the underlying hardware.
This PR exposes this option in the repart module.
The systemd.tmpfiles.settings.<name>.<path>.<type>.argument option may
contain arbitrary strings. This could allow intentional or unintentional
introduction of new configuration lines.
The argument field cannot be quoted, C‐style \xNN escape sequences are
however permitted. By escaping whitespace and newline characters, the
issue can be mitigated.
Format all Nix files using the officially approved formatter,
making the CI check introduced in the previous commit succeed:
nix-build ci -A fmt.check
This is the next step of the of the [implementation](https://github.com/NixOS/nixfmt/issues/153)
of the accepted [RFC 166](https://github.com/NixOS/rfcs/pull/166).
This commit will lead to merge conflicts for a number of PRs,
up to an estimated ~1100 (~33%) among the PRs with activity in the past 2
months, but that should be lower than what it would be without the previous
[partial treewide format](https://github.com/NixOS/nixpkgs/pull/322537).
Merge conflicts caused by this commit can now automatically be resolved while rebasing using the
[auto-rebase script](8616af08d9/maintainers/scripts/auto-rebase).
If you run into any problems regarding any of this, please reach out to the
[formatting team](https://nixos.org/community/teams/formatting/) by
pinging @NixOS/nix-formatting.
Some upstream systemd units are conditionally installed into the systemd
output, so we must make sure the feature that enables their installation
is enabled on our side prior to trying to use them.
Closes#381822
Apparently, I swapped `path` and `tmpfiles-type` in
2be50b1efe. Sorry about that 🫠
Also giving
`systemd.tmpfiles.settings.<config-name>.<path>.<tmpfiles-type>.type` a
better default in the manual than `‹name›`, i.e. `‹tmpfiles-type›` so
that it corresponds to the placeholders in the attribute path.
We default this option to null ; which is different
from upstream which defaults this to true.
Defaulting this to true leads to log-spam in /dev/kmesg
and thus in our opinion is a bad default https://github.com/systemd/systemd/issues/15324
Prior to this change a service failure would occur when this tmpfiles
service did not finish fast enough and receive a SIGTERM from systemd.
Additionally, `initrd-nixos-activation` is already ordered with
`After=initrd-switch-root.target`.
By default, systemd-repart refuses to act on empty disk devices, i.e.
those without any existing partition table for safety reasons.
This behaviour can be customized via the `--empty` flag, which we now
expose via the module system. This makes to partition empty disks
on first boot.
After final improvements to the official formatter implementation,
this commit now performs the first treewide reformat of Nix files using it.
This is part of the implementation of RFC 166.
Only "inactive" files are reformatted, meaning only files that
aren't being touched by any PR with activity in the past 2 months.
This is to avoid conflicts for PRs that might soon be merged.
Later we can do a full treewide reformat to get the rest,
which should not cause as many conflicts.
A CI check has already been running for some time to ensure that new and
already-formatted files are formatted, so the files being reformatted here
should also stay formatted.
This commit was automatically created and can be verified using
nix-build a08b3a4d19.tar.gz \
--argstr baseRev b32a094368
result/bin/apply-formatting $NIXPKGS_PATH
Now it's placed between initrd-switch-root.target and
initrd-switch-root.service, meaning it is truly the last thing to
happen before switch-root, as it should be.
The perl snippet as been added years ago. I assume the intention was to
remove the `## file: iwlwifi.conf` section up to the next `## file:`,
but as there is no file following, the snippet currently does nothing.
We should be fine to remove it.
Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
Before this change, the hash of the etc metadata image was included in
the mount unit that's responsible for mounting this metadata image in the
initrd.
And because this metadata image changes with every change to the etc
contents, the initrd would be rebuild every time as well.
This can lead to a lot of rebuilds (especially when revision info is
included in /etc/os-release) and all these initrd archives use up a lot of
space on the ESP.
With this change, we instead include a symlink to the metadata image in the
top-level directory, in the same way as we already do for things like init and
prepare-root, and we deduce the store path from the init= kernel parameter,
in the same way as we already do to find the path to init and prepare-root.
Doing so avoids rebuilding the initrd all the time.
tpm2.target was functionally useless without these services and this
generator. When systemd-cryptsetup-generator creates
systemd-cryptsetup@.service units, they are ordered after
systemd-tpm2-setup-early.service, not tpm2.target. These services are
themselves ordered after tpm2.target.
Note: The systemd-tpm2-setup(-early) services will serve no *function*
under a normal NixOS system at the moment. Because of their
ConditionSecurity=measured-uki, they will always be skipped, unless
you are building an appliance with the system.build.uki feature. Thus,
these are enabled solely for their systemd unit ordering properties.
Implementation is now compatible with the option's .type already defined.
This allows us to pass `config.users.users.<user>.hashedPassword` even if this is null (the default).
Before:
true => access
false => no access
hash => access via password
null => eval error
After:
true => access
false => no access
hash => access via password
null => no access
This adds support for declaring tmpfiles rules exclusively for the
systemd initrd. Configuration is possible through the new option
`boot.initrd.systemd.tmpfiles.settings` that shares the same interface as
`systemd.tmpfiles.settings`.
I did intentionally not replicate the `rules` interface here, given that
the settings attribute set is more versatile than the list of strings
used for `rules`. This should also make it unnecessary to implement the
workaround from 1a68e21d47 again.
A self-contained `tmpfiles.d` directory is generated from the new initrd
settings and it is added to the initrd as a content path at
`/etc/tmpfiles.d`.
The stage-1 `systemd-tmpfiles-setup.service` is now altered to no longer
operate under the `/sysroot` prefix, because the `/sysroot` hierarchy
cannot be expected to be available when the default upstream service is
started.
To handle files under `/sysroot` a slightly altered version of the
upstream default service is introduced. This new unit
`systemd-tmpfiles-setup-sysroot.service` operates only under the
`/sysroot` prefix and it is ordered between `initrd-fs.target` and the
nixos activation.
Config related to tmpfiles was moved from initrd.nix to tmpfiles.nix.
In #327506, we stopped using `/sbin` in the `pathsToLink` of `initrdBinEnv`. This inadvertantly stopped including the `sbin` directory of the `initrdBin` packages, which meant that things like `mdadm`'s udev rules, which referred to binaries by their `sbin` paths, stopped working.
The purpose of #327506 was to fix the fact that `mount` was not calling mount helpers like `mount.ext4` unless they happened to be in `/sbin`. But this raised some questions for me, because I thought we set `managerEnvironment.PATH` to help util-linux find helpers for both `mount` and `fsck`. So I decided to look at how this works in stage 2 to figure it out, and it's a little cursed.
---
What I already knew is that we have [this](696a4e3758/nixos/modules/system/boot/systemd.nix (L624-L625))
```
# util-linux is needed for the main fsck utility wrapping the fs-specific ones
PATH = lib.makeBinPath (config.system.fsPackages ++ [cfg.package.util-linux]);
```
And I thought this was how `mount` finds the mount helpers. But if that were true, then `mount` should be finding helpers in stage 1 because of [this](696a4e3758/nixos/modules/system/boot/systemd/initrd.nix (L411))
```
managerEnvironment.PATH = "/bin";
```
Turns out, `mount` _actually_ finds helpers with [this configure flag](696a4e3758/pkgs/os-specific/linux/util-linux/default.nix (L59))
```
"--enable-fs-paths-default=/run/wrappers/bin:/run/current-system/sw/bin:/sbin"
```
Ok... so then why do we need the PATH? Because `fsck` has [this](a75c7a102e/disk-utils/fsck.c (L1659))
```
fsck_path = xstrdup(path && *path ? path : FSCK_DEFAULT_PATH);
```
(`path` is `getenv("PATH")`)
So, tl;dr, `mount` and `fsck` have completely unrelated search paths for their helper programs
For `mount`, we have to use a configure flag to point to `/run/current-system`, and for `fsck` we can just set PATH
---
So, for systemd stage 1, we *do* want to include packages' `sbin` paths, because of the `mdadm` problem. But for `mount`, we need helpers to be on the search path, and right now that means putting it somewhere in `/run/wrappers/bin:/run/current-system/sw/bin:/sbin`.
With the the Systemd-based initrd, systemd-journald is doing the logging.
One of Journald's Trusted Journal Fields is `_HOSTNAME` (systemd.journal-fields(7)).
Without explicitly setting the hostname via this file or the kernel cmdline, `localhost` is used and captured in the journal.
As a result, a boot's log references multiple hostnames.
With centralized log collection this breaks filtering (more so when logs from multiple Systemd-based initrds are streaming in simultaneously.
Fixes#318907.
Regardless of mutable or immutable users, systemd-sysupdate never
updates existing user records and thus will for example never change
passwords for you.
It only support initial passwords and now actively asserts agains other
paswords.
On Linux we cannot feasbibly generate users statically because we need
to take care to not change or re-use UIDs over the lifetime of a machine
(i.e. over multiple generations). This means we need the context of the
running machine.
Thus, stop creating users statically and instead generate them at
runtime irrespective of mutableUsers.
When /etc is immutable, the password files (e.g. /etc/passwd etc.) are
created in a separate directory (/var/lib/nixos/etc). /etc will be
pre-populated with symlinks to this separate directory.
Immutable users are now implemented by bind-mounting the password files
read-only onto themselves and only briefly re-mounting them writable to
re-execute sysusers. The biggest limitation of this design is that you
now need to manually unmount this bind mount to change passwords because
sysusers cannot change passwords for you. This shouldn't be too much of
an issue because system users should only rarely need to change their
passwords.
systemd-sysusers cannot create normal users (i.e. with a UID > 1000).
Thus we stop trying an explitily only use systemd-sysusers when there
are no normal users on the system (e.g. appliances).