This is analogous to #70447 and #76487.
These modules are all needed to attach a container to the default bridge
network; without them, the final line of the following script fails with
the error listed below for each respective kernel module.
```sh
lxc storage create foo dir
lxc launch -s foo ubuntu:trusty bar
lxc network attach lxdbr0 bar
```
veth
----
> Error: Failed to start device 'lxdbr0': Failed to create the veth interfaces vethefbc3cd6 and vetha4abbcbc: Failed to run: ip link add dev vethefbc3cd6 type veth peer name vetha4abbcbc: RTNETLINK answers: Operation not supported
iptable_mangle
--------------
> lvl=eror msg="Failed to bring up network" err="Failed to list ipv4 rules for LXD network lxdbr0 (table mangle)" name=lxdbr0
xt_comment
----------
> lvl=error msg="Failed to bring up network" err="Failed to run: iptables -w -t filter -I INPUT -i lxdbr0 -p udp --dport 67 -j ACCEPT -m comment --comment generated for LXD network lxdbr0: iptables v1.8.4 (legacy): Couldn't load match `comment':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information." name=lxdbr0
xt_MASQUERADE
-------------
> lvl=eror msg="Failed to bring up network" err="Failed to run: iptables -w -t nat -I POSTROUTING -s 10.0.107.0/24 ! -d 10.0.107.0/24 -j MASQUERADE -m comment --comment generated for LXD network lxdbr0: iptables v1.8.4 (legacy): Couldn't load target `MASQUERADE':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information." name=lxdbr0
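If these modules can't be loaded on demand (e.g. with `security.lockKernelModules` enabled, as in the analogous issues), a minimal sketch of pre-loading all four looks like this:

```nix
{
  security.lockKernelModules = true;
  # Modules must be present from boot, since on-demand loading is locked:
  boot.kernelModules = [ "veth" "iptable_mangle" "xt_comment" "xt_MASQUERADE" ];
}
```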
systemd-nspawn can react to SIGTERM and send a shutdown signal to the container
init process. Use that instead of going through dbus and machined to request
that nspawn send the signal, since during host shutdown machined or dbus may
have gone away by the time a container unit is stopped.
To solve the issue that a container that is still starting cannot be stopped
cleanly, we must also handle this signal in containerInit/stage-2.
At the moment, it's not possible to override the libvirtd package used
without supplying a nixpkgs overlay. Adding a package option makes
libvirtd more consistent and allows enabling e.g. ceph and iSCSI support
more easily.
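As a sketch, assuming the new option is named `virtualisation.libvirtd.package` and that the libvirt derivation exposes `enableCeph`/`enableIscsi` override flags:

```nix
{ pkgs, ... }: {
  virtualisation.libvirtd.enable = true;
  virtualisation.libvirtd.package = pkgs.libvirt.override {
    enableCeph = true;   # Ceph storage backend
    enableIscsi = true;  # iSCSI storage backend
  };
}
```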
Xen is now enabled unconditionally on kernels that support it, so the
xen_dom0 feature doesn't do anything. The isXen attribute will now
produce a deprecation warning and unconditionally return true.
Passing in a custom value for isXen is no longer supported.
This dependency was added in 65eae4d, when NixOS switched to
systemd, as a substitute for the previous udevtrigger, and it hasn't been
touched since. It's probably unneeded, as the upstream unit[1] doesn't
do it and I haven't found any mention of any problem in NixOS or the
upstream issue trackers.
[1]: https://gitlab.com/libvirt/libvirt/-/blob/master/src/remote/libvirtd.service.in
Related to #85746, which addresses the documentation issue. Digging
deeper for a reason why this was disabled: it was simply because it
wasn't working, which is not the case anymore.
- Actually use the zfsSupport option
- Add documentation URI to lxd.service
- Add lxd.socket to enable socket activation
- Add proper dependencies and remove systemd-udev-settle from lxd.service
- Set up /var/lib/lxc/rootfs using systemd.tmpfiles
- Configure safe start and shutdown of lxd.service
- Configure restart on failures of lxd.service
Launching a container with a private network requires creating a
dedicated networking interface for it; the name of that interface is
derived from the container name itself - e.g. a container named `foo`
gets attached to an interface named `ve-foo`.
An interface name can span up to IFNAMSIZ characters, which means that a
container name must contain at most IFNAMSIZ - 3 - 1 = 11 characters;
it's a limit that we validate using a build-time assertion.
This restriction has been lifted with Linux 5.8, as it allows an
interface to carry a so-called altname, which can be much longer,
while remaining treated as a first-class citizen.
Since altnames have been supported natively by systemd for a while now,
due diligence on our side ends with dropping the name-assertion on newer
kernels.
This commit closes #38509.
systemd/systemd#14467
systemd/systemd#17220
https://lwn.net/Articles/794289/
Ensure that the Hyper-V keyboard driver is available in the early
stages of the boot process. This allows the user to enter a disk
encryption passphrase or repair a boot problem in an interactive
shell.
We now set the hooks dir correctly if the OCI hook is enabled. CRI-O
supports this specific hook from v1.20.0.
Signed-off-by: Sascha Grunert <mail@saschagrunert.de>
Reintroduce the `fetch-ssh-keys` service so that GCE images that work
with NixOps can once again be built. Also, reformat the code a bit.
The service was removed in 88570538b3,
likely due to a comment saying it should be removed. It was still
needed for images to work with NixOps, however, and probably needed to
be replaced or rewritten rather than removed.
The socketActivation option was removed, but later on socket activation
was added back without the option to disable it. The description now reflects
that socket activation is used unconditionally in the current setup.
Since the introduction of option `containers.<name>.pkgs`, the
`nixpkgs.*` options (including `nixpkgs.pkgs`, `nixpkgs.config`, ...) were always
ignored in container configs, which broke existing containers.
This was due to `containers.<name>.pkgs` having two separate effects:
(1) It sets the source for the modules that are used to evaluate the container.
(2) It sets the `pkgs` arg (`_module.args.pkgs`) that is used inside the container
modules.
This happens even when the default value of `containers.<name>.pkgs` is unchanged, in which
case the container `pkgs` arg is set to the pkgs of the host system.
Previously, the `pkgs` arg was determined by the `containers.<name>.config.nixpkgs.*` options.
This commit reverts the breaking change (2) while adding a backwards-compatible way to achieve (1).
It removes option `pkgs` and adds option `nixpkgs` which implements (1).
Existing users of `pkgs` are informed by an error message to use option
`nixpkgs` or to achieve only (2) by setting option `containers.<name>.config.nixpkgs.pkgs`.
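A hedged sketch of the resulting interface (the path is a placeholder):

```nix
{
  containers.example = {
    # (1) the source for the modules used to evaluate the container:
    nixpkgs = /path/to/nixpkgs;
    config = { ... }: {
      # (2) the pkgs arg used inside the container modules:
      nixpkgs.pkgs = import /path/to/nixpkgs { };
    };
  };
}
```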
It's been 8.5 years since NixOS used mingetty, but the option was
never renamed (despite the file defining the module being renamed in
9f5051b76c ("Rename mingetty module to agetty")).
I've chosen to rename it to services.getty here, rather than
services.agetty, because getty is implementation-neutral and is also
the name of the unit that is generated.
... build-vm-with-bootloader" for EFI systems
This reverts commit 20257280d9, reversing
changes made to 926a1b2094.
It broke nixosTests.installer.simpleUefiSystemdBoot
and right now the channel has been lagging behind for two weeks.
`nixos-rebuild build-vm-with-bootloader` currently fails with the
default NixOS EFI configuration:
$ cat >configuration.nix <<EOF
{
fileSystems."/".device = "/dev/sda1";
boot.loader.systemd-boot.enable = true;
boot.loader.efi.canTouchEfiVariables = true;
}
EOF
$ nixos-rebuild build-vm-with-bootloader -I nixos-config=$PWD/configuration.nix -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/nixos-20.09.tar.gz
[...]
insmod: ERROR: could not insert module /nix/store/1ibmgfr13r8b6xyn4f0wj115819f359c-linux-5.4.83/lib/modules/5.4.83/kernel/fs/efivarfs/efivarfs.ko.xz: No such device
mount: /sys/firmware/efi/efivars: mount point does not exist.
[ 1.908328] reboot: Power down
builder for '/nix/store/dx2ycclyknvibrskwmii42sgyalagjxa-nixos-boot-disk.drv' failed with exit code 32
[...]
Fix it by setting virtualisation.useEFIBoot = true in qemu-vm.nix when
EFI is needed, and remove the now-unneeded configuration in
./nixos/tests/systemd-boot.nix, since it's handled globally.
Before:
* release-20.03: successful build, unsuccessful run
* release-20.09 (and master): unsuccessful build
After:
* Successful build and run.
Fixes https://github.com/NixOS/nixpkgs/issues/107255
Fixes an issue where `containers.<name>.extraVeths.<name>` configuration
was not always applied.
When configuring `containers.<name>.extraVeths.<name>` without also
configuring one of `containers.<name>.localAddress`, `.localAddress6`,
`.hostAddress`, `.hostAddress6` or `.hostBridge`, the veth was created,
but otherwise no configuration (i.e. no IP) was applied.
nixos-container always configures the primary veth (when `.localAddress`
or `.hostAddress` is set) to be the container's default gateway, so
this fix is required to create a veth in containers that use a different
default gateway.
To test this patch configure the following container and check if the
addresses are applied:
```nix
containers.testveth = {
  extraVeths.testveth = {
    hostAddress = "192.168.13.2";
    localAddress = "192.168.13.1";
  };
  config = { ... }: { };
};
```
The metadata fetcher scripts run each time an instance starts, and it
is not safe to assume that responses from the instance metadata
service (IMDS) will be as they were on first boot.
Example: an EC2 instance can have its user data changed while
the instance is stopped. When the instance is restarted, we want to
see the new user data applied.
According to Freenode's ##AWS, the metadata server can sometimes
take a few moments to get its shoes on, and the very first boot
of a machine can see failed requests for a few moments.
AWS's metadata service has two versions. Version 1 allowed plain HTTP
requests to get metadata. However, this was frequently abused when a
user could trick an AWS-hosted server into proxying requests to the
metadata service. Since the metadata service is frequently used to
generate AWS access keys, this is pretty gnarly. Version 2 is
identical except that it requires the caller to request a token and
provide it on each request.
Today, starting a NixOS AMI in EC2 where the metadata service is
configured to only allow v2 requests fails: the user's SSH key is not
placed, and configuration provided by the user-data is not applied.
The server is useless. This patch addresses that.
Note the dependency on curl is not a joyful one, and it expands the
initrd by 30M. However, see the added comment for more information
about why this is needed. Note the idea of using `echo` and `nc` is
laughable. Don't do that.
See https://www.redhat.com/sysadmin/fedora-31-control-group-v2 for
details on why this is desirable, and how it impacts containers.
Users that need to keep using the old cgroup hierarchy can re-enable it
by setting `systemd.unifiedCgroupHierarchy` to `false`.
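For example (using the option name from this commit):

```nix
{
  # Opt back into the legacy cgroup hierarchy:
  systemd.unifiedCgroupHierarchy = false;
}
```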
Well-known candidates that don't support that hierarchy, like docker and
hidepid=…, will disable it automatically.
Fixes #73800
This reverts commit fb6d63f3fd.
I really hope this finally fixes #99236: evaluation on Hydra.
This time I really did check basically the same commit on Hydra:
https://hydra.nixos.org/eval/1618011
Right now I don't have energy to find what exactly is wrong in the
commit, and it doesn't seem important in comparison to nixos-unstable
channel being stuck on a commit over one week old.
This adds the pinns path to the configuration to let CRI-O start properly.
We also change the configuration to the new drop-in syntax.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
The toplevel derivations of systems that have `networking.hostName`
set to `""` (because they want their hostname to be set by DHCP) used
to be all named
`nixos-system-unnamed-${config.system.nixos.label}`.
This makes them hard to distinguish.
A similar problem existed in NixOS tests where `vmName` is used in the
`testScript` to refer to the VM. It defaulted to the
`networking.hostName` which when set to `""` won't allow you to refer
to the machine from the `testScript`.
This commit makes the `system.name` configurable. It still defaults to:
```
if config.networking.hostName == ""
then "unnamed"
else config.networking.hostName;
```
but in case `networking.hostName` needs to be set to `""`,
`system.name` can be set to a distinguishable name.
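For example (the name is, of course, arbitrary):

```nix
{
  networking.hostName = "";          # hostname is assigned by DHCP
  system.name = "my-dhcp-machine";   # distinguishable toplevel/test name
}
```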
This is analogous to #70447.
With security.lockKernelModules=true, docker commands result in the following
error without at least loading veth:
$ docker run hello-world
/nix/store/mr50kaan2vs4gc40ymwncb2vci25aq7z-docker-19.03.2/libexec/docker/docker: Error response from daemon: failed to create endpoint epic_kare on network bridge: failed to add the host (veth8b381f3) <=> sandbox (veth348e197) pair interfaces: operation not supported.
ERRO[0003] error waiting for container: context canceled
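A minimal sketch of the resulting workaround (standard NixOS options):

```nix
{
  security.lockKernelModules = true;
  # veth must be loaded at boot, since Docker can't load it on demand anymore:
  boot.kernelModules = [ "veth" ];
}
```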
This is required by (among others) Podman to run containers in rootless mode.
Other distributions such as Fedora and Ubuntu already set up these mappings.
The scheme with a start UID/GID offset starting at 100000 and increasing in 65536 increments is copied from Fedora.
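Per user, the scheme amounts to something like the following (hypothetical user; `subUidRanges`/`subGidRanges` are the existing NixOS options):

```nix
{
  users.users.alice = {
    isNormalUser = true;
    subUidRanges = [ { startUid = 100000; count = 65536; } ];
    subGidRanges = [ { startGid = 100000; count = 65536; } ];
  };
}
```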
Per upstream:
> libvirtd-tcp.socket - the unit file corresponding to the TCP 16509
> port for non-TLS remote access. This socket should not be configured
> to start on boot until the administrator has configured a suitable
> authentication mechanism.
Without this, systemd-boot does not add an EFI boot entry for itself.
The reason it worked before this fix is because it would fall back to
the default installed \EFI\BOOT\BOOTX64.EFI
`boot.loader.grub.device` was hardcoded to `bootDevice`, which is
wrong, because that's the device for `/`, and with `useBootLoader`
the boot loader is not on that device.
This bug probably came into existence because of bad naming;
`virtualisation.bootDevice` has description
"The disk to be used for the root filesystem", which is very confusing;
it should be `.rootDevice` then!
Unfortunately, the description is right and the attribute name is wrong,
so it is not easy to change this without deprecation.
This commit ensures that even if you use `useBootLoader` and
`diskInterface == "scsi"`, the created VM can boot through, and can run
`nixos-rebuild` afterwards.
It also adds extra commentary to explain what's going on in this module
in general in relation to `useBootLoader`.
VMSVGA is recommended by VirtualBox for Linux guests.
Compared to VBoxVGA and VBoxSVGA it also supports 3D acceleration.
Adding the driver makes NixOS work with all three supported graphics
card types.
xchg is advertised as a bidirectional exchange dir, but file content
transfer from host to VM fails due to caching:
If a file is read in the VM and then modified on the host, subsequent
re-reads in the VM can yield old, cached data.
This is caused by the use of 9p's cache=loose mode that is explicitly
meant for read-only mounts.
9p doesn't provide any suitable cache modes, so fix this by disabling
caching.
Also, remove a now unnecessary sync in the test driver.
- Update the default pause image
- Set the cgroup manager to systemd
- Enable `manage_ns_lifecycle` instead of the deprecated
`manage_network_ns_lifecycle` option
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
This follows upstream's change in documentation. While the `[DHCP]`
section might still work it is undocumented and we should probably not
be using it anymore. Users can just upgrade to the new option without
much hassle.
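A sketch of the migration, assuming the new per-protocol option is `dhcpV4Config` and using a hypothetical network name:

```nix
{
  systemd.network.networks."10-wan" = {
    # previously: dhcpConfig.UseDNS = true;  # rendered as the [DHCP] section
    dhcpV4Config.UseDNS = true;              # rendered as the [DHCPv4] section
  };
}
```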
I had to create a bit of custom module deprecation code since the usual
approach doesn't support wildcards in the path.
What's happening now is that both cri-o and podman are creating
/etc/containers/policy.json.
By splitting out the creation of configuration files we can make the
podman module leaner and let it compose better with other container software.
Many options define their example to be a Nix value without using
literalExample. This sometimes gets rendered incorrectly in the manual,
causing confusion like in https://github.com/NixOS/nixpkgs/issues/25516
This fixes it by using literalExample for such options. The list of
options to fix was determined with this expression:
```nix
let
  nixos = import ./nixos { configuration = {}; };
  lib = import ./lib;
  valid = d: {
    # escapeNixIdentifier from https://github.com/NixOS/nixpkgs/pull/82461
    set = lib.all (n: lib.strings.escapeNixIdentifier n == n) (lib.attrNames d)
      && lib.all (v: valid v) (lib.attrValues d);
    list = lib.all (v: valid v) d;
  }.${builtins.typeOf d} or true;
  optionList = lib.optionAttrSetToDocList nixos.options;
in map (opt: {
  file = lib.elemAt opt.declarations 0;
  loc = lib.options.showOption opt.loc;
}) (lib.filter (opt: if opt ? example then ! valid opt.example else false) optionList)
```
which when evaluated will output all options that use a Nix identifier
that would need escaping as an attribute name.
extraModprobeConfig could be applied too late, i.e. if the driver has
been loaded in the initrd while the hard drive is still encrypted.
Using kernelParams works in all cases, however.
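For instance, a module option can be passed on the kernel command line instead (the module and option named here are purely illustrative):

```nix
{
  # Applies even when the driver is already loaded from the initrd:
  boot.kernelParams = [ "somedriver.someoption=1" ];
}
```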
This commit fixes #76620. It moves ExecStartPre and ExecStopPost to
preStart and postStop, as these options are composable. It thus allows
adding additional initialisation scripts or cleanup scripts to the systemd
unit of the docker container.
This option allows the user to control whether or not the docker container is
automatically started on boot. The previous default behavior (true) is preserved.
NixOS has `virtualisation.docker.autoPrune.enable` for this
functionality; we should not do it every time a container starts up.
(also, some trivial documentation fixes)
- the `imageFile` option allows loading an image from a derivation
- the `dependsOn` option can be used to specify dependencies between container systemd units.
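A hedged sketch combining both options (container names and image details are hypothetical):

```nix
{ pkgs, ... }: {
  docker-containers.web = {
    image = "nginx:1.19";
    # Load the image from a derivation instead of pulling it at runtime:
    imageFile = pkgs.dockerTools.buildImage {
      name = "nginx";
      tag = "1.19";
      contents = [ pkgs.nginx ];
    };
    # Order this container's unit after the "db" container's unit:
    dependsOn = [ "db" ];
  };
}
```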
Co-authored-by: Christian Höppner <mkaito@users.noreply.github.com>
Previously, we were storing the leader pid in a runtime file and
signalled SIGRTMIN+4 manually.
In systemd 219, the `machinectl poweroff` command was introduced, which
does that for us.
The missing `\n` in the printf format string prevented multiple channels from
being logged.
The missing `nixpkgs=` in the `NIX_PATH` prevented `nixos-rebuild` from working
if the system configuration has any reference to `nixpkgs`.
Additionally:
* Use process substitution instead of piping printf to avoid creating a subshell.
* Set an empty `IFS` to avoid word splitting.
* Add the `-r` flag to `read` to avoid mangling backslashes.
Currently, LXD always uses pkgs.zfs, even if boot.zfs.enableUnstable is set.
This change provides options to change the LXC, LXD and ZFS packages, and
determines the default ZFS package based on boot.zfs.enableUnstable.
Systemd dependencies for scripted mode
were refactored according to analysis in #34586.
networking.vswitches can now be used with systemd-networkd.
Although they are not supported by the daemon, a NixOS recipe
creates the switch and attaches the required interfaces (just like
the scripted version).
Vlans and internal interfaces are implemented following the
template format, i.e. each interface is
described using an attribute set (vlan and type at the moment).
If vlan is present, the interface is added to the vswitch with the
given tag (access mode). The internal type enables the vswitch to
create interfaces (see the openvswitch docs).
Also added configuration for the supported OpenFlow versions on
the vswitch.
This commit is a split from the original PR #35127.
This makes instantiating an empty container ~2.5x faster, hence reducing the
rebuild time of systems with many declarative containers.
Note that this doesn't affect production systems much, because those most
likely already include the `minimal.nix` profile.
A centralized list for these renames is not good because:
- It breaks disabledModules for modules that have a rename defined
- Adding/removing renames for a module means having to find them in the
central file
- Merge conflicts due to multiple people editing the central file
As part of the networking.* name space cleanup, connman should be moved
to services.connman. The same will happen for example with
networkmanager in a separate PR.
When switching NixOS configurations with both
networking.useNetworkd = true;
virtualisation.virtualbox.host.enable = true;
you often end up waiting for systemd-networkd-wait-online.service.
This happens because the vboxnet0 device doesn't have a carrier until
virtualbox machines are started, so networkd gets stuck in
"Configuring":
⇒ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 wlp2s0 wlan routable unmanaged
3 vboxnet0 ether no-carrier configuring
This updates the NixOS virtualbox host module to include a
RequiredForOnline=no statement in the generated 40-vboxnet0.network
file, so networkd doesn't consider it necessary for
systemd-networkd-wait-online.service to finish.
List all modules that *may* be required depending on individual container
configurations; don't expect that further modules can be loaded after boot.
Fixes https://github.com/NixOS/nixpkgs/issues/38676
Openvswitch was upgraded to the latest
stable version (currently 2.12.0). This removes the ovs-monitor-ipsec
commands.
The LTS version is still available using
`config.virtualisation.vswitch.package = pkgs.openvswitch-lts`;
it has been upgraded to 2.5.6.
This commit is a split from the original PR #35127.
This fixes the warning being emitted by nixos-rebuild switch:
building Nix...
building the system configuration...
trace: warning: types.string is deprecated because it quietly concatenates strings
It started emitting a warning in #66346.
Since https://github.com/NixOS/nixpkgs/pull/61321, local-fs.target is
part of sysinit.target again, meaning units without
DefaultDependencies=no will automatically depend on it, and the manually
set dependencies can be dropped.
With local-fs.target part of sysinit.target
(https://github.com/NixOS/nixpkgs/pull/61321), we don't need to add it
explicitly to certain units anymore, and can change dependencies to match
how they are in other distros (I picked from Google's official CentOS 7
image here).
Like them, use StandardOutput=journal+console to pipe google-*.service
output to the serial console as well.
This adds a new `onBoot` option that allows specifying the action taken on
guests when the host boots. Specifying "start" ensures all guests that were
running prior to shutdown are started, regardless of their autostart settings.
Specifying "ignore" will make libvirtd ignore such guests. Any guest marked as
autostart will still be automatically started by libvirtd.
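For example:

```nix
{
  # Don't restore previously-running guests; autostart-marked guests still start:
  virtualisation.libvirtd.onBoot = "ignore";
}
```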
systemd provides two sysctl snippets, 50-coredump.conf and
50-default.conf.
These enable:
- Loose reverse path filtering
- Source route filtering
- `fq_codel` as a packet scheduler (this helps to fight bufferbloat)
This also configures the kernel to pass coredumps to `systemd-coredump`.
These sysctl snippets can be found in `/etc/sysctl.d/50-*.conf`,
and overridden via `boot.kernel.sysctl`
(which will place the parameters in `/etc/sysctl.d/60-nixos.conf`).
Let's start using these, like other distros already do for quite some
time, and remove those duplicate `boot.kernel.sysctl` options we
previously did set.
In the case of rp_filter (which systemd would set to 2 (loose)), make
our overrides to "1" more explicit.
We might be inside a NixOS container on a non-NixOS host, so instead of not
running at all inside a container, check whether the nix-daemon socket is
writable: it tells us whether the store is managed from here or from outside.
Fixes #63578
See https://github.com/NixOS/nixpkgs/issues/15747. Previously this module was called `<unknown-file>`
in error messages; now it is called something a bit closer to reality:
```
module at /home/danbst/dev/nixpkgs/nixos/modules/virtualisation/containers.nix:470
```
Fix #61859.
The assertion fails when a Google Compute Engine image is built, because
choices of filesystem types are now restricted to `f2fs` and the `ext`
family if auto-resizing is enabled.
This change pins the filesystem used on such an image to `ext4` for now.
A new internal option `hardware.opengl.setLdLibraryPath` is added which controls if `LD_LIBRARY_PATH` should be set to `/run/opengl-driver(-32)/lib`. It is false by default and is meant to be set to true by any driver which requires it. If this option is false, then `opengl.nix` and `xserver.nix` will not set `LD_LIBRARY_PATH`.
Currently Mesa and NVidia drivers don't set `setLdLibraryPath` because they work with libglvnd and do not override libraries, while `amdgpu-pro`, `ati` and `parallels-guest` set it to true (the former two really need it, the last one doesn't build so is presumed to).
Additionally, the `libPath` attribute within entries of `services.xserver.drivers` is removed. This made `xserver.nix` add the driver path directly to the `LD_LIBRARY_PATH` for the display manager (including X server). Not only is it redundant when the driver is added to `hardware.opengl.package` (assuming that `hardware.opengl.enable` is true), in fact all current drivers except `ati` set it incorrectly to the package path instead of package/lib.
This removal of `LD_LIBRARY_PATH` could break certain packages using CUDA, but only those that themselves load `libcuda` or other NVidia driver libraries using `dlopen` (not if they just use `cudatoolkit`). A few have already been fixed but it is practically impossible to test all because most packages using CUDA are libraries/frameworks without a simple way to test.
Fixes#11434 if only Mesa or NVidia graphics drivers are used.
Quite some fixing was needed to get this to work.
Changes in VirtualBox and additions:
- VirtualBox is no longer officially supported on 32-bit hosts so i686-linux is removed from platforms
for VirtualBox and the extension pack. 32-bit additions still work.
- There was a refactoring of kernel module makefiles and two resulting bugs affected us which had to be patched.
These bugs were reported to the bug tracker (see comments near patches).
- The Qt5X11Extras makefile patch broke. Fixed it to apply again, making the libraries logic simpler
and more correct (it just uses a different base path instead of always linking to Qt5X11Extras).
- Added a patch to remove "test1" and "test2" kernel messages due to forgotten debugging code.
- virtualbox-host NixOS module: the VirtualBoxVM executable should be setuid not VirtualBox.
This matches how the official installer sets it up.
- Additions: replaced a for loop for installing kernel modules with just a "make install",
which seems to work without any of the things done in the previous code.
- Additions: The package defined buildCommand, which resulted in phases not running, including RUNPATH
  stripping in fixupPhase; an installPhase was defined but never run. Fixed this by
  refactoring using phases. Had to set dontStrip, otherwise binaries were broken by stripping.
The libdbus path had to be added later in fixupPhase because it is used via dlopen not directly linked.
- Additions: Added zlib and libc to patchelf, otherwise runtime library errors result from some binaries.
For some reason the missing libc only manifested itself for mount.vboxsf when included in the initrd.
Changes in nixos/tests/virtualbox:
- Update the simple-gui test to send the right keys to start the VM. With VirtualBox 5
it was enough to just send "return", but with 6 the Tools thing may be selected by
default. Send "home" to reliably select Tools, "down" to move to the VM and "return"
to start it.
- Disable the VirtualBox UART by default because it causes a crash due to a regression
in VirtualBox (specific to software virtualization and serial port usage). It can
still be enabled using an option but there is an assert that KVM nested virtualization
is enabled, which works around the problem (see below).
- Add an option to enable nested KVM virtualization, allowing VirtualBox to use hardware
virtualization. This works around the UART problem and also allows using 64-bit
guests, but requires a kernel module parameter.
- Add an option to run 64-bit guests. Tested that the tests pass with that. As mentioned
this requires KVM nested virtualization.
Regression introduced by c94005358c.
The commit introduced declarative docker containers and subsequently
enables docker whenever any declarative docker containers are defined.
This is done via an option with type "attrsOf somesubmodule" and a check
on whether the attribute set is empty.
Unfortunately, the check was whether a *list* is empty rather than
whether an attribute set is empty, so "mkIf (cfg != [])" *always*
evaluates to true and thus subsequently enables docker by default:
$ nix-instantiate --eval nixos --arg configuration {} \
-A config.virtualisation.docker.enable
true
Fixing this is simply done by changing the check to "mkIf (cfg != {})".
Tested this by running the "docker-containers" NixOS test and it still
passes.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @benley, @danbst, @Infinisil, @nlewo
This otherwise does not eval `:tested` any more, which means no nixos
channel updates.
Regression comes from 0eb6d0735f (#57751)
which added an assertion stopping the use of `autoResize` when the
filesystem cannot be resized automatically.
* WIP: Run Docker containers as declarative systemd services
* PR feedback round 1
* docker-containers: add environment, ports, user, workdir options
* docker-containers: log-driver, string->str, line wrapping
* ExecStart instead of script wrapper, %n for container name
* PR feedback: better description and example formatting
* Fix docbook formatting (oops)
* Use a list of strings for ports, expand documentation
* docker-containers: add a simple nixos test
* waitUntilSucceeds to avoid potential weird async issues
* Don't enable docker daemon unless we actually need it
* PR feedback: leave ExecReload undefined
Since 34234dcb51, for resize2fs to be automatically included in
initrd, a filesystem needed for boot must be explicitly defined as an
ext* type filesystem.
Adds `virtualisation.qemu.drives` option to specify drives to be used by
qemu.
Also fix boot when `virtualisation.useBootLoader` is set to true: since
the boot disk is second, qemu doesn't boot from it. Added `bootindex=1` to
the boot disk device.
This allows providing a `configuration.nix` file to the VM.
The test doesn't work in the sandbox because it needs Internet access
(however, it works interactively).
The Openstack metadata service exposes the EC2 API. We use the
existing `ec2.nix` module to configure the hostname and ssh keys of an
Openstack instance.
A test checks that the ssh server is correctly configured.
This is mainly to reduce the size of the image (700MB). Also,
declarative features provided by cloud-init are not really useful
since we would prefer to use our `configuration.nix` file instead.
According to systemd-nspawn(1), --network-bridge implies --network-veth,
and --port option is supported only when private networking is enabled.
Fixes #52417.
Use googleOsLogin for login instead.
This allows setting users.mutableUsers back to false, and stripping
security.sudo.extraConfig.
security.sudo.enable is on by default anyhow, so we can remove that as well.
When privateNetwork is enabled, currently the container's interface name
is derived from the container name. However, there's a hard limit
on the size of interface names. To avoid conflicts and other issues,
we set a limit on the container name when privateNetwork is enabled.
Fixes #38509
Cloudstack images simply use cloud-init. They are not headless,
as a user usually has access to a console. Otherwise, the differences
from Openstack are mostly handled by cloud-init.
There are still some minor issues. Notably, there is no non-root user.
Other cloud images usually come with a user named after the
distribution and with sudo access. Would it make sense for NixOS?
Cloudstack gives the user the ability to change the password.
Cloud-init support for this is imperfect and the set-passwords module
should be declared as `- [set-passwords, always]` for this to work. I
don't know if there is an easy way to "patch" default cloud-init
configuration. However, without a non-root user, this is of no use.
Similarly, hostname is usually set through cloud-init using
`set_hostname` and `update_hostname` modules. While the patch to
declare nixos to cloud-init contains some code to set hostname, the
previously mentioned modules are not enabled.
This module permits preloading Docker images in a VM in order to reduce
IOs on file copies. This module should only be used in testing
environments, when the test requires several Docker images, such as in
Kubernetes tests. In this case,
`virtualisation.dockerPreloader.images` can replace the
`services.kubernetes.kubelet.seedDockerImages` options.
The idea is to populate the /var/lib/docker directory by mounting qcow
files (we use qcow files to avoid permission issues) that contain images.
For each image specified in
config.virtualisation.dockerPreloader.images:
1. The image is loaded by Docker in a VM
2. The resulting /var/lib/docker is written to a QCOW file
This set of QCOW files can then be used to populate
/var/lib/docker:
1. Each QCOW is mounted in the VM
2. Symlinks are created from these mount points to /var/lib/docker
3. A /var/lib/docker/image/overlay2/repositories.json file is generated
4. The docker daemon is started.
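Usage then reduces to listing image derivations (the derivations here are hypothetical, e.g. produced by `pkgs.dockerTools.pullImage`):

```nix
{
  # myRedisImage/myEtcdImage stand in for real image derivations:
  virtualisation.dockerPreloader.images = [ myRedisImage myEtcdImage ];
}
```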
`virtualisation.libvirtd.onShutdown` was previously unused.
While suspending a domain on host shutdown is the default, this commit
makes it so domains can also be shut down.
Previously, setting "privateNetwork = true" without specifying host and
local addresses would create unconfigured interfaces: ve-$INSTANCE on the host
and eth0 inside the container.
These changes are a rebased part of the original PR #3021.
100GB breaks cptofs, but 50GB is fine, and benchmarks show it takes the same time as building the demo VBox VM with a 10GB disk.
+ enabled VM sound output by default
+ set the USB controller to USB 2.0 mode
+ added a manifest file to the OVA, as it allows integrity checking on import
* Lets container@.service be activated by machines.target instead of
multi-user.target
According to the systemd manpages, all containers that are registered
by machinectl should be inside machines.target for easy stopping
and starting of container units altogether
* Make sure container@.service and container.slice instances are
actually located in machine.slice
https://plus.google.com/112206451048767236518/posts/SYAueyXHeEX
See original commit: https://github.com/NixOS/systemd/commit/45d383a3b8
* Enable Cgroup delegation for nixos-containers
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
This is equivalent to enabling all accounting options on the systemd
process inside the system container. This means that systemd inside
the container is responsible for managing Cgroup resources for
unit files that enable accounting options inside. Without this
option, units that make use of cgroup features within system
containers might misbehave.
See original commit: https://github.com/NixOS/systemd/commit/a931ad47a8
from the manpage:
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be made
accessible to the relevant user. When enabled the service manager
will refrain from manipulating control groups or moving processes
below the unit's control group, so that a clear concept of ownership
is established: the control group tree above the unit's control
group (i.e. towards the root control group) is owned and managed by
the service manager of the host, while the control group tree below
the unit's control group is owned and managed by the unit itself.
Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers are
enabled). If set to a list of controllers, delegation is turned on,
and the specified controllers are enabled for the unit. Note that
additional controllers than the ones specified might be made
available as well, depending on configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation, but reset the list of
controllers, all assignments prior to this will have no effect.
Defaults to false.
Note that controller delegation to less privileged code is only safe
on the unified control group hierarchy. Accordingly, access to the
specified controllers will not be granted to unprivileged services
on the legacy hierarchy, even when requested.
The following controller names may be specified: cpu, cpuacct, io,
blkio, memory, devices, pids. Not all of these controllers are
available on all kernels however, and some are specific to the
unified hierarchy while others are specific to the legacy hierarchy.
Also note that the kernel might support further controllers, which
aren't covered here yet as delegation is either not supported at all
for them or not defined cleanly.
When logging into a container by using
nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container. We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.
This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.
However, when we are the root user inside the container, we have the
same uid as the nix store owner, even though it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.
By setting `NIX_REMOTE=daemon` in `/etc/profile`, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.
This fixes #40355
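Conceptually, the fix is equivalent to this setting inside the container (the module arranges it via /etc/profile):

```nix
{
  # Force multi-user mode: talk to the (bind-mounted) host nix-daemon socket:
  environment.variables.NIX_REMOTE = "daemon";
}
```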
Several service definitions used `mkEnableOption` with text starting
with "Whether to", which produced funny option descriptions like
"Whether to enable Whether to run the rspamd daemon..".
This commit corrects this, and adds short descriptions of services
to affected service definitions.
This reverts commit f777d2b719.
cc #34409
This breaks evaluation of the tested job:
attribute 'diskInterface' missing, at /nix/store/5k9kk52bv6zsvsyyvpxhm8xmwyn2yjvx-source/pkgs/build-support/vm/default.nix:316:24
10G is not enough for a desktop installation, and resizing a VirtualBox disk image is a pain.
Let's increase the default disk size to 100G. It does not require more storage space, since the empty bits are left out.
We also don't need to source the uevent files anymore, since $MAJOR
and $MINOR aren't used elsewhere.
[dezgeg: The reason these are no longer needed is that 0d27df280f
switched /tmp to a devtmpfs which automatically creates such device
nodes]
This reverts commit 095fe5b43d.
Pointless renames considered harmful. All they do is force people to
spend extra work updating their configs for no benefit, and hindering
the ability to switch between unstable and stable versions of NixOS.
Like, what was the value of having the "nixos." there? I mean, by
definition anything in a NixOS module has something to do with NixOS...
* nixos/virtualbox: Adds more options to virtualbox-image.nix
Previously you could only set the size of the disk.
This change adds the ability to change the amount of memory
that the image gets, along with the name / derivation name /
file name for the VM.
* Incorporates some review feedback
Regression introduced by d4468bedb5.
No systemd messages are shown anymore during VM test runs, which is not
very helpful if you want to find out about failures.
There is a bit of a conflict between testing and the change that
introduced the regression. While the mentioned commit makes sure that
the primary console is tty0 for virtualisation.graphics = false, our VM
tests need to have the serial console as primary console.
So in order to support both, I added a new virtualisation.qemu.consoles
option, which allows specifying those options using the module system.
The default of this option is to use the changes that were introduced
and in test-instrumentation.nix we use only the serial console the same
way as before.
For test-instrumentation.nix I didn't add a baudrate to the serial
console because I can't think of a reason off the top of my head why it
should need one. There also wasn't a reason stated when that was introduced in
7499e4a5b9.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @flokli, @dezgeg, @edolstra
Always enable both tty and serial console, but set preferred console
depending on cfg.graphical.
Even in qemu graphical mode, you can switch to the serial console via
Ctrl+Alt+3.
With that being done, you also don't need to specify
`systemd.services."serial-getty@ttyS0".enable = true;` either as described in
https://nixos.wiki/wiki/Cheatsheet#Building_a_service_as_a_VM_.28for_testing.29,
as systemd automatically spawns a getty on consoles passed via the cmdline.
This also means VMs built by 'nixos-rebuild build-vm' can simply be run
properly in nographic mode by appending `-nographic` to `result/bin/run-*-vm`,
without the need to explicitly add platform-specific QEMU_KERNEL_PARAMS.
The ability to specify "-drive if=scsi" has been removed in QEMU version
2.12 (introduced in 3e3b39f173).
Quote from https://wiki.qemu.org/ChangeLog/2.12#Incompatible_changes:
> The deprecated way of configuring SCSI devices with "-drive if=scsi"
> on x86 has been removed. Use an appropriate SCSI controller together
> with "-device scsi-hd" or "-device scsi-cd" and a corresponding
> "-blockdev" parameter instead.
So whenever the diskInterface is "scsi" we use the new way to specify
the drive and fall back to the deprecated way for the time being. The
reason why I'm not using the new way for "virtio" and "ide" as well is
because there is no simple generic way anymore to specify these.
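As a sketch, the new-style flags can also be passed manually through `virtualisation.qemu.options` (controller choice, IDs and file name are illustrative):

```nix
{
  virtualisation.qemu.options = [
    "-drive id=disk0,if=none,file=disk.img"
    "-device lsi53c895a"            # a SCSI controller
    "-device scsi-hd,drive=disk0"   # attach the drive to it
  ];
}
```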
This also turns the type of the virtualisation.qemu.diskInterface option
into an enum, so the user knows which values are allowed, and we can
also make sure the right value is provided to prevent typos.
I've tested this against a few non-disk-related NixOS VM tests but also
the installer.grub1 test (because it uses "ide" as its drive interface),
the installer.simple test (just to be sure it still works with
"virtio") and all the tests in nixos/tests/boot.nix.
In order to be able to run the grub1 test I had to go back to
8b1cf100cd (which is a known commit where
that test still works) and apply the QEMU update and this very commit,
because right now the test is broken.
Apart from the tests here in nixpkgs, I also ran another[1] test in
another repository which uses the "scsi" disk interface as well (in
comparison to most of the installer tests, this one actually failed
prior to this commit).
All of them now succeed.
[1]: 9b5a119972/tests/system/kernel/bfq.nix
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edostra, @grahamc, @dezgeg, @abbradar, @ts468
mke2fs has this annoying property that it uses getrandom() to get random
numbers (for whatever purpose), which blocks until the kernel's secure
RNG has sufficient entropy, which it usually doesn't in early boot
(except if your CPU supports RDRAND), which is exactly when we may need
to create the root disk.
So let's give the VM a virtio RNG to avoid the boot getting stuck at
mke2fs.
Allow out-of-band communication between qemu VMs and the host.
Useful to retrieve the IPs of VMs from the host (for instance when libvirt
can't analyze DHCP requests because VMs are configured with static
addresses, or when there is a connectivity failure).
Following legacy packaging conventions, `isArm` was defined just for the
32-bit ARM instruction set. This is confusing to non-packagers though,
because Aarch64 is also an ARM instruction set.
The official ARM overview for ARMv8[1] is surprisingly not confusing,
given the overall state of affairs for ARM naming conventions, and
offers us a solution. It divides the nomenclature into three levels:
```
ISA: ARMv8 {-A, -R, -M}
/ \
Mode: Aarch32 Aarch64
| / \
Encoding: A64 A32 T32
```
At the top is the overall v8 instruction set architecture. Second are the
two modes, defined by bitwidth but differing in other semantics too, and
at the bottom are the encodings, (hopefully?) isomorphic if they encode
the same mode.
The 32-bit encodings are mostly backwards compatible with previous
non-Thumb and Thumb encodings, and if so we can pun the mode names to
instead mean "sets of compatible or isomorphic encodings", and then
voilà we have nice names for the 32-bit and 64-bit ARM instruction sets
which do not use the word ARM, so as not to confuse either laymen or
experienced ARM packagers.
[1]: https://developer.arm.com/products/architecture/a-profile
- `localSystem` is added; it strictly supersedes `system`
- `crossSystem`'s description mentions `localSystem` (and vice versa).
- No more weird special casing I don't even understand
TEMP
Resolved the following conflicts (by carefully applying patches from the both
branches since the fork point):
pkgs/development/libraries/epoxy/default.nix
pkgs/development/libraries/gtk+/3.x.nix
pkgs/development/python-modules/asgiref/default.nix
pkgs/development/python-modules/daphne/default.nix
pkgs/os-specific/linux/systemd/default.nix
- Add a new parameter `imageType` that can specify either "efi" or
"legacy" (the default which should see no change in behaviour by
this patch).
- EFI images get a GPT partition table (instead of msdos) with a
mandatory ESP partition (so we add an assert that `partitioned`
is true).
- Use the partx tool from util-linux to determine exact start + size
of the root partition. This is required because GPT stores a secondary
partition table at the end of the disk, so we can't just have
mkfs.ext4 create the filesystem until the end of the disk.
- (Unrelated to any EFI changes) Since we're depending on the
  `-E offset=X` option to mkfs, which is only supported by e2fsprogs,
  disallow any attempt to create partitioned disk images where
  the root filesystem is not ext4.
Currently libvirt requires two qemu derivations: qemu and qemu_kvm, which is just a truncated version of qemu (defined as qemu.override { hostCpuOnly = true; }).
This patch exposes an option virtualisation.libvirtd.qemuPackage which allows choosing which package to use:
* pkgs.qemu_kvm, if all your guests have the same CPU as the host, or
* pkgs.qemu, which allows emulating alien architectures (for example ARMv7l on x86_64), or
* a custom derivation
The virtualisation.libvirtd.enableKVM option is vague and could be deprecated in favor of virtualisation.libvirtd.qemuPackage; in any case, it does allow enabling/disabling KVM.
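For instance, to emulate foreign architectures (using the option and package named above):

```nix
{ pkgs, ... }: {
  virtualisation.libvirtd.qemuPackage = pkgs.qemu;  # full emulation, not just KVM
}
```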