Commit graph

321 commits

Author SHA1 Message Date
K900 cf105c3841
nixos/test-driver: Crash on closed socket during connect() (#443539) 2025-09-16 22:05:37 +03:00
Will Fancher 90e7c6f90c nixos/test-driver: Crash on closed socket during connect()
If a VM crashes during `connect()`, e.g. because of `panic_on_fail`
during initrd, we would spin on the closed socket forever. We should
raise an exception instead.
2025-09-16 14:35:26 -04:00
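A minimal Python sketch of the failure mode described above. It assumes the driver reads from a plain stream socket while waiting for the VM. `recv()` on a socket whose peer has closed returns `b""` immediately, so a loop that only waits for "more data" spins forever. `read_until` and `MachineCrashed` are hypothetical names, not the driver's API.

```python
import socket


class MachineCrashed(Exception):
    """Hypothetical error raised when the VM side of the socket goes away."""


def read_until(sock: socket.socket, marker: bytes) -> bytes:
    """Read from sock until marker appears, failing fast on a closed peer."""
    buf = b""
    while marker not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            # An empty read means the peer closed the connection; without
            # this check the loop would call recv() again forever.
            raise MachineCrashed("socket closed before the expected data arrived")
        buf += chunk
    return buf
```

The behavior can be tried without a VM by using `socket.socketpair()` and closing one end.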
Robert Hensing 6d521ceac9 nixos/test-driver: Add machine.get_console_log() 2025-09-12 16:46:03 +02:00
Maximilian Bosch 9b1025c7fd
nixos/test-driver: always respect --dump-vsock
Previously, the vsocks used by the test run were only printed in
interactive mode. However, with `enableDebugHook = true;` this
functionality can also be used via `breakpointHook` within the build
sandbox, i.e. in non-interactive mode.

I misremembered how much effort this is to fix, otherwise, I would've
added this to #422066 already ;)
2025-08-10 18:25:57 +02:00
Jacek Galowicz 6fe959f8ce nixos test driver: drop wrong assertion 2025-08-07 11:59:39 +02:00
Wolfgang Walther 5a0711127c
treewide: run nixfmt 1.0.0 2025-07-24 13:55:40 +02:00
Jacek Galowicz d6b326d659
test-driver: Implement debugging breakpoint hooks
Co-authored-by: Maximilian Bosch <maximilian@mbosch.me>
2025-07-18 17:39:01 +02:00
Wolfgang Walther f965d36fa1
nixos/lib/test-driver: correct regex in pythonize_name() (#423064) 2025-07-09 16:01:27 +00:00
Michael Daniels 98b97a2011
nixos/lib/test-driver: correct regex in pythonize_name()
r"[A-z]" is not equivalent to r"[A-Za-z]": the range A-z also matches
the six ASCII characters between `Z` and `a`, namely `[`, `\`, `]`, `^`, `_` and the backtick.
But Python variable names cannot contain, e.g., a backtick.
So the current regex is wrong.
2025-07-06 21:09:36 -04:00
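A quick Python check of the claim above, using only `re` and the ASCII table. It does not reproduce `pythonize_name()` itself.

```python
import re

# The range A-z spans ASCII 0x41..0x7A, which includes six punctuation
# characters between "Z" (0x5A) and "a" (0x61).
extra = [c for c in map(chr, range(ord("A"), ord("z") + 1)) if not c.isalpha()]
print(extra)  # ['[', '\\', ']', '^', '_', '`']

# "[A-z]" therefore accepts a backtick, while "[A-Za-z]" does not.
assert re.fullmatch(r"[A-z]", "`") is not None
assert re.fullmatch(r"[A-Za-z]", "`") is None
```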
Yarny0 5e2baf54d4 nixos/test-driver: fix race from filename clash in OCR
There is a race condition
in the new parallelized OCR code.
The race condition became "active" with commit
819d304a39 (Use futures for OCR parallelization),
however, the underlying bug already slipped in with commit
e6ea13f4ea (User proper `Path` instead of `str` in OCR code).

The OCR module applies tesseract to at most three variants
of the screenshot: the original one, and two variants that
are created by a preprocessing step (with ImageMagick).
The preprocessing step needs an output filename
that is used to write the preprocessed image file.

The "Path" commit broke the way the output file is named:
The code still attempts to append a ".negative" to *one*
of the preprocessed output files, but the method
`.with_suffix` is not suitable for that purpose:
Later on, ".png" is also added with `.with_suffix`,
*replacing* the ".negative" and thereby yielding
*the same* output filename for both preprocessed files.

Without parallelization, this doesn't hurt;
preprocessed files are simply created and analyzed in order.
But after the parallelization commit,
these two tasks run in parallel
(plus the third task that analyses the original screenshot,
which does not cause any further harm here):

* Task 1: preprocess (non-negative), then tesseract the output
* Task 2: preprocess (negative), then tesseract the output

Both tasks use the same filename and thus the same file for the
preprocessed image that is generated, then used by tesseract.
This often creates a garbage file since both
preprocessing steps write that one file at the same time.
Tesseract consequently fails and
complains about bad data in its input file.

The commit at hand simply fixes the file naming
by adding ".negative.png" or ".positive.png"
to the filename for the preprocessed image.
This ensures that the two threads no longer clobber each
other's data and can now coexist in peace.
2025-07-04 12:10:53 +02:00
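The naming bug can be reproduced with `pathlib` alone. This is a minimal Python sketch that assumes the output names were derived roughly as the commit message describes. `buggy_names`, `fixed_names` and the `.ppm` input path are hypothetical, not the driver's code. `Path.with_suffix()` replaces only the last suffix, so chaining it discards ".negative".

```python
from pathlib import Path


def buggy_names(screenshot: Path) -> tuple[Path, Path]:
    """Mimics the broken naming: the second with_suffix() drops ".negative"."""
    positive = screenshot.with_suffix(".png")
    negative = screenshot.with_suffix(".negative").with_suffix(".png")
    return positive, negative


def fixed_names(screenshot: Path) -> tuple[Path, Path]:
    """Appends the full ".positive.png"/".negative.png" tail to the stem."""
    positive = screenshot.with_name(screenshot.stem + ".positive.png")
    negative = screenshot.with_name(screenshot.stem + ".negative.png")
    return positive, negative


shot = Path("/tmp/screenshot.ppm")
print(buggy_names(shot))  # both entries are /tmp/screenshot.png
print(fixed_names(shot))  # distinct .positive.png and .negative.png paths
```

Giving each parallel task its own output path is the property that matters. The exact spelling of the names is secondary.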
Jacek Galowicz 3a670480d1
nixos/lib/test-driver: try using XDG_RUNTIME_DIR if available (#414231) 2025-07-02 16:11:44 +02:00
Jacek Galowicz 26bcb57f3c test-driver: fix number of cores 2025-07-02 13:59:15 +00:00
Maximilian Bosch 59b4d0de90
nixos/lib/test-driver: try using XDG_RUNTIME_DIR if available
At work we have the use-case that several people connect to a large
Linux box to run tests and debug those interactively.

All tests write their state into a global `/tmp` -- e.g. the vde1 socket
and the VMs' state. This leads to conflicts when multiple people are
doing this.

This change tries to use XDG_RUNTIME_DIR before using Python's detection
of a global temp directory: when connecting, this requires a working
user session, but then we get working directories per user. This is
preferable to doing something like `mktemp -d` per run since that
would break use-cases where you want to keep the VMs' state across
multiple sessions (`--keep-vm-state`).
2025-07-02 15:53:12 +02:00
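A minimal Python sketch of the lookup order described above, using only the standard library. It assumes the runtime dir is used only when it is set and exists. `state_base_dir` is a hypothetical helper, and the real driver's checks may differ.

```python
import os
import tempfile
from pathlib import Path


def state_base_dir() -> Path:
    """Prefer the per-user runtime dir, fall back to the global temp dir."""
    xdg = os.environ.get("XDG_RUNTIME_DIR")
    # Assumption: only use XDG_RUNTIME_DIR if it is set and points to an
    # existing directory (it requires a working user session).
    if xdg and Path(xdg).is_dir():
        return Path(xdg)
    return Path(tempfile.gettempdir())
```

Per-machine state would then live below this directory, e.g. `state_base_dir() / "vm-state-machine"`. That name is taken from the error message quoted further down in this log.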
Jacek Galowicz 819d304a39 test-driver: Use futures for OCR parallelization 2025-07-02 11:43:13 +00:00
Jacek Galowicz e6ea13f4ea test-driver: User proper Path instead of str in OCR code 2025-07-01 14:18:41 +02:00
Jacek Galowicz f56933ebbf test-driver: drop OCR engine mode variations 2025-07-01 14:18:40 +02:00
Jacek Galowicz 14c01b5af5 test-driver: Parallelize OCR 2025-07-01 14:18:40 +02:00
Jacek Galowicz 9f10c9bce8 test-driver: Factor out OCR related code to machine/ocr.py 2025-07-01 14:18:40 +02:00
Jacek Galowicz 2c8500b91d test-driver: Move machine code into its own python module folder 2025-07-01 14:18:40 +02:00
Maximilian Bosch a9adfc631a
nixos/test-driver: allow assigning other vsock number ranges
I'm a little annoyed at myself that I only realized this _after_ #392030
got merged. But I realized that if something else is using AF_VSOCK or
you simply have another interactive test running (e.g. by another user
on a larger builder), starting up VMs in the driver fails with

    qemu-system-x86_64: -device vhost-vsock-pci,guest-cid=3: vhost-vsock: unable to set guest cid: Address already in use

Multi-user setups are broken anyways because you usually don't have
permissions to remove the VM state from another user and thus starting
the driver fails with

    PermissionError: [Errno 13] Permission denied: PosixPath('/tmp/vm-state-machine')

but this is something you can work around at least.

I considered generating random offsets, but that's not feasible
given we need to know the numbers at eval time to inject them into the
QEMU args. Also, while we could do this via the test-driver, we would
also have to probe whether the vsock numbers are unused, making the code
even more complex for a use-case I consider rather uncommon.

Hence the solution is to do

    sshBackdoor.vsockOffset = 23542;

when encountering conflicts.
2025-05-09 11:54:00 +02:00
Maximilian Bosch 8869265f93
nixos/test-driver: printout instructions on how to connect via AF_VSOCK 2025-05-08 10:51:39 +02:00
Jacek Galowicz d0c304d4c1
nixos/test-driver: improve error reporting and assertions (#390996) 2025-04-26 10:26:01 +02:00
Maximilian Bosch deff22bcc8
nixos/test-driver: improve wording on comments about new error handling
Co-authored-by: Benoit de Chezelles <bew@users.noreply.github.com>
2025-03-22 19:13:48 +01:00
Maximilian Bosch e2b3517f59
nixos/test-driver: use ipython via ptpython
Closes #180089

I realized that the previous commits, which rely on `sys.exit` for dealing
with `MachineError`/`RequestedAssertionFailed`, exit the interactive
session, which is kinda bad.

This patch uses the ipython driver: it seems to have equivalent features
such as auto-completion and doesn't stop on SystemExit being raised.

This also fixes other places where this happened such as things calling
`log.error` on the CompositeLogger.
2025-03-21 12:34:59 +00:00
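This explains why `sys.exit` ends an interactive session. It raises `SystemExit`, which derives from `BaseException` rather than `Exception`, so a REPL-style `except Exception` handler does not catch it. The following is a minimal Python sketch: `run_cell` is a hypothetical stand-in for a shell evaluating one input, not ptpython's or IPython's API.

```python
import sys


def run_cell(code: str, namespace: dict) -> str:
    """Evaluate one input the way a naive REPL would."""
    try:
        exec(code, namespace)
        return "ok"
    except Exception as exc:  # SystemExit is NOT an Exception subclass
        return f"error: {exc!r}"


ns: dict = {"sys": sys}
print(run_cell("1 / 0", ns))  # the naive REPL survives ordinary errors
try:
    run_cell("sys.exit(1)", ns)
except SystemExit:
    print("SystemExit escaped the handler and would end the session")
```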
Maximilian Bosch d587d569e0
nixos/test-driver: restructure error classes
After a discussion with tfc, we agreed that we need a distinction
between errors where the user isn't at fault (e.g. OCR failing - now
called `MachineError`) and errors where the test actually failed (now
called `RequestedAssertionFailed`).

Both get special treatment from the error handler, i.e. a `!!!` prefix
to make it easier to spot visually.

However, only `RequestedAssertionFailed` gets a shortened traceback:
`MachineError` exceptions may be worth reporting, and maintainers
usually want to see the full trace.

Suggested-by: Jacek Galowicz <jacek@galowicz.de>
2025-03-21 11:38:01 +00:00
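A minimal Python sketch of the handling described above, built on the standard `traceback` module. It assumes the testScript is executed as code compiled with filename `<string>`, as the tracebacks in this log suggest. The class bodies and `format_error` are illustrative, not the driver's implementation.

```python
import traceback


class MachineError(Exception):
    """Hypothetical: something went wrong that is not the test author's fault."""


class RequestedAssertionFailed(Exception):
    """Hypothetical: a check requested by the testScript did not hold."""


def format_error(exc: BaseException) -> str:
    """Render exc the way the commit describes (names are illustrative)."""
    if isinstance(exc, RequestedAssertionFailed):
        # Keep only frames from the testScript, which runs as "<string>".
        frames = [
            f for f in traceback.extract_tb(exc.__traceback__)
            if f.filename == "<string>"
        ]
        parts = ["Traceback (most recent call last):\n"]
        parts += traceback.format_list(frames)
        parts += traceback.format_exception_only(type(exc), exc)
    elif isinstance(exc, MachineError):
        # Full traceback: this may be worth reporting to maintainers.
        parts = traceback.format_exception(type(exc), exc, exc.__traceback__)
    else:
        # Other exceptions are printed unchanged, without the prefix.
        return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Prefix every line with "!!! " so it stands out among other logs.
    return "\n".join("!!! " + line for line in "".join(parts).splitlines())
```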
Jacek Galowicz 482beabbbd NixOS Test driver: Display Qemu windows on macOS in interactive mode 2025-03-20 15:40:02 +00:00
Maximilian Bosch cc3d409adc
nixos/test-driver: log associated machine for self.nested
When doing `machine.succeed(...)` or something similar, it's now clear
that the command `...` was issued on `machine`.

Essentially, this results in the following diff in the log:

    -(finished: waiting for unit default.target, in 13.47 seconds)
    +machine: (finished: waiting for unit default.target, in 13.47 seconds)
    (finished: subtest: foobar text lorem ipsum, in 13.47 seconds)
2025-03-20 13:20:51 +00:00
Maximilian Bosch 11ff96a679
nixos/test-driver: use RequestedAssertionFailed/TestScriptError in Machine class
I think it's reasonable to also have this kind of visual distinction
here between test failures and actual errors from the test framework.

A failing `machine.require_unit_state` now looks like this, for
instance:

    !!! Traceback (most recent call last):
    !!!   File "<string>", line 3, in <module>
    !!!     machine.require_unit_state("postgresql","active")
    !!!
    !!! RequestedAssertionFailed: Expected unit 'postgresql' to to be in state 'active' but it is in state 'inactive'

Co-authored-by: Benoit de Chezelles <bew@users.noreply.github.com>
2025-03-20 13:20:37 +00:00
Maximilian Bosch a1dfaf51e2
nixos/test-driver: integrate Python unittest assertions
Replaces / Closes #345948

I tried to integrate `pytest` assertions because I like the reporting,
but I only managed to get the very basic thing and even that was messing
around a lot with its internals.

The approach in #345948 shifts too much maintenance effort to us, so
it's not really desirable either.

After discussing this with Benoit at Ocean Sprint, we decided that
it's probably the best compromise to integrate `unittest`: it also
provides good diffs when needed, but the downside is that existing tests
don't benefit from it.

This patch essentially does the following things:

* Add a new global `t` that is an instance of a `unittest.TestCase`
  class. I decided to just go for `t` given that e.g.
  `tester.assertEqual` (or any other longer name) seems quite verbose.

* Use a special class for errors that get special treatment:
  * The traceback is minimized to only include frames from the
    testScript: in this case I don't really care about anything else and
    IMHO that's just visual noise.

    This is not the case for other exceptions since these may indicate a
    bug and then people should be able to send the full traceback to the
    maintainers.
  * Display the error, but with `!!!` as prefix to make sure it's
    easier to spot in between other logs.

This looks e.g. like

    !!! Traceback (most recent call last):
    !!!   File "<string>", line 7, in <module>
    !!!     foo()
    !!!   File "<string>", line 5, in foo
    !!!     t.assertEqual({"foo":[1,2,{"foo":"bar"}]},{"foo":[1,2,{"bar":"foo"}],"bar":[1,2,3,4,"foo"]})
    !!!
    !!! NixOSAssertionError: {'foo': [1, 2, {'foo': 'bar'}]} != {'foo': [1, 2, {'bar': 'foo'}], 'bar': [1, 2, 3, 4, 'foo']}
    !!! - {'foo': [1, 2, {'foo': 'bar'}]}
    !!! + {'bar': [1, 2, 3, 4, 'foo'], 'foo': [1, 2, {'bar': 'foo'}]}
    cleanup
    kill machine (pid 9)
    qemu-system-x86_64: terminating on signal 15 from pid 6 (/nix/store/wz0j2zi02rvnjiz37nn28h3gfdq61svz-python3-3.12.9/bin/python3.12)
    kill vlan (pid 7)
    (finished: cleanup, in 0.00 seconds)

Co-authored-by: bew <bew@users.noreply.github.com>
2025-03-20 12:30:58 +00:00
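A minimal Python sketch of a global `t` backed by `unittest.TestCase`. It assumes the custom error type is wired in through `failureException`, which the `assert*` methods raise on failure. `Tester` is a hypothetical name, and the driver's actual setup may differ.

```python
import unittest


class NixOSAssertionError(AssertionError):
    """Hypothetical assertion class, named after the one in the log above."""


class Tester(unittest.TestCase):
    # assert* methods raise self.failureException on failure.
    failureException = NixOSAssertionError
    # Show complete diffs instead of truncating long ones.
    maxDiff = None


# Since Python 3.2, TestCase can be instantiated without a method name.
t = Tester()

try:
    t.assertEqual({"foo": [1, 2]}, {"foo": [1, 3]})
except NixOSAssertionError as err:
    print(err)  # "{'foo': [1, 2]} != {'foo': [1, 3]}" followed by a diff
```

Using a subclass of `AssertionError` lets an error handler recognize test failures separately from other exceptions while keeping `unittest`'s diff output.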
K900 58edd1e2ac
nixos/docs: fix typo (#372394) 2025-02-10 13:47:45 +03:00
OPNA2608 fa984fd7aa nixos/lib/test-driver: Revert magick args order
...as it apparently matters when we do the -negate
2025-01-22 14:59:35 +01:00
K900 5b434ed807 nixos/lib/test-driver: try more OCR options
The current setup is really weird and definitely wrong for many cases
because it inverts the colors of the image, which is never a good idea
for GUIs. So, try to OCR three different times: once on the source image,
once with processing, and once with processing but no negation.

This should hopefully make things work at least somewhat better for GUIs.
2025-01-21 14:16:04 +03:00
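The three OCR attempts described above can be sketched in Python with `concurrent.futures`, in the spirit of the later parallelization commits in this log (e.g. 819d304a39). `ocr_variant` is a hypothetical callable standing in for the ImageMagick and tesseract steps, and the variant labels are illustrative. Each variant must write its own intermediate file, which is exactly what the race fix further up addresses.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# The three attempts described in the commit: the source image as-is, a
# preprocessed copy, and a preprocessed copy without color negation.
VARIANTS = ("original", "processed-negated", "processed")


def ocr_all(screenshot: Path, ocr_variant: Callable[[Path, str], str]) -> list[str]:
    """Run one OCR attempt per variant concurrently; return texts in order."""
    with ThreadPoolExecutor(max_workers=len(VARIANTS)) as pool:
        futures = [pool.submit(ocr_variant, screenshot, v) for v in VARIANTS]
        # result() re-raises any exception from the worker thread.
        return [f.result() for f in futures]
```

Returning every variant's text lets a caller match expected text against any of them. How the driver actually combines the results is not shown in this log.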
K900 4b5b5d19d2
nixos/test-driver: fix OCR (#375091) 2025-01-19 21:01:53 +03:00
Victor Engmark 8f2bc9842e
nixos/test-driver: Use consistent naming and types
Specifies the "last try" parameter in all methods called by `retry`.
Doing this clarifies its presence, and makes it easier to use it in the
future if needed.
2025-01-19 17:59:13 +01:00
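A minimal Python sketch of a `retry` helper that passes a "last try" flag to its callback, as the commit describes. The `interval` parameter and the exact timeout semantics are assumptions added here so the sketch can be tested quickly.

```python
import time
from collections.abc import Callable


def retry(fn: Callable[[bool], bool], timeout: int = 900, interval: float = 1.0) -> None:
    """Call fn(last_try) until it returns True or the attempts run out."""
    for _ in range(timeout):
        if fn(False):
            return
        time.sleep(interval)
    # One final call with last_try=True, e.g. so fn can log extra details.
    if not fn(True):
        raise Exception(f"action timed out after {timeout} attempts")
```

Giving every callback the same `(last_try: bool) -> bool` shape means it can decide to report diagnostics only on the final attempt.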
K900 84b216c2a6 nixos/test-driver: fix OCR
I don't know why it doesn't want to do TIFF now, but there's also
absolutely no reason for it to be TIFF anyway, so let's just use
an image format that is actually sane.
2025-01-19 18:41:58 +03:00
Gaetan Lepage 8711bcf71a nixos-test-driver: reformat with latest ruff 2025-01-09 15:43:10 +01:00
Anton Mosich 9d2d70bea2
nixos/docs: fix typo
If that string isn't a raw string, the "\n" in the second line isn't
rendered as such, but as a space instead.
2025-01-09 15:42:14 +01:00
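A short Python illustration of the difference this relies on. In a normal string literal, `\n` is a single newline character, whereas a raw string keeps the two characters `\` and `n`. The `rendered` line is only a rough imitation of a renderer that folds line breaks into spaces.

```python
plain = "a\nb"
raw = r"a\nb"

print(len(plain), len(raw))  # 3 4
print(raw)                   # a\nb

# Rough imitation of a renderer folding a line break into a space.
rendered = " ".join(plain.split("\n"))
print(rendered)              # a b
```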
Emily f6ce575a03 nixos/test-driver: avoid lib.fileset 2024-12-31 02:30:18 +00:00
Silvan Mosberger 4f0dadbf38 treewide: format all inactive Nix files
After final improvements to the official formatter implementation,
this commit now performs the first treewide reformat of Nix files using it.
This is part of the implementation of RFC 166.

Only "inactive" files are reformatted, meaning only files that
aren't being touched by any PR with activity in the past 2 months.
This is to avoid conflicts for PRs that might soon be merged.
Later we can do a full treewide reformat to get the rest,
which should not cause as many conflicts.

A CI check has already been running for some time to ensure that new and
already-formatted files are formatted, so the files being reformatted here
should also stay formatted.

This commit was automatically created and can be verified using

    nix-build a08b3a4d19.tar.gz \
      --argstr baseRev b32a094368
    result/bin/apply-formatting $NIXPKGS_PATH
2024-12-10 20:26:33 +01:00
Emily 8221c09ff5
nixos/lib/test-driver: fix linting after compatibility clean‐up
The previous commit removed the handling of `dict` arguments, but
didn’t adjust the type, leading to the following type-checking error:

    test_driver/driver.py:216: error: Argument 1 to "NixStartScript" has incompatible type "str | dict[Any, Any]"; expected "str"  [arg-type]

It also left an unused import that Ruff is unhappy about:

    build/lib/test_driver/driver.py:11:22: F401 [*] `colorama.Fore` imported but unused
    …
    build/lib/test_driver/driver.py:11:28: F401 [*] `colorama.Style` imported but unused

Fixes: 71306e6b36
(cherry picked from commit d490680530)
(cherry picked from commit ff31b814b6)
2024-11-30 15:11:39 +01:00
Wolfgang Walther a92ea1ff26
nixos/lib/test-driver: remove legacy args handling
Scheduled for removal in 24.11, so let's follow through.

Added in #291544.

(cherry picked from commit 71306e6b36)
(cherry picked from commit 8427b6f640)
2024-11-30 15:11:38 +01:00
Nick Cao 172a35f8ce
nixos/test-driver: target python 3.12 2024-11-22 10:49:32 -05:00
Nick Cao e23f1733c6
nixos/test-driver: use ruff format in place of black 2024-11-22 10:49:31 -05:00
Nick Cao ef2d3c542a
nixos/test-driver: modernize 2024-11-22 10:49:31 -05:00
Nick Cao 42d4046e94
nixos/test-driver: format with nixfmt 2024-11-22 10:49:30 -05:00
Nick Cao b25360a7e5
nixos/test-driver: apply ruff check suggestions 2024-11-22 10:49:30 -05:00
Jörg Thalheim ef9502a009 nixos/test-driver: fix resource cleanup of vlan/qmp objects
Using `__del__` for resource cleanup is somewhat unsound: in our case the
logger has already closed its logfile and therefore fails with an exception
before the rest of the resources can be cleaned up.
2024-10-16 19:46:38 +03:00
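A minimal Python sketch of the ordering problem described above and one common alternative to `__del__`: explicit cleanup in a fixed order inside `finally`. `Logger`, `VLan` and `run` are hypothetical stand-ins, and the commit message does not say how the actual fix is structured.

```python
class Logger:
    """Hypothetical logger that can no longer write once closed."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.closed = False

    def log(self, msg: str) -> None:
        if self.closed:
            raise ValueError("I/O operation on closed log file")
        self.lines.append(msg)

    def close(self) -> None:
        self.closed = True


class VLan:
    """Hypothetical resource that logs when it is stopped."""

    def __init__(self, logger: Logger, nr: int) -> None:
        self.logger, self.nr = logger, nr

    def stop(self) -> None:
        self.logger.log(f"kill vlan {self.nr}")


def run(logger: Logger, vlans: list[VLan]) -> None:
    try:
        pass  # the test script would run here
    finally:
        # Explicit, ordered cleanup: stop resources while the logger is open,
        # instead of relying on __del__, which may run after logger.close().
        for vlan in vlans:
            vlan.stop()
        logger.close()
```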
Philip Taron 2b67819d55 nixos-test-driver: avoid top-level with in shell.nix 2024-07-29 19:46:18 +02:00
Silvan Mosberger b3ad661e9f nixos/lib/test-driver: Prevent unnecessary rebuilds
E.g. when only Nix files change
2024-06-14 20:42:16 +02:00
Martin Weinelt ab897a8c62
nixos/test-driver: fix return value of subtest function
Mypy since version 1.10.0 complains about this:

> test_driver/driver.py:109: error: No return value expected  [return-value]
2024-06-06 01:07:39 +02:00