The module and NixOS test are currently broken because the logPath option is always set by default and passes a parameter to the CLI that no longer exists. Let's just remove logPath altogether, since the parameter it relied on was removed.
...including a slightly more careful config around restarts, i.e.
* We have intervals of 5 seconds between restarts instead of 100ms.
* If we exceed 5 start attempts in 5*120s (with 120s being the timeout),
the start job gets rate-limited and thus aborted. Do note that at most
5 start attempts are allowed in ~625s by default. If the startup
fails very quickly, either wait until the rate limit is over or reset
the counter using `systemctl reset-failed postgresql.service`.
* The interval of 625s (plus 5s of buffer) is automatically derived
from RestartSec & TimeoutSec. Changing either will also affect
StartLimitIntervalSec unless overridden with `mkForce`.
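As a rough sketch of the arithmetic above and of the `mkForce` escape hatch: the formula is inferred from the numbers in this message (5 attempts, 5s between restarts, 120s timeout, 5s buffer), and the `let` names are made up for illustration. This is not the module's actual code.

```nix
{ lib, ... }:
let
  # Values taken from the commit message; the names are illustrative only.
  restartSec = 5;   # pause between restart attempts, in seconds
  timeoutSec = 120; # start timeout per attempt, in seconds
  maxTries = 5;     # start attempts allowed per interval
  bufferSec = 5;

  # Assumed derivation: 5 * (5 + 120) + 5 = 630 seconds.
  pinnedInterval = maxTries * (restartSec + timeoutSec) + bufferSec;
in
{
  # Pin the interval so that later changes to RestartSec or TimeoutSec
  # no longer move it. Without mkForce this definition would compete
  # with the value the module derives.
  systemd.services.postgresql.unitConfig.StartLimitIntervalSec =
    lib.mkForce pinnedInterval;
}
```

Any other integer works in place of `pinnedInterval`; the point is only that the override has to go through `mkForce`.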
On my employer's NixOS-based platform, PostgreSQL is configured with
`Restart=always`, which unfortunately never got upstreamed.
This however revealed an interesting problem when using bi-directional
BindsTo: when killing `postgresql.service`, sometimes both the service &
target start back up and sometimes they don't. According to an upstream
bug report[1] this is a known problem because you have two conflicting
operations scheduled in a single transaction, namely
* When (auto-)restarting, a restart job for all units bound to the
restarting unit is immediately scheduled[2].
* Due to the `BindsTo` relationship, a stop-job for `postgresql.target`
is scheduled immediately by the manager loop[3]. This is caused by the
`UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT` "atom" which is ONLY set for a
BindsTo relationship[4].
When the stop-job is processed first, the restart is inhibited:
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Main process exited, code=killed, status=9/KILL
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Changed running -> stop-sigterm
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Trying to enqueue job postgresql.target/stop/replace
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Installed new job postgresql.service/stop as 80053
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Installed new job postgresql.target/stop as 80052
Jul 12 13:25:51 nixos systemd[1]: postgresql.target: Enqueued job postgresql.target/stop as 80052
[...]
Jul 12 13:25:51 nixos systemd[1]: postgresql.service: Service restart not allowed.
It's subtle and non-obvious from the man page, but units are stopped
differently when using `PartOf=` or `Requires=`: these don't have the
`UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT` property. Instead, the stop/start of
the target is scheduled AFTER the stop-job of postgresql.service, which
is turned into a start-job because of Restart=always:
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Main process exited, code=killed, status=9/KILL
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Failed with result 'signal'.
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Service will restart (restart setting)
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Installed new job postgresql.target/restart as 80996
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Installed new job postgresql.service/restart as 80907
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Scheduled restart job, restart counter is at 1.
[...]
Jul 12 13:33:00 nixos systemd[1]: Stopped target postgresql.target.
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Converting job postgresql.target/restart -> postgresql.target/start
Jul 12 13:33:00 nixos systemd[1]: Stopping postgresql.target...
[...]
Jul 12 13:33:00 nixos systemd[1]: Stopped postgresql.service.
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Converting job postgresql.service/restart -> postgresql.service/start
[...]
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Changed dead -> running
Jul 12 13:33:00 nixos systemd[1]: postgresql.service: Job 80907 postgresql.service/start finished, result=done
Jul 12 13:33:00 nixos systemd[1]: Started postgresql.service.
Jul 12 13:33:00 nixos systemd[1]: postgresql.target: Changed dead -> active
[...]
Jul 12 13:33:00 nixos systemd[1]: Reached target postgresql.target.
Do note that the stop job (including the restart) of postgresql.service
is fully processed here before dealing with PartOf/ConsistsOf
relationships.
I tested this against the following cases:
| Unit | Action | Propagates to |
| ------------------ | ------------ | ------------------ |
| postgresql.target | restart | postgresql.service |
| postgresql.target | start | postgresql.service |
| postgresql.target | stop | postgresql.service |
| postgresql.service | start | postgresql.target |
| postgresql.service | restart | postgresql.target |
| postgresql.service | stop | postgresql.target |
| postgresql.service | auto-restart | postgresql.target |
| postgresql.service | failure | postgresql.target |
[1] e.g. systemd issue 8374
[2] https://github.com/systemd/systemd/blob/v256-stable/src/core/service.c#L2535-L2542
[3] https://github.com/systemd/systemd/blob/v256-stable/src/core/manager.c#L1611-L1626
[4] https://github.com/systemd/systemd/blob/v256-stable/src/core/unit-dependency-atom.c#L30-L35
See https://discourse.nixos.org/t/i-cannot-for-the-life-of-me-find-the-package-that-has-pg-config/66244/4
I decided against doing this in its own nixpkgs manual: the line
to draw is quite blurry already (e.g. we have documented our package
removal policy in here as well) and having to check two manuals for a
single subsystem feels pretty annoying to me.
The relevant part - where to find pg_config - is written at the top. I
decided to give a bit more context about the way our packaging works
since I realized a few times now that I don't remember all the details
about the problems we had in the past and having to look up individual
commit messages for that isn't very productive.
The new postgresql.target will now wait until recovery is done and
read/write connections are possible.
This allows ensure* scripts and downstream migrations to work properly
after recovery from backup.
Resolves #346886
This avoids restarting the PostgreSQL server when only ensureDatabases
or ensureUsers have been changed. It will also make it possible to
properly wait for recovery to finish later.
To wait for "postgresql is ready" in other services, we now provide a
postgresql.target.
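A minimal sketch of how a downstream service might order itself after the new target. `my-app` is a hypothetical service assumed to be defined elsewhere in the configuration; whether `requires` or the weaker `wants` is appropriate depends on the service.

```nix
{
  # Hypothetical consumer: `my-app` is not part of this change.
  systemd.services.my-app = {
    # Start only once PostgreSQL is reported ready via the target ...
    after = [ "postgresql.target" ];
    # ... and pull the target in when my-app is started.
    requires = [ "postgresql.target" ];
  };
}
```

Ordering against `postgresql.target` rather than `postgresql.service` is what makes the service wait for the "ready" state described here instead of just for the server process.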
Resolves #400018
Co-authored-by: Marcel <me@m4rc3l.de>