systemd service fails with status 1/FAILURE #376
Reference: borgmatic-collective/borgmatic#376
What I'm trying to do and why
I run borgmatic via systemd, using borgmatic.timer and borgmatic.service. Since yesterday (04-12-2020), the service has exited with status 1/FAILURE. Running borgmatic manually works.
Steps to reproduce (if a bug)
Here is one of my config files.
Here is the borgmatic.service
and the borgmatic.timer
Actual behavior (if a bug)
Here is the output of journalctl -xu borgmatic
Expected behavior (if a bug)
The service runs without error.
Other notes / implementation ideas
Environment
borgmatic version: 1.5.12
borgmatic installation method: Pacman package
Borg version: 1.1.14
Python version: 3.9.0
Database version (if applicable): ---
operating system and version: Endeavour OS, kernel version 5.9.11-arch2-1
I may have found the reason. I checked the updated sample service file, added `SystemCallErrorNumber=EPERM` to my service file, and it worked again. What is the exact reason for this line?
Thank you for reporting this, and for including so many details!
Here's what I know so far. It looks like systemd version 247 recently made some sort of breaking change such that the example borgmatic systemd service file no longer works, resulting in the exact behavior you're seeing. systemd 246 worked great. systemd 247, not so much. (It'd be good to confirm that that's the version of systemd you're on, e.g. `systemctl --version`.)
I strongly suspect that the breaking change has to do with the set of system calls borgmatic is allowed to make, although I have not confirmed this. I don't have a local repro of the problem here; I'm still on systemd 246.
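For anyone who wants to script the version check suggested above, here is a minimal sketch. It assumes the first line of `systemctl --version` output starts with `systemd <number>`, and the function names `parse_systemd_major` and `installed_systemd_major` are made up for illustration. The 247 threshold comes only from the reports in this thread.

```python
import re
import subprocess


def parse_systemd_major(version_output: str) -> int:
    """Extract the major version from the first line of `systemctl --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"systemd (\d+)", first_line)
    if match is None:
        raise ValueError(f"unrecognized version output: {first_line!r}")
    return int(match.group(1))


def installed_systemd_major() -> int:
    """Run `systemctl --version` and return the major version number."""
    result = subprocess.run(
        ["systemctl", "--version"], capture_output=True, text=True, check=True
    )
    return parse_systemd_major(result.stdout)


if __name__ == "__main__":
    major = installed_systemd_major()
    # 247 is the first version reported as affected in this thread.
    status = "possibly affected" if major >= 247 else "predates the reported breakage"
    print(f"systemd {major}: {status}")
```

Running the script on the machine that hosts the borgmatic service should be enough to tell whether it is on the 247 side of the change.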
The recent `SystemCallErrorNumber=EPERM` addition was an attempt to change what systemd does when a system call is blocked due to insufficient permissions. Specifically, instead of unceremoniously killing the borgmatic process (what you're seeing), systemd returns a permission error (`EPERM`) from the system call, which gives borgmatic the opportunity to handle the error more gracefully.
Here's what I don't yet know: Which system call is failing, and how does borgmatic handle it gracefully? There is, IIRC, a single place in the code where a specific `PermissionError` is handled, and that's it. I suppose it's also possible that one of borgmatic's dependencies is handling the `PermissionError` internally.
So that's where we are.
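To illustrate why an `EPERM` return is something a Python program can survive at all, here is a small sketch. It is not borgmatic's code. `simulated_filtered_call` and `call_guarded` are hypothetical names, and the "blocked" system call is faked by raising the same exception Python produces when a real call fails with `EPERM`.

```python
import errno
import os


def simulated_filtered_call(path):
    """Stand-in for a system call rejected by a filter (hypothetical)."""
    # Python maps errno EPERM onto the PermissionError subclass of OSError.
    raise OSError(errno.EPERM, os.strerror(errno.EPERM), path)


def call_guarded(operation, *args):
    """Run an operation; report a permission failure instead of crashing."""
    try:
        return operation(*args)
    except PermissionError as error:
        # A process killed outright never reaches a handler like this one;
        # a call that returns EPERM does.
        return f"blocked: {errno.errorcode[error.errno]}"


print(call_guarded(simulated_filtered_call, "/some/path"))  # prints "blocked: EPERM"
```

Which call is actually being blocked, and where the resulting error ends up being handled, remains the open question described above.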
`SystemCallErrorNumber=EPERM` seems to "make it work", which isn't really satisfying because it's not clear how or why.
Thanks to dmc for diagnosing much of this on IRC.
Here is my version of systemd:
And here is the relevant part of the output from `grep -i upgraded /var/log/pacman.log`:
So yes, the service stopped working after the systemd upgrade.
Okay, at least it's consistent then. So my current recommendation is to keep `SystemCallErrorNumber=EPERM` in your systemd service configuration, and holler here if you experience any `PermissionDenied` or similar issues with borgmatic. I'll try to get access to a systemd 247 machine. (Or just wait.)
Thanks a lot @csteinforth, I couldn't figure out why the service wasn't working, but your solution solved my problem.
I've confirmed/repro'd this on my own systemd 247 machine. Next step: See if `SystemCallErrorNumber=EPERM` "fixes" it.
So far `SystemCallErrorNumber=EPERM` has worked for me.
I have the same issue (posted an issue a few days ago about it). `SystemCallErrorNumber=EPERM` is in my config but it's still not working. No issues running it in a terminal; it only fails when run with the systemd timer.
Calling this working after six months of use with `SystemCallErrorNumber=EPERM`. 😄