Post backup command returns non-zero exit status 255 #871
Reference: borgmatic-collective/borgmatic#871
What I'm trying to do and why
Hi all,
I use borgmatic for my backup routine to a remote server over SSH. I've been using it for almost 4 years; great software.
When the backup ends, borgmatic executes this command to shut down the remote server:
ssh [username]@[serverdomain] -i /etc/ssh/[sshkey] -p [port] 'sudo /sbin/poweroff'
Note: elements in [] have been sanitized.
This used to work perfectly, but for a while now this command has been returning the following:
[2024-05-19 14:00:34,803] WARNING: Command 'ssh [username]@[serverdomain] -i /etc/ssh/[sshkey] -p [port] 'sudo /sbin/poweroff'' returned non-zero exit status 255.
The command WORKS: it shuts down the remote system as expected.
If I run the command manually in a shell, I don't see any exit status; it just executes and then I get the prompt back.
Am I missing something here?
This breaks borgmatic's logic: it first reports that everything went well (the backup ended), then reports an "error" because it thinks this command didn't execute correctly.
The remote server is a Synology NAS running DSM 7.
Steps to reproduce
Actual behavior
[2024-05-19 14:00:00,582] INFO: /etc/borgmatic/config.yaml: Running 3 commands for post-backup hook
[2024-05-19 14:00:00,602] WARNING: Backup tasks have been completed and the remote server will be shut down.
[2024-05-19 14:00:34,802] WARNING: ssh://[username]@[serverdomain]/./backup: Error running actions for repository
[2024-05-19 14:00:34,803] WARNING: Command 'ssh [username]@[serverdomain] -i /etc/ssh/[sshkey] -p [port] 'sudo /sbin/poweroff'' returned non-zero exit status 255.
Expected behavior
Previous behaviour:
[2023-04-22 14:19:33,642] INFO: /etc/borgmatic/config.yaml: Running 2 commands for post-everything hook
[2023-04-22 14:19:33,651] WARNING: Tasks have been completed and the remote server will be shut down.
[2023-04-22 14:19:36,123] INFO:
[2023-04-22 14:19:36,124] INFO: summary:
[2023-04-22 14:19:33,641] INFO: /etc/borgmatic/config.yaml: Successfully ran configuration file
Other notes / implementation ideas
No response
borgmatic version
1.7.4
borgmatic installation method
bullseye-backports
Borg version
1.2.3
Python version
Python 3.9.2
Database version (if applicable)
N/A
Operating system and version
Debian bullseye (OpenMediaVault 6)
I'm not exactly sure what's going on here, but is it possible that the remote machine shuts down the network before the SSH session completes, thereby resulting in the SSH error you're experiencing? That might be consistent with exit code 255, which ssh uses to report its own errors rather than the remote command's exit status.
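For reference, the ssh(1) manual page says ssh exits with the exit status of the remote command, or with 255 if an error occurred. Below is a minimal Bash sketch of how one might read such a status. The function name describe_ssh_status is hypothetical, and the 128 + N rule is the general Bash convention for a command terminated by signal N, not something specific to ssh. Note that a remote command that itself exits with 255 is indistinguishable from an ssh error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: give a rough reading of an exit status returned by ssh.
# Per ssh(1), ssh exits with the remote command's status, or 255 on its own
# errors. A remote command that itself exits 255 looks exactly the same.
describe_ssh_status() {
    local status=$1
    local name
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -eq 255 ]; then
        echo "ssh error (for example, a dropped connection)"
    elif [ "$status" -gt 128 ] && name=$(kill -l $((status - 128)) 2>/dev/null); then
        # Bash reports a command terminated by signal N as 128 + N,
        # so 130 maps to signal 2 (INT).
        echo "killed by signal $name"
    else
        echo "remote command failed with status $status"
    fi
}

# Example: describe_ssh_status 130   prints "killed by signal INT"
```

Running the ssh command by hand and then calling describe_ssh_status $? immediately afterwards would print one of these readings.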
What happens if you then run
echo $?
in that same shell right after? Does it show an error exit code of 255?
Do you know if this changed from working to not working after a borgmatic upgrade? Or perhaps after an upgrade to your NAS?
If you decide that this is a "spurious" error that you'd prefer to suppress, you could add something like
|| true
to your SSH command. However, this would suppress all errors from that command.
Sorry for the late reply, I didn't get the email.
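A quick Bash illustration of what || true does, using a hypothetical fake_poweroff function (sh -c 'exit 255') as a stand-in for the SSH command. The right-hand true only runs when the left-hand command fails, so the combined status is always 0, which is exactly why every error from that command gets suppressed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the ssh poweroff command: always exits with 255.
fake_poweroff() {
    sh -c 'exit 255'
}

fake_poweroff
echo "without || true: $?"   # prints "without || true: 255"

fake_poweroff || true
echo "with || true: $?"      # prints "with || true: 0"
```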
I think this behaviour changed after a borgmatic update.
I was running a pretty old version of borgmatic (1.5.x) from the regular bullseye (Debian 11) repository. I switched to bullseye-backports, which installed a fairly recent 1.7.4.
Interestingly, it shows 130, not 255.
I can't immediately think of anything that would've changed in that upgrade, but it's quite possible that the exit status handling or the precise timing in handling sub-commands changed between those versions. Note, though, that even 1.7.4 is pretty old at this point. It's from 2022!
I think exit code 130 is SIGINT (128 + signal number 2), which means the command is getting interrupted, possibly by the system shutting down. I'm not sure about the 130/255 discrepancy, but I think borgmatic is working "as intended" here: if it receives a non-zero exit status, it complains. You could try the
|| true
trick if you're comfortable with all errors from that command getting swallowed.
Hi @witten, I tried the
|| true
workaround and it's working as intended. I'm fine with that, but it would be nice to manage these exit codes and tell borgmatic "natively" that they are fine.
That's how Debian works... I might try to force the upgrade, but I'm scared of the dependencies.
There is actually an existing feature to configure how borgmatic interprets Borg's exit codes as success or failure, but it doesn't apply to other commands, like those in the
after_backup
hook. And I'm not sure how that would even work, given that you can run any number of arbitrary commands (with different exit code semantics) in each command hook. But since you can put arbitrary scripting within those hooks, I would suggest calling a shell script that does the exit code interpretation you want. For instance, in Bash you can check the value of
$?
after running a command like
ssh
and, based on the specific value(s), exit with either success or an error.
You could use pipx to manage the installation and dependencies, which might make things easier. That said, I understand if you'd rather just stick with the Debian package.
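A minimal Bash sketch of the script-based exit code interpretation suggested above, assuming a hypothetical wrapper script (for example /usr/local/bin/remote-poweroff, called from the after_backup hook). The function name run_tolerating and the set of accepted statuses are illustrative assumptions: accepting 255 matches the log in this report, but it would also hide a real connection failure that happens before poweroff runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command and treat any of the listed exit
# statuses as success. Any other status is reported and passed through.
# The script does not use set -e, so a failing command does not abort
# the script before its status can be checked.
#
# Usage: run_tolerating "0 255" command [args...]
run_tolerating() {
    local allowed=$1
    shift
    "$@"
    local status=$?
    local code
    for code in $allowed; do
        if [ "$status" -eq "$code" ]; then
            return 0
        fi
    done
    echo "command failed with unexpected exit status $status" >&2
    return "$status"
}

# Example invocation (placeholders as in the original report, not run here):
# run_tolerating "0 255" ssh [username]@[serverdomain] -i /etc/ssh/[sshkey] \
#     -p [port] 'sudo /sbin/poweroff'
```

Unlike || true, any status outside the list still makes the script, and therefore the hook, fail. If a different status shows up in practice (such as the 130 seen when running the command by hand), it could be added to the list after confirming it is harmless.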
Thank you for your recommendations.
I will write a small script to handle the shutdown command and avoid the
|| true
workaround, but most importantly I have switched to a pipx installation. Thanks for telling me!
Since I updated borgmatic I had been seeing strange things in the syslog, so I decided it was a good moment to uninstall it and switch to pipx. If this issue appears again, I will open a new issue.
Thanks again!
Glad to hear you've got a newer version of borgmatic!