borgmatic ignores failure of underlying borg (at least in this circumstance) #787
Reference: borgmatic-collective/borgmatic#787
What I'm trying to do and why
I'm trying to run a plain, simple `borgmatic create ...` command to back up a directory (`/Linkstation/Praxis` below) which is mounted via autofs from a NAS.
When the borgmatic command is started, in some cases the directory is not yet mounted, so the backup process fails and raises an error message like: `/Linkstation/Praxis: file inode changed (race condition), skipping file`.
After some investigation I found that this error message is generated by the underlying `borg` itself.
The problem is that borgmatic does not report this as an error (`on_error` hooks are not triggered, return code is 0).
In contrast, if I run the `borg create` command directly, it fails with the same message below, but stops with a return code of 1, which clearly indicates that an error has occurred.
It is fine with me that this error is raised. This can be handled easily. But I would expect borgmatic to recognize this error and report a failure so that this problem does not go unseen.
Steps to reproduce
source_directories:
    - /Linkstation/Praxis
one_file_system: true
repositories:
    - path: ssh://xxx0@xxx0.repo.borgbase.com/./repo
on_error:
    - echo "An error has occurred"
Actual behavior
$ borgmatic create --repository ssh://xxx0@xxx0.repo.borgbase.com/./repo
/Linkstation/Praxis: file inode changed (race condition), skipping file
$ echo $?
0 <------------------------------------ BAD
$ borg create ssh://xxx0@xxx0.repo.borgbase.com/./repo::archive1 /Linkstation/Praxis
/Linkstation/Praxis: file inode changed (race condition), skipping file
$ echo $?
1 <------------------------------------ GOOD
Expected behavior
$ borgmatic create --repository ssh://xxx0@xxx0.repo.borgbase.com/./repo
/Linkstation/Praxis: file inode changed (race condition), skipping file
An error has occurred
$ echo $?
1
Other notes / implementation ideas
No response
borgmatic version
1.8.4
borgmatic installation method
pip install
Borg version
1.2.6
Python version
3.11
Database version (if applicable)
No response
Operating system and version
Raspbian GNU/Linux 10 (buster)
Thanks for taking the time to file this and provide all the details! What's going on here is that Borg is issuing a warning rather than an error, which borgmatic interprets as such and therefore does not halt execution or log an error message. It does, however, log a warning message.
As a work-around, consider performing a check in `before_backup` or `before_actions` to make sure your source directory is actually mounted successfully. Alternatively, if you believe the Borg warning you're receiving should really be an error, you might consider filing a ticket with Borg on that topic. Be aware, though, that the Borg project generally takes the stance that you should be looking through your Borg logs for warnings.
I think the behavior of borg is completely fine: it detects some problems with the underlying filesystem, writes a warning message, and decides that it is a good idea to inform the user by returning an exit code greater than 0. This is perfect for automated backups because it is usually what you are looking for to detect issues that need to be taken care of.
On the other hand, borgmatic also seems to see the warning message, but decides not to return a non-zero exit code. This forces the user to keep parsing log files to find possible problems.
I don't think this is an ideal scenario, especially because in my case borg did not back up anything for many weeks before I noticed the problem with the NFS mount. This poses a real risk of losing data if nothing gets written into the backups.
Is there a reason why borgmatic doesn't want to inform the user about warnings, the same way that borg does?
On the other hand modifying borgmatic now to return non-zero exit codes on warnings would probably break a lot of installations of borgmatic.
One solution could be to introduce something like `on_warning` hooks, and also allow monitoring hooks like ntfy to send out warning messages (by adding another handler).
Whatever the solution is, these kinds of warnings should IMHO not be hidden in logfiles.
borgmatic has historically indicated warnings to the user via logging warnings rather than altering borgmatic's exit code. Part of the rationale is that borgmatic is often run via cron (or similar job runners), and with cron an exit code of 1 is used to indicate an error that needs attention rather than a warning. This is also made more complicated by the fact that Borg issues warnings for a large number of conditions, including "minor" things like source files changing while they're being read.
So that's the background. I will say that borgmatic can and does make breaking changes, for instance along with bigger version number bumps, so it's not out of the question to change its behavior with regard to exit codes. But I think the challenge here, as suggested above, is that not all Borg "warnings" are created equal. Sometimes the user cares about them, such as in your case, and other times the user really doesn't want to be bothered. So unconditionally turning all Borg warnings into a borgmatic exit code 1 might be pretty annoying for many users.
I think the ideal would be for borgmatic users to be able to pick and choose which warnings they care about, but Borg doesn't currently make that information available programmatically AFAIK.
Your `on_warning` / monitoring hook idea is interesting, although I'm not sure that entirely solves it either, because it would presumably suffer from the same "I only care about certain warnings" problem.
So that brings me back to the `before_backup` approach. Rather than relying on Borg to tell you that your backup is useless via a "warning," maybe you could write a `before_backup` script that ensures the source filesystem is actually mounted... Something like this:
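A minimal sketch of such a hook, not necessarily the exact snippet from the original comment. It assumes the util-linux `mountpoint` command is available on the host, that `/Linkstation/Praxis` is itself the mount point (this depends on the autofs map), and that borgmatic treats a non-zero exit from a `before_backup` command as an error, which fails the run and triggers `on_error`:

```yaml
# Hypothetical check: fail the run unless the NAS share is mounted.
before_backup:
    # mountpoint -q prints nothing and exits non-zero
    # when the given path is not a mount point.
    - mountpoint -q /Linkstation/Praxis
```

With this in place, an unmounted share would surface as a borgmatic error instead of a Borg warning buried in the logs.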
... which you could even combine with borgmatic retry logic so that it won't give up right away. For instance:
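A sketch of that combination, assuming the flat configuration layout of borgmatic 1.8 and that a failed hook counts as a retryable error. The retry count and wait time are illustrative values, not recommendations:

```yaml
# Hypothetical check: fail the run unless the NAS share is mounted.
before_backup:
    - mountpoint -q /Linkstation/Praxis
# Number of times to retry a failed run before giving up.
retries: 5
# Seconds to wait between retries.
retry_wait: 30
```

The idea is that a share that autofs has not finished mounting on the first attempt gets a few more chances before borgmatic reports a failure.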
Anyway, let me know your thoughts.
Related ticket: #798.
I'm closing this for now, but I'd be happy to continue the discussion and/or reopen if necessary. Thanks!