borgmatic ignores failure of underlying borg (at least in this circumstance) #787

Closed
opened 2023-11-11 12:46:27 +00:00 by ralhei · 5 comments
Contributor

What I'm trying to do and why

I'm trying to run a plain, simple 'borgmatic create ...' command to back up a directory (/Linkstation/Praxis below) which is mounted via autofs from a NAS.
When the borgmatic command is started, in some cases the directory is not yet mounted, so the backup process fails and prints an error message like:
/Linkstation/Praxis: file inode changed (race condition), skipping file.
After some investigation I found that this error message is printed/generated by underlying borg itself.

The problem is that borgmatic does not report this as an error (on_error hooks are not triggered, return code is 0).
In contrast, if I run the borg create command directly, it fails with the same message below, but stops with a return code of 1, which clearly indicates that an error has occurred.

It is fine with me that this error is raised. This can be handled easily. But I would expect borgmatic to recognize this error and report a failure so that the problem does not go unnoticed.

Steps to reproduce

source_directories:
- /Linkstation/Praxis

one_file_system: true

repositories:
- path: ssh://xxx0@xxx0.repo.borgbase.com/./repo

on_error:
- echo "An error has occurred"

Actual behavior

$ borgmatic create --repository ssh://xxx0@xxx0.repo.borgbase.com/./repo
/Linkstation/Praxis: file inode changed (race condition), skipping file
$ echo $?
0 <------------------------------------ BAD

$ borg create ssh://xxx0@xxx0.repo.borgbase.com/./repo::archive1 /Linkstation/Praxis
/Linkstation/Praxis: file inode changed (race condition), skipping file
$ echo $?
1 <------------------------------------ GOOD

Expected behavior

$ borgmatic create --repository ssh://xxx0@xxx0.repo.borgbase.com/./repo
/Linkstation/Praxis: file inode changed (race condition), skipping file
An error has occurred
$ echo $?
1

Other notes / implementation ideas

No response

borgmatic version

1.8.4

borgmatic installation method

pip install

Borg version

1.2.6

Python version

3.11

Database version (if applicable)

No response

Operating system and version

Raspbian GNU/Linux 10 (buster)

Owner

Thanks for taking the time to file this and provide all the details! What's going on here is that Borg is issuing a warning rather than an error (https://borgbackup.readthedocs.io/en/stable/usage/general.html?highlight=warning#return-codes), which borgmatic interprets as such and therefore does not halt execution or log an error message. It does however log a warning message.
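To make the warning/error distinction concrete, here is a minimal Python sketch of how a wrapper script might classify Borg's exit code. It assumes the return codes documented for Borg 1.2.x (0 = success, 1 = warning, 2 = error, 128+N = killed by signal N). The function names and the `strict` flag are hypothetical and are not part of borgmatic or Borg; this is not how borgmatic is implemented.

```python
def classify_borg_exit_code(code: int) -> str:
    """Map a Borg 1.2.x exit code to a coarse status label.

    Assumes the documented codes: 0 success, 1 warning, 2 error,
    128+N killed by signal N. Any other code is treated as an error.
    """
    if code == 0:
        return "success"
    if code == 1:
        return "warning"
    # Python's subprocess reports signal deaths as negative return codes,
    # while a shell reports them as 128+N; treat both as "signal".
    if code < 0 or code >= 128:
        return "signal"
    return "error"


def should_fail(code: int, strict: bool = False) -> bool:
    """Decide whether a (hypothetical) wrapper should report failure.

    With strict=False, warnings are tolerated, which matches the behavior
    described in this ticket. With strict=True, warnings count as failures.
    """
    status = classify_borg_exit_code(code)
    if status == "success":
        return False
    if status == "warning":
        return strict
    return True
```

With `strict=False`, the exit code 1 from the `borg create` run above would not count as a failure; with `strict=True`, it would.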

As a work-around, consider performing a check in before_backup or before_actions to make sure your source directory is actually mounted successfully. Alternatively, if you believe the Borg warning you're receiving should really be an error, you might consider filing a ticket with Borg on that topic. Be aware, though, that the Borg project generally takes the stance that you should be looking through your Borg logs for warnings.

witten added the
question / support
label 2023-11-11 17:58:03 +00:00
Author
Contributor

I think the behavior of borg is completely fine: it detects some problem with the underlying filesystem, writes a warning message, and decides that it is a good idea to inform the user by returning an exit code greater than 0. This is perfect for automated backups, because it is usually what you are looking for to detect issues that need to be taken care of.
On the other hand, borgmatic also seems to see the warning message, but decides not to return a non-zero exit code. And indeed, this would force the user to permanently parse log files to find possible problems.
I don't think this is an ideal scenario, especially because in my case borg did not back up anything for many weeks before I noticed the problem with the NFS mount. This really poses a risk of losing data if nothing gets written into the backups.

Is there a reason why borgmatic doesn't want to inform the user about warnings, the same way that borg does?
On the other hand modifying borgmatic now to return non-zero exit codes on warnings would probably break a lot of installations of borgmatic.

One solution could be to introduce something like on_warning hooks, and also allow monitoring hooks like ntfy to send out warning messages (by adding another handler).
Whatever the solution is, those kinds of warnings should IMHO not be hidden in logfiles.

Owner

> Is there a reason why borgmatic doesn't want to inform the user about warnings, the same way that borg does?

borgmatic has historically indicated warnings to the user via logging warnings rather than altering borgmatic's exit code. Part of the rationale is that borgmatic is often run via cron (or similar job runners), and with cron an exit code of 1 is used to indicate an error that needs attention rather than a warning. This is also made more complicated by the fact that Borg issues warnings for a large number of conditions, including "minor" things like source files changing while they're being read.

So that's the background. I will say that borgmatic can and does make breaking changes, for instance along with bigger version number bumps, so it's not out of the question to change its behavior with regards to exit codes. But I think the challenge here, as suggested above, is that not all Borg "warnings" are created equal. Sometimes the user cares about them such as in your case, and other times the user really doesn't want to be bothered. So unconditionally turning all Borg warnings into a borgmatic exit code 1 might be pretty annoying for many users.

I think the ideal would be for borgmatic users to be able to pick and choose which warnings they care about, but Borg doesn't currently make that information available programmatically AFAIK.

Your on_warning / monitoring hook idea is interesting, although I'm not sure that entirely solves it either, because it would presumably suffer from the same "I only care about certain warnings" problem.

So that brings me back to the before_backup approach. Rather than relying on Borg to tell you that your backup is useless via a "warning," maybe you could write a before_backup script that ensures the source filesystem is actually mounted...

Something like this:

before_backup:
    - findmnt /Linkstation/Praxis > /dev/null

... which you could even combine with borgmatic retry logic so that it won't give up right away. For instance:

retries: 5
retry_wait: 10
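If a one-line findmnt check is not flexible enough, the same idea can be sketched as a small Python script called from before_backup. This is only an illustration under stated assumptions: the script name (check_mount.py), the function name, and the default attempt count and delay are hypothetical, and it relies on a non-zero exit from the hook being treated as a failure, as the findmnt example above does. Note that `os.path.ismount` only tests whether the path is a mount point. Whether that is a reliable signal under autofs depends on the autofs setup and is not verified here.

```python
#!/usr/bin/env python3
"""Hypothetical before_backup helper: fail if a source path is not mounted."""
import os
import sys
import time


def wait_for_mount(path: str, attempts: int = 5, delay: float = 10.0) -> bool:
    """Return True as soon as `path` is a mount point.

    Checks up to `attempts` times, sleeping `delay` seconds between checks,
    and returns False if the path never shows up as a mount point.
    """
    for attempt in range(attempts):
        if os.path.ismount(path):
            return True
        # Do not sleep after the final failed check.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Path taken from the ticket; pass a different one as the first argument.
    source = sys.argv[1] if len(sys.argv) > 1 else "/Linkstation/Praxis"
    if not wait_for_mount(source):
        print(f"{source} is not a mount point; aborting backup", file=sys.stderr)
        sys.exit(1)
```

It would be invoked from before_backup in place of the findmnt line, for example as `python3 /path/to/check_mount.py /Linkstation/Praxis` (the path to the script is a placeholder).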

Anyway, let me know your thoughts.

Owner

Related ticket: #798.

witten added the
waiting for response
label 2023-12-23 02:04:07 +00:00
witten removed the
waiting for response
label 2024-01-09 21:51:38 +00:00
Owner

I'm closing this for now, but I'd be happy to continue the discussion and/or reopen if necessary. Thanks!

Reference: borgmatic-collective/borgmatic#787