Backup archives are created despite failing database dumps #758
What I'm trying to do and why
I am monitoring Borgmatic archives via Icinga https://github.com/chris2k20/check_borgmatic and had always assumed that if Borgmatic fails to back up my databases, no archive would be created.
However, an archive is created regardless of whether the database dumps succeed or fail.
Steps to reproduce
Configure Borgmatic to back up MySQL databases using credentials for an unreachable database server, and trigger the backup via a systemd service.
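For illustration, a minimal configuration along these lines matches the setup described above (a sketch with borgmatic 1.7.x-style sections; the repository path, hostname, and credentials are placeholders):

location:
    source_directories:
        - /etc
    repositories:
        - /mnt/backup/borg-repo

hooks:
    # Dump all MySQL databases; the hostname points at a server that is unreachable.
    mysql_databases:
        - name: all
          hostname: db.example.com
          port: 3306
          username: backup
          password: example-password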
Actual behavior
The Borgmatic log shows the creation of an archive alongside critical errors for the database dumps:
An archive is created, and its database dumps have a size of 0 bytes:
Expected behavior
If database backups fail, Borgmatic should not create an archive containing empty (0-byte) MySQL dumps; it should not create any archive at all. That way, problems can be noticed without additionally having to monitor the systemd service for errors.
Other notes / implementation ideas
No response
borgmatic version
1.7.4
borgmatic installation method
Debian package (bullseye-backports)
Borg version
borg 1.2.3
Python version
Python 3.9.2
Database version (if applicable)
10.5
Operating system and version
Debian 11
Thanks for filing this! I've managed to reproduce the issue locally (with both 1.7.4 and main), but I'm not yet sure what, if anything, can be done about it, given how database dumps are streamed to Borg on demand. I'll look into this though and get back to you.
I will say that, regardless of the zero-byte dump archive issue, I do recommend monitoring the status of your job runner (the systemd service) as an additional check on whether backups succeeded.
An update: I've played with this a bit, but so far I haven't found a reliable way to cause Borg to fail when a database dump fails. That's because as soon as the dump process fails, it closes the named pipe that's sending dump data to Borg, and then Borg immediately finishes archiving the zero-byte file and exits with "success." But I'll leave this open in case I can come up with an approach that actually works.
Thanks for checking. I also thought about using hooks to run a mysql command with the variables provided in the YAML, to check whether the database connection succeeds before the backup starts, but those variables (e.g. "{mysql_databases}") don't seem to be available in before_actions and before_backup.
Yeah, the set of variables available in hooks is pretty limited right now: https://torsion.org/borgmatic/docs/how-to/add-preparation-and-cleanup-steps-to-backups/#variable-interpolation
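As a workaround in the meantime, a connectivity pre-check with the connection details repeated by hand should do the trick. A minimal sketch, assuming the mysql client is installed on the machine running borgmatic (the hostname and credentials are placeholders and have to be kept in sync with your mysql_databases entry manually, since they can't be interpolated):

hooks:
    # Abort the run early if the database server can't be reached, so that no
    # archive gets created with empty dumps. Connection details are duplicated
    # by hand because hook variable interpolation doesn't cover these values.
    before_backup:
        - mysql --host=db.example.com --port=3306 --user=backup --password=example-password --execute='SELECT 1'

If that command fails, borgmatic should error out before it gets to creating an archive.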
So it looks like this issue was inadvertently "fixed" in borgmatic 1.7.10 (likely as part of #396). With 1.7.10+, once mysqldump errors, borgmatic appears to kill Borg before it can finish making the archive with a zero-byte dump. I use the scare quotes around "fixed" because I'm not 100% sure the issue can't occur in newer versions of borgmatic, as it comes down to a timing issue (Borg vs. mysqldump), and it's just that the timing appears to be more favorable in 1.7.10+.
So if you could, it would be great if you'd upgrade borgmatic and try reproducing your issue again. If you can still make it happen, I'd be really interested in details about how you did it. And if it no longer occurs after upgrading, I'd be interested in that too. Thank you!
Thanks for letting me know. In the meantime I have started using Borgmatic 1.8.x and am unable to reproduce the issue: when the DB server is unavailable, an error occurs and no backup is created.
I am currently unable to verify whether the issue still occurs with 1.7.x.
Okay, then I'll call this done for now. But please feel free to reopen or file a new ticket if you encounter this issue again.