Backup archives are created despite failing database dumps #758

Closed
opened 2023-09-18 15:54:03 +00:00 by martipoe · 7 comments

What I'm trying to do and why

I am monitoring Borgmatic archives via Icinga https://github.com/chris2k20/check_borgmatic and always assumed that if Borgmatic fails to backup my databases, no archive would be created.

However, archives are created no matter if database dumps succeed or fail.

Steps to reproduce

Configure Borgmatic to backup up Mysql databases with credentials for an unreachable database server, triggered via systemd service.

Actual behavior

Borgmatic log shows creation of archive and critical errors for database dumps:

ep 18 12:05:14 host.example.org systemd[1]: Starting borgmatic backup...
Sep 18 12:06:15 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Pruning archives
Sep 18 12:06:15 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Compacting segments
Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO compaction freed about 4.25 kB repository space.
Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Creating archive
Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Dumping MySQL databases
Sep 18 12:07:16 host.example.org borgmatic[2410452]: WARNING /srv/backup/example.borg: Excluding special files to prevent Borg from hanging:
Sep 18 12:07:16 host.example.org borgmatic[2410452]: INFO Creating archive at "/srv/backup/example.borg::example-2023-09-18T12:07:16.538967"
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /srv/backup/example.borg: Error running actions for repository
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2.
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /etc/borgmatic/config.yaml: Error running configuration file
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /srv/backup/example.borg: Error running actions for repository
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: summary:
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2.
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /etc/borgmatic/config.yaml: Error running configuration file
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /srv/backup/example.borg: Error running actions for repository
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /etc/borgmatic/config.yaml: Error running configuration file
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2.
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL summary:
Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Need some help? https://torsion.org/borgmatic/#issues
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /etc/borgmatic/config.yaml: Error running configuration file
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /srv/backup/example.borg: Error running actions for repository
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2.
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL
Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Need some help? https://torsion.org/borgmatic/#issues
Sep 18 12:12:19 host.example.org systemd-inhibit[2410451]: /usr/bin/borgmatic failed with exit status 1.
Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Failed with result 'exit-code'.
Sep 18 12:12:19 host.example.org systemd[1]: Failed to start borgmatic backup.
Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Consumed 50.555s CPU time.

Archive and database dumps with a size of 0 bytes:

root@:~# borgmatic list | grep 2023-09-18T12:07:16.538967
example-18T12:07:16.538967 Mon, 2023-09-18 12:07:16 [1716c600a19f4a497f822a95e2de26bac2e7b91301a3f2beb6dc194528015dc6]

root@:~# borgmatic list --archive example-2023-09-18T12:07:16.538967 | grep root/.borgmatic/
drwxr-xr-x root   root          0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases
drwx------ root   root          0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20
-rw------- root   root          0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20/example_db
-rw------- root   root          0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20/matomo_db
drwxr-x--- root   root          0 Fri, 2023-06-30 14:57:56 root/.borgmatic/checks

Expected behavior

If database backups fail, Borgmatic should not create an archive with empty (0-byte) mysql dumps - but not create any archive at all. This way problems can be noticed without needing to additionally monitor the systemd service for errors.

Other notes / implementation ideas

No response

borgmatic version

1.7.4

borgmatic installation method

Debian package (bullseye-backports)

Borg version

borg 1.2.3

Python version

Python 3.9.2

Database version (if applicable)

10.5

Operating system and version

Debian 11

### What I'm trying to do and why I am monitoring Borgmatic archives via Icinga https://github.com/chris2k20/check_borgmatic and always assumed that if Borgmatic fails to backup my databases, no archive would be created. However, archives are created no matter if database dumps succeed or fail. ### Steps to reproduce Configure Borgmatic to backup up Mysql databases with credentials for an unreachable database server, triggered via systemd service. ### Actual behavior Borgmatic log shows creation of archive and critical errors for database dumps: ``` ep 18 12:05:14 host.example.org systemd[1]: Starting borgmatic backup... Sep 18 12:06:15 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Pruning archives Sep 18 12:06:15 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Compacting segments Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO compaction freed about 4.25 kB repository space. Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Creating archive Sep 18 12:06:16 host.example.org borgmatic[2410452]: INFO /srv/backup/example.borg: Dumping MySQL databases Sep 18 12:07:16 host.example.org borgmatic[2410452]: WARNING /srv/backup/example.borg: Excluding special files to prevent Borg from hanging: Sep 18 12:07:16 host.example.org borgmatic[2410452]: INFO Creating archive at "/srv/backup/example.borg::example-2023-09-18T12:07:16.538967" Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /srv/backup/example.borg: Error running actions for repository Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2. Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /etc/borgmatic/config.yaml: Error running configuration file Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /srv/backup/example.borg: Error running actions for repository Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: summary: Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2. Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /etc/borgmatic/config.yaml: Error running configuration file Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: /srv/backup/example.borg: Error running actions for repository Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /etc/borgmatic/config.yaml: Error running configuration file Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2. Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL summary: Sep 18 12:12:19 host.example.org systemd-inhibit[2410452]: Need some help? https://torsion.org/borgmatic/#issues Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /etc/borgmatic/config.yaml: Error running configuration file Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL /srv/backup/example.borg: Error running actions for repository Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Command 'mysqldump --single-transaction --skip-lock-tables --skip-extended-insert --add-drop-database --order-by-primary --add-drop-database --host 10.0.0.20 --port 3306 --protocol tcp --user borgmatic --databases example_db > /root/.borgmatic/mysql_databases/10.0.0.20/example_db' returned non-zero exit status 2. Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Sep 18 12:12:19 host.example.org borgmatic[2410452]: CRITICAL Need some help? https://torsion.org/borgmatic/#issues Sep 18 12:12:19 host.example.org systemd-inhibit[2410451]: /usr/bin/borgmatic failed with exit status 1. Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Main process exited, code=exited, status=1/FAILURE Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Failed with result 'exit-code'. Sep 18 12:12:19 host.example.org systemd[1]: Failed to start borgmatic backup. Sep 18 12:12:19 host.example.org systemd[1]: borgmatic.service: Consumed 50.555s CPU time. ``` Archive and database dumps with a size of 0 bytes: ``` root@:~# borgmatic list | grep 2023-09-18T12:07:16.538967 example-18T12:07:16.538967 Mon, 2023-09-18 12:07:16 [1716c600a19f4a497f822a95e2de26bac2e7b91301a3f2beb6dc194528015dc6] root@:~# borgmatic list --archive example-2023-09-18T12:07:16.538967 | grep root/.borgmatic/ drwxr-xr-x root root 0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases drwx------ root root 0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20 -rw------- root root 0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20/example_db -rw------- root root 0 Mon, 2023-09-18 12:06:16 root/.borgmatic/mysql_databases/10.0.0.20/matomo_db drwxr-x--- root root 0 Fri, 2023-06-30 14:57:56 root/.borgmatic/checks ``` ### Expected behavior If database backups fail, Borgmatic should not create an archive with empty (0-byte) mysql dumps - but not create any archive at all. This way problems can be noticed without needing to additionally monitor the systemd service for errors. ### Other notes / implementation ideas _No response_ ### borgmatic version 1.7.4 ### borgmatic installation method Debian package (bullseye-backports) ### Borg version borg 1.2.3 ### Python version Python 3.9.2 ### Database version (if applicable) 10.5 ### Operating system and version Debian 11
Owner

Thanks for filing this! I've managed to reproduce the issue locally (with both 1.7.4 and main), but I'm not yet sure what if anything can be done about it given how database dumps are streamed to Borg on demand. I'll look into this though and get back to you.

I will say that regardless of the archive with zero-byte dumps issue, I do recommend monitoring the status of your job runner (systemd service) as an additional check on whether backups succeeded.

Thanks for filing this! I've managed to reproduce the issue locally (with both 1.7.4 and main), but I'm not yet sure what if anything can be done about it given how database dumps are streamed to Borg on demand. I'll look into this though and get back to you. I will say that regardless of the archive with zero-byte dumps issue, I do recommend monitoring the status of your job runner (systemd service) as an additional check on whether backups succeeded.
Owner

An update: I've played with this a bit, but thus far I haven't found a reliable way to cause Borg to fail if a database dump fails. That's because as soon as the dump process fails, it closes the named piped that's sending dump data to Borg, and then Borg immediately finishes archiving the zero-byte file and exits as "success." But I'll leave this open in case I can come up with an approach that actually works.

An update: I've played with this a bit, but thus far I haven't found a reliable way to cause Borg to fail if a database dump fails. That's because as soon as the dump process fails, it closes the named piped that's sending dump data to Borg, and then Borg immediately finishes archiving the zero-byte file and exits as "success." But I'll leave this open in case I can come up with an approach that actually works.
Author

Thanks for checking. I also thought about using hooks to run a mysql command using the provided variables in the yaml to check if the database connection succeeds before the backup is started, but those variables (e.g. "{mysql_databases}") seem not to be available in before_actions and before_backup.

Thanks for checking. I also thought about using hooks to run a mysql command using the provided variables in the yaml to check if the database connection succeeds before the backup is started, but those variables (e.g. "{mysql_databases}") seem not to be available in before_actions and before_backup.
Owner

Yeah, the set of variables available in hooks is pretty limited right now: https://torsion.org/borgmatic/docs/how-to/add-preparation-and-cleanup-steps-to-backups/#variable-interpolation

Yeah, the set of variables available in hooks is pretty limited right now: https://torsion.org/borgmatic/docs/how-to/add-preparation-and-cleanup-steps-to-backups/#variable-interpolation
Owner

So it looks like this issue was inadvertently "fixed" in borgmatic 1.7.10 (likely as part of #396). With 1.7.10+, once mysqldump errors, borgmatic appears to kill Borg before it can finish making the archive with a zero-byte dump. I use the scare quotes around "fixed" because I'm not 100% sure the issue can't occur in newer versions of borgmatic, as it comes down to a timing issue (Borg vs. mysqldump), and it's just that the timing appears to be more favorable in 1.7.10+.

So if you could, it would be great if you'd upgrade borgmatic and try reproing your issue again. If you can still make it happen, I'd be real interested in details about how you did it. And if it no longer occurs after upgrading, I'd be interested in that too. Thank you!

So it looks like this issue was inadvertently "fixed" in borgmatic 1.7.10 (likely as part of #396). With 1.7.10+, once `mysqldump` errors, borgmatic appears to kill Borg before it can finish making the archive with a zero-byte dump. I use the scare quotes around "fixed" because I'm not 100% sure the issue can't occur in newer versions of borgmatic, as it comes down to a timing issue (Borg vs. `mysqldump`), and it's just that the timing appears to be more favorable in 1.7.10+. So if you could, it would be great if you'd upgrade borgmatic and try reproing your issue again. If you can still make it happen, I'd be real interested in details about how you did it. And if it no longer occurs after upgrading, I'd be interested in that too. Thank you!
Author

Thanks for letting me know, meanwhile I have started using Borgmatic 1.8.x and am unable to reproduce the issue, when the DB server is unavailable an error occurs and no backup is created.

I am currently unable to verify if the issue still occurs with 1.7.x

Thanks for letting me know, meanwhile I have started using Borgmatic 1.8.x and am unable to reproduce the issue, when the DB server is unavailable an error occurs and no backup is created. I am currently unable to verify if the issue still occurs with 1.7.x
Owner

Okay, then I'll call this done for now. But please feel free to reopen or file a new ticket if you encounter this issue again.

Okay, then I'll call this done for now. But please feel free to reopen or file a new ticket if you encounter this issue again.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#758
No description provided.