MySQL Hooks Causing Infinite Hang #710

Closed
opened 2023-06-11 16:47:09 +00:00 by HippieBeak · 9 comments

What I'm trying to do and why

I recently bought a new server and migrated my setup from virtual machines to Docker. The server mostly runs a bunch of Minecraft servers and a MariaDB database. I set up a cron job to back up everything nightly, specifying the directories for the Minecraft servers and using the built-in MySQL hook for the database. The Minecraft servers back up perfectly, without issues. However, the database backup seems to hang forever (though not always). I'm trying to figure out the cause and how to fix it.

Steps to reproduce (if a bug)

Originally, I tried just having one configuration file, but that was preventing my Minecraft servers from being backed up whenever the database backup started hanging, so I split them into two configuration files.

Minecraft server configuration: https://pastebin.com/VdQ1LTgV
Database configuration: https://pastebin.com/xeez2EVd

To run both of these, I have a cron job set up to run once per day at midnight, using the following commands:

borgmatic --config /etc/borgmatic.d/config.yaml --verbosity 2 > /home/log.txt 2>&1
borgmatic --config /etc/borgmatic.d/db_config.yaml --files --verbosity 2 > /home/db_log.txt 2>&1

I've never had it work via cron (maybe something to do with them both running at the same time...?), but I have periodically gotten it to back up by running the above commands manually.

I am using BorgBase for my repo.

Actual behavior (if a bug)

The backup hangs forever. Below, I have an example of a log where the job succeeded, and one where it failed.

Success: https://pastebin.com/Gv2Yden0
Failure (hanging indefinitely): https://pastebin.com/VMWWuYCd

I also don't see the pipe at /root/.borgmatic/mysql_databases/localhost/ while it is currently hanging. I believe it did exist earlier in the backup, before the hang, when it was dumping all the tables.

Expected behavior (if a bug)

For the database to back up successfully.

Other notes / implementation ideas

N/A

Environment

borgmatic version: 1.7.13

borgmatic installation method: PIP3 (python)

Borg version: borg 1.2.0

Python version: Python 3.10.6

Database version (if applicable):
Client: mysql Ver 8.0.33-0ubuntu0.22.04.2 for Linux on x86_64 ((Ubuntu))
Server: mysql Ver 15.1 Distrib 10.11.3-MariaDB, for debian-linux-gnu (x86_64) using EditLine wrapper

operating system and version: Ubuntu 22.04.2 LTS

Owner

Thanks for taking the time to file this. Historically, most of the hangs experienced in borgmatic when dumping databases have been when Borg goes to read a named pipe and the MySQL dump isn't actually putting any database dump data into it. But as far as I know, the edge cases where that can occur have been fixed in borgmatic—although it's entirely possible you're hitting another edge case that isn't yet fixed. So here are some debugging steps I'd recommend:

  • Right before running your backup, run this as the same user to see if there are any stale named pipes left over from a previous run: `ls -R ~/.borgmatic/*databases`. Even if there are, borgmatic *should* delete them when it starts up. But this would be a good data point to know.
  • When borgmatic hangs, what's the output of `ps xua | grep mysqldump`? In other words, are all the dumps still running, or have some or all of them exited? (There should be one per database.)
  • Any [MySQL logs](https://stackoverflow.com/questions/5441972/how-to-see-log-files-in-mysql)? In particular, I'm thinking about errors that might show up there.

I've never had it work using cron (maybe something to do with them both running at the same time...?)

FYI there will be errors if you try to use the same repository from two different Borg clients at the same time. Maybe that's what you were seeing.
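If you do want both jobs to start at midnight without overlapping, one approach (just a sketch: the lock file path here is arbitrary, and your exact borgmatic flags may differ) is to serialize the two cron entries with flock(1):

```
0 0 * * * flock /var/lock/borgmatic-backup.lock borgmatic --config /etc/borgmatic.d/config.yaml --verbosity 2 > /home/log.txt 2>&1
0 0 * * * flock /var/lock/borgmatic-backup.lock borgmatic --config /etc/borgmatic.d/db_config.yaml --files --verbosity 2 > /home/db_log.txt 2>&1
```

With the same lock file, whichever job starts second simply waits for the first to finish rather than running concurrently.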

Author

Thank you for the quick response!

#1: I will try that later tonight when I am home at my computer.
#2: I don't see it running; all I see is:

root 1077929 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077930 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077931 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077932 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077933 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077934 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077935 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077936 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077937 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077938 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077939 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077940 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077941 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077942 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077943 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077944 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077945 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077946 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>
root 1077947 0.0 0.0 0 0 ? Z 00:00 0:00 [mysqldump] <defunct>

Though I do see some borg commands:
root 1077901 0.0 0.0 2888 948 ? Ss 00:00 0:00 /bin/sh -c borgmatic --config /etc/borgmatic.d/db_config.yaml --files --verbosity 2 > /home/db_log.txt 2>&1
root 1077902 0.0 0.0 47180 33864 ? S 00:00 0:00 /usr/bin/python3 /usr/local/bin/borgmatic --config /etc/borgmatic.d/db_config.yaml --files --verbosity 2
root 1077952 0.0 0.0 107440 74928 ? S 00:00 0:00 /usr/bin/python3 /usr/bin/borg create --exclude-caches --compression auto,zstd --one-file-system --read-special --list --filter AMEx- ${TEST_REPO}::{hostname}-{now} /root/.borgmatic --debug --show-rc
root 1077953 0.0 0.0 15124 10212 ? S 00:00 0:00 ssh ${TEST_REPO} borg serve --debug

#3: I will need to dig deeper when I'm home and at my computer, but I do not see any MySQL errors at a quick glance.
#4: They are two separate repositories, so I think it should be fine.

Owner

Based on that output, the `Z` in the process list suggests that the `mysqldump` processes have all exited (but have not yet been reaped by their parent process). That could explain the hang: If Borg is trying to read database dump data and `mysqldump` isn't producing any, that would cause Borg to wait forever for it. So the real question is: Why is `mysqldump` exiting? To answer that, I would recommend looking both at borgmatic logs (e.g. `/home/db_log.txt`) and any MySQL logs you can find.
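For illustration, here's a toy reproduction of that failure mode (a sketch using a throwaway temp directory, not borgmatic's actual code): reading a named pipe blocks until some writer opens it, so if the writer dies without ever opening the pipe, the reader waits indefinitely.

```shell
# Create a named pipe -- the same mechanism borgmatic uses to stream dumps.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/dump.fifo"

# Stand-in for Borg reading the dump: no writer ever opens the pipe (our
# stand-in for a dead mysqldump), so the read blocks. `timeout` cuts it off.
timeout 2 cat "$tmpdir/dump.fifo"
echo "reader exit status: $?"    # 124 means the read never returned

rm -r "$tmpdir"
```

Without the `timeout`, that `cat` would sit there forever, just like your hung `borg create`.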

Author

Now that I'm back on my computer, I have some additional information.

Re 1: There were no stale pipes left over when I re-ran the configuration and I could still reproduce the hanging behavior.

Re 3: I was not able to turn on error logging for MySQL, as it requires a restart, and it is currently peak time for my user base. I will attempt to do this tomorrow if it is absolutely necessary (which I think it probably is). I turned on general logging and did not see any interruptions.

Here's an example of /home/db_log.txt hanging: https://pastebin.com/VMWWuYCd
Versus a success: https://pastebin.com/Gv2Yden0

Also, a weird correlation I'm noticing: when I run both configurations at the same time (that is, https://pastebin.com/VdQ1LTgV and https://pastebin.com/xeez2EVd) or run them as one big configuration file, I experience the hanging. However, when I stagger the two configurations (say, run them 6 hours apart), it seems to consistently succeed. This could all be a coincidence, though.

Owner

Interesting. The fact that the hang occurs when you run the two configurations together (either simultaneously or in the same configuration file) seems like a compelling clue. So here's another idea. Do you maybe have some special files in the Minecraft server configuration that could be causing Borg to hang (when `read_special` gets switched on implicitly due to the database hook)? The way to find that out would be something like:

find /var/lib/pterodactyl /var/lib/docker /etc/pterodactyl ~/.borgmatic -type b -or -type c -or \
    -type p -or -type s

Then, if that does show any special files, can you move them temporarily aside to try one of the borgmatic hang scenarios? Does that then prevent the hang?

My theory is that something's going wrong with the borgmatic code that is supposed to auto-exclude these files when the database hook is enabled. Alternatively, maybe that code works fine, but running the two configurations simultaneously introduces a special file from one configuration that the other configuration isn't expecting, resulting in a hang. (As you can see, special files really are the enemy here.)

Turning on MySQL error logging might not be necessary. The MySQL problem (if it occurs) is probably on the client anyway: the failing `mysqldump` commands. And they might not even be failing; they could be timing out as a symptom of the actual underlying problem that's causing the hang...

Thanks for your patience here.

Author

Looks like there are ~50 results. Most are in /var/lib/docker/overlay2 but I also found one called /var/lib/docker/volumes/backingFsBlockDev, which implements disk quotas. I don't know that I can safely move these files. Is there a way to explicitly ignore certain directories/files in location.source_directories instead?

By the way, I set up my cron job to run the two Borg jobs a few hours apart, and there is no longer any hanging (though, this is not ideal because I'd like for my Minecraft server files to be in sync with my database).

Owner

You can explicitly exclude certain directories/files via the `exclude_patterns` option. There are also several other ways to do excludes. See the [configuration reference](https://torsion.org/borgmatic/docs/reference/configuration/) for details.

So I recommend you either:

  1. Omit /var/lib/docker from your Minecraft server source directories. If there are particular Docker volumes you're interested in backing up, just include those.
  2. Or: Keep /var/lib/docker but then populate exclude_patterns with paths to special files or directories that may be contributing to the hang you're experiencing. This may be tricky, because new special files may be showing up dynamically as Docker runs.
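As a rough sketch of option 2, using the paths you found (the source directory and repository here are placeholders, not your real config):

```yaml
location:
    source_directories:
        - /var/lib/pterodactyl

    repositories:
        - ssh://user@user.repo.borgbase.com/./repo

    exclude_patterns:
        - /var/lib/docker/overlay2
        - /var/lib/docker/volumes/backingFsBlockDev
```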

For context, borgmatic does have code to hunt for special files within the specified source_directories and auto-exclude them. However, it's not bulletproof in that if new special files show up after this auto-excluding happens, there's nothing borgmatic can do about that. And given the volatile nature of /var/lib/docker, that's exactly what I suspect is happening here—a new special file shows up, too late for borgmatic to exclude it, and Borg tries to read it as part of the backup and hangs.
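To make that race concrete, here's a toy illustration (throwaway temp directory; the find expression mirrors the one earlier in this thread, but this isn't borgmatic's actual code):

```shell
workdir=$(mktemp -d)

# Build an exclude list up front, the way borgmatic scans source directories
# for special files before handing off to Borg. Nothing special exists yet,
# so the list comes back empty.
excludes=$(find "$workdir" -type b -or -type c -or -type p -or -type s)
echo "excluded: '$excludes'"

# After the scan, something (Docker, in this theory) creates a named pipe...
mkfifo "$workdir/appeared-later.fifo"

# ...which the stale exclude list doesn't cover, so a backup run using that
# list would still try to read it.
find "$workdir" -type p

rm -r "$workdir"
```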

One of the options above should hopefully allow you to run your two Borg jobs together in a single configuration file.

Author

It looks like excluding those paths allowed the backup to succeed! Thank you so much for your help debugging this.

Owner

Glad to hear that did it! I'll add a note to the documentation about this presumed edge case, as I'm not sure there's much borgmatic can do automatically given its current approach to database dumps.

Reference: borgmatic-collective/borgmatic#710