Borgmatic hangs while using mysqldump #755
Reference: borgmatic-collective/borgmatic#755
What I'm trying to do and why
I'm trying to use borgmatic with MySQL databases.
If I enable the MySQL hook, the task hangs indefinitely and I need to kill the process to finish.
Sometimes it works.
Steps to reproduce
Start the script using Rundeck with:
sudo borgmatic --config /$conf_folder/confs/borgmatic/${what_to_backup}_${db_name}.yaml --verbosity 2;
Actual behavior
Ends with `terminating with success status, rc 0` before borgmatic executes mysqldump.
Expected behavior
No response
Other notes / implementation ideas
No response
borgmatic version
1.7.7
borgmatic installation method
deb package
Borg version
1.1.16
Python version
Python 3.9.2
Database version (if applicable)
mariadb:10.11.4 & mariadb:11.1
Operating system and version
Debian GNU/Linux 11 (bullseye)
Thanks for taking the time to file this! So to confirm, borgmatic itself hangs indefinitely when configured with the MySQL hook on your system?
When borgmatic hangs on a database dump, it's usually due to Borg getting stuck on borgmatic's named pipe or another special file you have on your filesystem. borgmatic does have measures in place to try to protect against this, but there are limitations.
One thing you can do is try a command like the following to find these special files that may be causing problems:
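A sketch of such a command, based on the special-file guidance in borgmatic's documentation (the path is the placeholder mentioned below; adjust it to your setup):

```shell
# Substitute your actual source path here.
source_path=/your/source/path

# List special files (block devices, character devices, named pipes,
# and sockets) under the source path; these are the files Borg can
# hang on while reading.
find "$source_path" -type b -or -type c -or -type p -or -type s
```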
(Substituting your actual source path for `/your/source/path`.) Also, getting a look at your configuration and your full borgmatic output/logs would help diagnose this issue. Thanks!
Hi @witten
Thanks for your answer.
I've already read about the limitations and use a separate config file for backing up files and backing up databases, but the problem appears again.
The find returns nothing, so there are no special files, if I understand correctly.
And yes, borgmatic hangs indefinitely when the MySQL hook is set in the config file.
My config file:
Your configuration file looks fine to me. Can I get a look at your borgmatic log output when run with `--verbosity 2`? Thanks!

Hi @witten
This is the output; it gets stuck at the end of this output and loops indefinitely:
htop shows the borgmatic task at 100% CPU, with the mysqldump task stuck and using no CPU time.
Same as this issue: #397
Thanks for including the logs. I don't see anything obvious there, but I find the `terminating with success status, rc 0` pretty strange. That indicates that Borg has exited. But presumably `mysqldump` is still running! Normally Borg is responsible for consuming the special file (named pipe) that mysqldump is writing to, so Borg shouldn't exit first. But it's possible there is a bug or timing issue. Here are some things to try:

- Run the `find` command from above, but instead of running it on your source directories, run it on your `~/.borgmatic` directory. If it lists any special files, that could indicate a problem.
- Look in `~/.borgmatic/*databases` if it's present. If it is, there could be some special files left behind that are causing the hang. There shouldn't be, though.
- Manually run the `mysqldump` command that borgmatic is running to make sure it works as expected on its own: `mysqldump -v --single-transaction --compress --order-by-primary --no-create-db --add-drop-database --host 127.0.0.1 --port 40001 --protocol tcp --user HIDDEN_STACK_NAME --databases HIDDEN_STACK_NAME`. Note that you may have to enter a password manually or set the `MYSQL_PWD` environment variable. If it hangs or otherwise doesn't work, that might be causing this problem.

Let me know how this goes! And thanks for your patience.
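The pipe handoff described above can be sketched in miniature (a Python illustration of the mechanism, not borgmatic's actual code; one thread stands in for mysqldump, the main thread for Borg):

```python
import os
import tempfile
import threading

# A named pipe standing in for borgmatic's per-database dump pipe.
fifo = os.path.join(tempfile.mkdtemp(), "db.fifo")
os.mkfifo(fifo)

def dumper():
    # Stands in for mysqldump: opening the FIFO for writing blocks
    # until a reader (Borg) opens the other end.
    with open(fifo, "w") as f:
        f.write("-- dump data --\n")

t = threading.Thread(target=dumper)
t.start()

# Stands in for Borg: consume the pipe. If this reader never showed up
# (e.g. Borg exited early), the dumper thread above would block forever.
with open(fifo) as f:
    data = f.read()
t.join()
```

If the reader side is removed, the writer's `open()` never returns, which is exactly the hang symptom reported here.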
The find returns nothing.
There are no folders named *databases in the .borgmatic root folder.
The mysqldump command works great; tested multiple times.
I'm running the task each hour for now. I suspect that my backup solution (Active Backup for Business, Synology), which also runs each hour, creates some bug (it uses snapshots to do the backup).
So I moved the borgmatic task to 30 minutes after each hour, and right now borgmatic runs great.
Do you think that could be the problem?
Yeah, I think that's actually a good candidate for a contributor to this problem. I don't know how Active Backup Business works under the hood, but if it's creating special files as part of its operations, Borg could be hanging on them. It's also possible that the high CPU load stemming from Active Backup + Borg running at the same time is triggering a timing bug in borgmatic. Finally, it's possible although unlikely you've got a disk with failing sectors such that high I/O causes hangs! It may not be the worst idea to do a disk scan / self-test just to eliminate that as a potential cause. My recommendation in general though would be to try to schedule borgmatic to run when other backup software isn't running.
Ok, thank you.
It's really strange that backing up only files with borgmatic works correctly; the problem is just with mysqldump. I understand that borgmatic uses read_special and automatically excludes /dev, /run, etc.
I read syslog, and Synology creates snapshots on /dev/synosnap0 and /dev/synosnap1.
So I have to exclude these in borgmatic. How can I do that? I tried to read the documentation but couldn't find the answer.
So that would just be a standard glob exclude in `exclude_patterns`. Something like this:

However, that should only be necessary if one of your `source_directories` includes `/dev` explicitly or implicitly, for instance if `/` is in your list of source directories. And even if that were the case, borgmatic's auto-excluding should take care of excluding special files as long as they aren't created while borgmatic is running.

Hi @witten
Thanks again for your support!
Ok, so normally those mount files won't cause problems with special files. Do you think it could be the snapshots? Active Backup takes snapshots using the synosnap package with CBT tracking.
I'll try this exclude_patterns and let you know if it's ok.
Excluding those partitions seems to have resolved the issue. But if borgmatic automatically excludes /dev/*, why do I need to add it manually to prevent borgmatic from hanging on special files? I don't understand...
Ok, so it doesn't work. The problem appears again.
It also appears in other folders.
@witten, do you have any ideas regarding this problem?
Thanks!
It seems that when borgmatic hangs, the folder /root/.borgmatic/mysql_databases is not created. That could be a clue.
I tried to start the backup at a different time than the other backup solution, but it didn't help at all.
I'm not familiar with the mechanics of how that package works, but I guess in theory it could interfere. Do you know if it uses a particular filesystem's snapshotting capabilities? btrfs for instance?
borgmatic should automatically exclude special files in `/dev`, but the auto-exclusion logic isn't perfect. For instance, borgmatic scans for special files, adds them to the excludes list, and then invokes Borg with those excludes. But if you've got a simultaneous process running on your machine that creates new special files while that's happening, borgmatic won't know to exclude them because it'll have already done its scan. Therefore it may be more reliable to exclude all of `/dev`, assuming that you don't need any of it backed up.

How are you determining that? By looking in `/root/.borgmatic` when borgmatic is hung? I wonder how that's possible though, because based on the log you posted above, the `mysqldump` command is run, and that command is writing to a pipe within `/root/.borgmatic/mysql_databases`...

So to clarify, you've tried starting borgmatic when the other backup solution is not running. Do you still get the borgmatic hang? Or does it only occur when the other backup solution runs?
How are you determining which folders are causing borgmatic to hang? Are you running it with the `--files` flag and looking at the output?

At this point my recommendation would be to start with a very small set of files to back up, not your entire filesystem, and try adding in additional files and directories until you experience the hang. That should hopefully help you narrow down the problem. Also make use of the `--files` flag to help pin down the problem area. However, that may not tell you the exact problem file; it may just tell you one nearby.

Thanks for your patience here!
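The scan-then-exclude race described earlier in this comment can be sketched roughly like this (an illustration of the idea, not borgmatic's actual implementation): a walk collects special files at one point in time, so anything created afterward escapes the exclude list.

```python
import os
import stat
import tempfile

def find_special_files(path):
    """Collect FIFOs, sockets, and device nodes under path, the way a
    pre-backup scan would; anything created after this walk is missed."""
    special = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            full = os.path.join(root, name)
            mode = os.lstat(full).st_mode
            if (stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode)
                    or stat.S_ISBLK(mode) or stat.S_ISCHR(mode)):
                special.append(full)
    return special

# Demo: one regular file and one named pipe; only the pipe is "special".
demo = tempfile.mkdtemp()
open(os.path.join(demo, "regular.txt"), "w").close()
os.mkfifo(os.path.join(demo, "dump.fifo"))
found = find_special_files(demo)
```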
Hi @witten
Thanks for your answer.
My backup solution takes a snapshot of each partition and sends it to my Synology NAS directly. But this is not the problem, because the problem appears again even when running borgmatic while the Synology backup is on standby.
I don't back up my entire machine with borgmatic; I just set the directory of my www website folder.
And yes, while borgmatic hangs, the mysql_databases folder is not created, and I don't know why.
I'm running MariaDB v10.11 on the 2 databases I want to back up; could the problem be related to this?
At the moment, I've excluded /dev/* in exclude_patterns to test it.
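For reference, a minimal sketch of what that exclude could look like in the YAML configuration (assuming the `exclude_patterns` option discussed above):

```yaml
exclude_patterns:
    - /dev/*
```

A narrower pattern such as `/dev/synosnap*` would target just the snapshot devices mentioned earlier.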
A few thoughts:

- You shouldn't need to exclude `/dev`, as it's presumably not reachable from your www folder.
- Try removing the `source_directories` option entirely and see if the hang still occurs. If it does, that indicates it's entirely on the database hook side. If the hang goes away, then it's got to be something in your remaining source directories.
- When the hang occurs, does `mysqldump` appear in your process list (`ps xua | grep mysqldump`)? Does Borg (`ps xua | grep borg`)?

Yes, mysqldump appears but is unresponsive (no CPU usage).
Yes, I can, but sometimes it works.
This is the output while it hangs:
Ok, so maybe I found the problem.
I'm running 2 borgmatic instances at the same time, and I found that if I run each one separately it works, but at the same time, one of the tasks hangs.
So, does borgmatic have a limitation against running 2 mysql_databases hooks at the same time?
Good find! That's almost certainly the cause of the hang here. And yes, borgmatic does have a limitation in that there is a single named pipe per database (used to send dump data from MariaDB to Borg), so two instances of borgmatic will conflict and one will hang if run simultaneously.
In terms of solving your immediate problem, was borgmatic running twice because it was scheduled to run so frequently? For instance, maybe the previous hour's borgmatic instance was still running when the next hour's borgmatic job triggered? If that's the case, can you space out the jobs more or otherwise prevent borgmatic from running twice?
For a longer term solution, it sounds like it would be useful for borgmatic to detect this situation and error cleanly instead of just hanging?
Let me know your thoughts. Thanks!
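The blocking behavior behind such a hang can be demonstrated directly (a minimal Python illustration, not borgmatic code): opening a named pipe for writing waits until a reader attaches, so a dump process whose pipe is never consumed blocks indefinitely. The non-blocking variant fails fast instead, which exposes the same condition:

```python
import errno
import os
import tempfile

fifo = os.path.join(tempfile.mkdtemp(), "dump.fifo")
os.mkfifo(fifo)

# A blocking open-for-write (effectively what the dump side does) would
# hang here because no reader is attached; the non-blocking variant
# fails immediately with ENXIO, demonstrating the missing-reader state.
try:
    os.open(fifo, os.O_WRONLY | os.O_NONBLOCK)
    raised = False
except OSError as e:
    raised = (e.errno == errno.ENXIO)
```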
Arrrrrr, it hangs again even with separate backup times. I don't have any other ideas :(
And you're absolutely sure the previous borgmatic instance had exited completely (such that `~/.borgmatic/mysql_databases` is empty) before the second one had started?

I looked at your traceback BTW and there's unfortunately nothing definitive there. It just means that during the hang, borgmatic is waiting for output from MariaDB and/or Borg before proceeding.
Yes, I'm sure.
Maybe MariaDB doesn't return the result code while borgmatic executes the task?
borgmatic basically waits until mysqldump exits, so it should get a result code eventually unless mysqldump itself is hanging for some reason. One other thing you can try is to comment out your database options in borgmatic's configuration, in case that's causing problems. I'm talking about commenting out this line in particular:
Hi @witten, I've made a little change: I moved the execution hours so the 2 tasks run 20 minutes apart, and killed all hung processes (still there from yesterday).
It's worked like a charm since 01:00.
So can I confirm that this is a limitation of two parallel tasks?
How can I start 2 tasks at the same time?
Yes, that is a limitation. borgmatic cannot (currently) run with multiple parallel instances, and in fact Borg doesn't allow that either if you're operating on a single repository. My question though is: Why do you need to run two instances at the same time? Is it because you're passing different command-line arguments to each instance? Or is it in order to run multiple configuration files? Are you aware that borgmatic can do that for you with a single invocation? For example:
That will run through the backups for both borgmatic configuration files. The downside is they're not run in parallel, but rather sequentially.
Similarly, if both config files are in the same directory, you can even just do this:
And borgmatic will discover the config files in that directory.
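Sketches of the two invocations described above (the config file paths are hypothetical):

```shell
# Run multiple configuration files sequentially in one borgmatic invocation:
borgmatic --config /etc/borgmatic.d/site1.yaml --config /etc/borgmatic.d/site2.yaml

# Or point --config at the directory and let borgmatic discover the files:
borgmatic --config /etc/borgmatic.d
```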
Or do you want to do this strictly for performance reasons, and parallelism would be the easiest way to achieve that? If so, you may be interested in this (closed) ticket: #227. But I would be interested in hearing more about your parallelism use case.
Thanks!
Hi @witten
I'm doing this because I use multiple container stacks, and for restoration I'd like to keep things simple by using one YAML configuration per website (I manage 100+ websites).
And also to prevent long-running tasks.
So yes, I'm interested, but I don't know why this hasn't been implemented.
For the first issue, multiple stacks with separate configuration files, the aforementioned single-borgmatic-invocation approach should work great. More here: https://torsion.org/borgmatic/docs/how-to/make-per-application-backups/
For the second issue, preventing a long running time via parallelism, borgmatic doesn't currently solve that. But I'll ask: Do you use a single repository for all of these 100+ websites? Because if so, not even Borg supports that kind of parallelism; there's a single lock for the whole repository. Or are you using a separate Borg repository per website?
Thanks!
Hi @witten
Yes, that could be an approach, but if the command fails, none of the other websites will be backed up. That's the reason we have a separate conf file for each website.
And we use 1 repo per website, not everything in one repo, so that's not a problem.
To summarize: the backup hangs when using multiple parallel borgmatic instances. The only workaround is to use the approach you described before to make backups per application.
Hi @witten
So what can we conclude about this issue?
Thanks a lot!
What do you think of a change to borgmatic's default backup behavior such that a single failing configuration file (website) won't prevent other configuration files from getting run? In that situation, I'd still suggest that borgmatic should exit with an error status, but at least then your other websites would get backed up. This should allow you to back up all of your websites with a single borgmatic configuration, and then parallel borgmatic runs would no longer be an issue. Let me know your thoughts.
Hi @witten
Yes, that would be cool.
With a specific error code, we could customize our script so that the task succeeds overall but reports that some backups failed.
So returning the failing configuration file in the error output would be nice.
As for parallelizing tasks, why can't borgmatic do that? Is it very complicated, or would it require refactoring all the code?
Okay, then I'll consider the work in this ticket to be making borgmatic not fail if a single configuration fails (either optionally or as default behavior). I'm not sure about the error code or error output, though. Historically borgmatic hasn't returned particular computer-readable output as part of its run, with perhaps the exception of `--json` mode. So you might want to look into using one of the monitoring hooks to track the results of a particular borgmatic run.

As for borgmatic parallelism, yes, it would be a pretty difficult change given how the code works today. That doesn't mean it's not possible. It just means that other, easier work has taken higher priority. Pull requests welcome. 😄 Related ticket: #227.
It looks like this already works in the most recent version of borgmatic; if a single configuration file fails, the other configuration files continue to run and then the error from the failing file is displayed at the end. So my recommendation is to upgrade borgmatic and see if it works the way you expect. If not, I'd be happy to reopen this. Thanks!