Borgmatic to manage parallel support? #227
Reference: borgmatic-collective/borgmatic#227
What I'm trying to do and why
Backups take a while, but if you have a powerful machine, you could run them in parallel! This is possible to manage externally from borgmatic, but it would also be nice to have a `-n N` flag to specify a number of cores and have borgmatic run what it can in parallel. The user would have to make sure that what is being backed up can actually be done in parallel. This is probably a can of worms, but it would be nice to think through.
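In the meantime, this kind of parallelism can be orchestrated externally. A rough Python sketch of what a worker-pool approach behind the hypothetical `-n N` flag might look like, with `echo` standing in for a real per-repository backup command:

```python
# Sketch of what a hypothetical "-n N" flag might do internally: run at
# most N repository backups at once via a worker pool. "echo" stands in
# for a real borg/borgmatic invocation here.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def backup_repository(repository):
    # Placeholder for borgmatic's per-repository backup logic.
    return subprocess.run(
        ['echo', 'backing up', repository],
        capture_output=True, text=True,
    ).stdout.strip()

repositories = ['ssh://host-a/./repo', 'ssh://host-b/./repo']
with ThreadPoolExecutor(max_workers=2) as pool:  # N = 2 workers
    results = list(pool.map(backup_repository, repositories))

print(results)
```

Since the subprocesses are I/O-bound, threads (rather than processes) are enough to get real concurrency here.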
Interesting. What might you want to be performed in parallel, exactly? Separate Borg invocations, say if you're backing up to two separate repositories on different remote hosts? Is the goal here to improve performance to reduce total runtime? If so, why? (I can imagine why, but I'm interested in your thinking here.)
Do you have reason to believe that running Borg invocations in parallel would actually improve performance? Put another way, what's the bottleneck? Is it disk I/O, or network I/O, or CPU (you mentioned cores)? And would layering on a parallel run help?
You don't need answers to all of these now, but this is the sort of exploration that would help define/scope this feature.
Related ticket: witten/borgmatic#291
Closing due to inactivity. However, please feel free to reopen if you're still interested in this!
I think one possible scenario is: multiple slow remote repositories (slow in either the repository's network or its storage). One might use blazing-fast SSDs for the production server while using several last-decade servers with 5400 RPM laptop HDDs, found at the bottom of the rack, as borg repositories. The production server will still be far from any kind of bottleneck, while the repository server will be choking trying to write all the data it receives.
Those are great points. If anyone has scenarios like those—or others—feel free to chime in here so we can gauge interest.
Multiple remote repositories, where the bottleneck is their bandwidth, is the most obvious use case. For example, borgbase has datacenters in both the US and the EU. If, like me, you understand the value of both having redundant backups and of spreading your backups across multiple geolocations, you'll create two repos for every one you actually intend to use. Sure, you could just use something like rsync to copy a repo from one place to another, but the official borg docs give some reasoning why they don't recommend that, and I'll take them at their word. Nonetheless, I'm writing the exact same data to two different repos, and doing so in sequence (as borgmatic wants to do) means more than doubling the amount of time spent on the backup.
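For the two-repository case, the expected win is that wall-clock time drops from the sum of the two upload times to roughly the slower of the two. A small self-contained sketch, with `time.sleep` standing in for network-bound `borg create` runs, illustrates the arithmetic:

```python
# Sequential cost is t1 + t2; concurrent cost is ~max(t1, t2).
# time.sleep() stands in for a network-bound "borg create" run.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_upload(seconds):
    time.sleep(seconds)

start = time.monotonic()
for duration in (0.2, 0.2):      # sequential: ~0.4s total
    fake_upload(duration)
sequential = time.monotonic() - start

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:  # concurrent: ~0.2s
    list(pool.map(fake_upload, (0.2, 0.2)))
concurrent = time.monotonic() - start

print(f'sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s')
```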
Now, I think (but am not sure, as I've not tried yet) that it's possible to run `borg create` targeting the same target directory in concurrent processes; if it is, then it seems to me that borgmatic would benefit greatly from having this be part of its declarative config rather than something you have to orchestrate yourself (which has the extra pain of forcing you to specify YAML overrides).

Another thing that would go a long way, in a parallel world, is having borgmatic take care of the `extract` integrity check: I reckon `borg extract` isn't safe to run concurrently when targeting the same directory, so having borgmatic handle this logic internally, running all the integrity checks except the `extract` dry-run in parallel and then running that `extract` dry-run for each repository in series, would be a huge boon to speed while also protecting users from potentially shooting themselves in the foot. I'm pretty sure you just need to make sure that another `borg create` isn't still going when you get to the point where you want to run the `extract` dry-run, and again, having borgmatic take care of this instead of orchestrating it yourself would be a huge help.

There's also the matter of using hooks correctly to prevent running backups on live data. Right now I have my `before` hooks touch a file and my `after` hooks `rm` it. If I were to run separate `borgmatic` processes at once, I'd have race conditions unless I had YAML overrides for those hook settings too (and had my app check globs before it starts). But if borgmatic were to take care of the parallelism itself, we could have things like `before_any_backup` (run this command before any backup happens, and only once, e.g. `touch` a file) and `after_all_backup` (run this command after all backups are done, e.g. `rm` a file) without users having to do external orchestration or a ton of YAML overrides/config duplication.

Of course, I do understand this will complicate logging. Depending on how much effort you want to put in, you could either go all-in and write code to properly handle concurrent logs nicely, something like in this blog post, or you could just let chaos reign and tell the user that if they engage the parallelism, their logs will be interspersed and potentially garbled. Caveat emptor. I personally would accept log garbling as a tradeoff for speed.
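A minimal sketch of the scheduling proposed above — parallel checks with serialized `extract` dry-runs — assuming illustrative function and repository names rather than borgmatic's actual internals:

```python
# Run the safe-to-parallelize checks for all repositories at once, but
# funnel every extract dry-run through a lock so only one runs at a time.
# check_repository() and the repository names are illustrative only.
import threading
from concurrent.futures import ThreadPoolExecutor

extract_lock = threading.Lock()  # serializes the extract dry-runs
log = []
log_lock = threading.Lock()      # keeps the shared log consistent

def check_repository(repository):
    # Checks that are safe to parallelize (e.g. "borg check") go here.
    with log_lock:
        log.append(f'checked {repository}')
    # The extract dry-run effectively runs in series across repositories.
    with extract_lock:
        with log_lock:
            log.append(f'extract dry-run {repository}')

repositories = ['repo-a', 'repo-b', 'repo-c']
with ThreadPoolExecutor() as pool:
    list(pool.map(check_repository, repositories))
```

The same lock-based approach would also cover the ordering concern above: a shared lock (or a barrier) between the `borg create` phase and the `extract` dry-run phase would keep them from overlapping.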