Borgmatic to manage parallel support? #227
Reference: borgmatic-collective/borgmatic#227
What I'm trying to do and why
Backups take a while, but if you have a powerful machine, you could run them in parallel! This is possible to manage externally from borgmatic, but it would also be nice to have a `-n N` flag to specify a number of cores and have borgmatic run what it can in parallel. The user would have to make sure that what is being backed up can actually be done in parallel. This is probably a can of worms, but it would be nice to think through.
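In the meantime, this kind of parallelism can be orchestrated externally. A rough Python sketch of what a worker-pool approach behind the hypothetical `-n N` flag might look like, with `echo` standing in for a real per-repository backup command:

```python
# Sketch of what a hypothetical "-n N" flag might do internally: run at
# most N repository backups at once via a worker pool. "echo" stands in
# for a real borg/borgmatic invocation here.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def backup_repository(repository):
    # Placeholder for borgmatic's per-repository backup logic.
    return subprocess.run(
        ['echo', 'backing up', repository],
        capture_output=True, text=True,
    ).stdout.strip()

repositories = ['ssh://host-a/./repo', 'ssh://host-b/./repo']
with ThreadPoolExecutor(max_workers=2) as pool:  # N = 2 workers
    results = list(pool.map(backup_repository, repositories))

print(results)
```

Since the subprocesses are I/O-bound, threads (rather than processes) are enough to get real concurrency here.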
Interesting. What might you want to be performed in parallel, exactly? Separate Borg invocations, say if you're backing up to two separate repositories on different remote hosts? Is the goal here to improve performance to reduce total runtime? If so, why? (I can imagine why, but I'm interested in your thinking here.)
Do you have reason to believe that running Borg invocations in parallel would actually improve performance? Put another way, what's the bottleneck? Is it disk I/O, or network I/O, or CPU (you mentioned cores)? And would layering on a parallel run help?
You don't need answers to all of these now, but this is the sort of exploration that would help define/scope this feature.
Related ticket: witten/borgmatic#291
Closing due to inactivity. However, please feel free to reopen if you're still interested in this!
I think one possible scenario is: multiple slow remote repositories (slow in either the repository's network or its storage). One might use blazing-fast SSDs for the production server while using several last-decade servers with 5400 RPM laptop HDDs, found at the bottom of the rack, as borg repositories. The production server will still be far from any kind of bottleneck, while the repository server will be choking trying to write all the data it receives.
Those are great points. If anyone has scenarios like those—or others—feel free to chime in here so we can gauge interest.
Multiple remote repositories, where the bottleneck is their bandwidth, is the most obvious use case. For example, borgbase has datacenters in both the US and the EU. If, like me, you understand the value of both having redundant backups and of spreading your backups across multiple geolocations, you'll create two repos for every one you actually intend to use. Sure, you could just use something like rsync to copy a repo from one place to another, but the official borg docs give some reasoning why they don't recommend that, and I'll take them at their word. Nonetheless, I'm writing the exact same data to two different repos, and doing so in sequence (as borgmatic wants to do) means more than doubling the amount of time spent on the backup.
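For the two-repository case, the expected win is that wall-clock time drops from the sum of the two upload times to roughly the slower of the two. A small self-contained sketch, with `time.sleep` standing in for network-bound `borg create` runs, illustrates the arithmetic:

```python
# Sequential cost is t1 + t2; concurrent cost is ~max(t1, t2).
# time.sleep() stands in for a network-bound "borg create" run.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_upload(seconds):
    time.sleep(seconds)

start = time.monotonic()
for duration in (0.2, 0.2):      # sequential: ~0.4s total
    fake_upload(duration)
sequential = time.monotonic() - start

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:  # concurrent: ~0.2s
    list(pool.map(fake_upload, (0.2, 0.2)))
concurrent = time.monotonic() - start

print(f'sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s')
```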
Now, I think (but am not sure, as I've not tried yet) that it's possible to run `borg create` targeting the same target directory in concurrent processes; if it is, then it seems to me that borgmatic would benefit greatly from having this be part of its declarative config rather than something you have to orchestrate yourself (which has the extra pain of forcing you to specify YAML overrides).

Another thing that would go a long way, in a parallel world, is having borgmatic take care of the `extract` integrity check: I reckon `borg extract` isn't safe to run concurrently when targeting the same directory, so having borgmatic handle this logic internally, running all the integrity checks except the `extract` dry-run in parallel and then running that `extract` dry-run for each repository in series, would be a huge boon to speed while also protecting users from potentially shooting themselves in the foot. I'm pretty sure you just need to make sure that another `borg create` isn't still going when you get to the point where you want to run the `extract` dry-run, and again, having borgmatic take care of this instead of orchestrating it yourself would be a huge help.

There's also the matter of using hooks correctly to prevent running backups on live data. Right now I have my `before` hooks touch a file and my `after` hooks `rm` it. If I were to run separate `borgmatic` processes at once, I'd have race conditions unless I had YAML overrides for those hook settings too (and had my app check globs before it starts). But if borgmatic were to take care of the parallelism itself, we could have things like `before_any_backup` (run this command before any backup happens, and only once, e.g. `touch` a file) and `after_all_backup` (run this command after all backups are done, e.g. `rm` a file) without users having to do external orchestration or a ton of YAML overrides/config duplication.

Of course, I do understand this will complicate logging. Depending on how much effort you want to put in, you could either go all-in and write code to properly handle concurrent logs nicely, something like in this blog post, or you could just let chaos reign and tell the user that if they engage the parallelism, their logs will be interspersed and potentially garbled. Caveat emptor. I personally would accept log garbling as a tradeoff for speed.
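A minimal sketch of the scheduling proposed above — parallel checks with serialized `extract` dry-runs — assuming illustrative function and repository names rather than borgmatic's actual internals:

```python
# Run the safe-to-parallelize checks for all repositories at once, but
# funnel every extract dry-run through a lock so only one runs at a time.
# check_repository() and the repository names are illustrative only.
import threading
from concurrent.futures import ThreadPoolExecutor

extract_lock = threading.Lock()  # serializes the extract dry-runs
log = []
log_lock = threading.Lock()      # keeps the shared log consistent

def check_repository(repository):
    # Checks that are safe to parallelize (e.g. "borg check") go here.
    with log_lock:
        log.append(f'checked {repository}')
    # The extract dry-run effectively runs in series across repositories.
    with extract_lock:
        with log_lock:
            log.append(f'extract dry-run {repository}')

repositories = ['repo-a', 'repo-b', 'repo-c']
with ThreadPoolExecutor() as pool:
    list(pool.map(check_repository, repositories))
```

The same lock-based approach would also cover the ordering concern above: a shared lock (or a barrier) between the `borg create` phase and the `extract` dry-run phase would keep them from overlapping.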