Add a locking mechanism such that only one borgmatic instance can run simultaneously for a config #250
Labels
No Label
bug
data loss
design finalized
good first issue
new feature area
question / support
security
waiting for response
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: borgmatic-collective/borgmatic#250
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
What I'm trying to do and why
I've got pre/post backup scripts that run which do some things like take snapshots of filesystems/vms. Currently they are set up in a way where if two instances of borgmatic run simultaneously they will fight each other and then an inconsistent backup would be taken. My current solution to this is to create a script which wraps borgmatic and uses flock to take a lock on the filesystem which prevents a second borgmatic instance from running. The script is shown below. I renamed borgmatic to _borgmatic so that I don't accidentally run it. (It would be cleaner to just named the script something else and just use a non default config file, but this works fine for now)
It would be cleaner if you borgmatic could just grab a lock itself for the length of its execution.
Other notes / implementation ideas
Python does have a flock module for filesystem locking. The nice thing about locks created with that mechanism is they disappear when the program closes. I believe its as simple as doing something like this.
I'm not sure if you'd want to use the config file as the file to lock. The one thing I remember was that when the handle to the file gets garbage collected (
pidfile
in this case) the lock get dropped, so it needs to be to a file handle that is open for the length of the program.If Windows needs to be supported, I'd imagine there is something similar, but I'm not sure what
Thank you for filing this and providing implementation ideas! Super helpful. Given that Borg only sort of supports Windows, it'd probably be okay to start this feature off as Unix-only.
So with the Python flock approach, there's no risk of a "stale" lock getting left behind, e.g. if borgmatic errors? And so we wouldn't need a remove stale lock feature?
Any thoughts on how this feature should interact with the traditional server-side Borg with-lock feature? Just support both, and make both into borgmatic configuration file options? Or are you thinking that the flock locking would be always-on and automatic?
Also including this quote from IRC in case someone sees this ticket and wonders why the server-side Borg with-lock feature isn't enough:
My understanding is the OS handles the dropping of the locks when the the file handle is closed either by the program or the OS if the process dies. I've used this technique before and never ran into problems, but it doesn't mean there aren't edge cases to worry about.
As long as the file is on a filesystem that supports these locks, which I think is a reasonable expectation, I don't see the need to implement a separate locking mechanism that could have stale locks. (At least that's my understanding)
As I mentioned to you in IRC, the reason I wanted this instead of the borg with-lock feature is that that puts the lock on the remote system, which isn't helpful if a single borgmatic config backs up to multiple destinations (and pre/post backup scripts are used). The borg with-lock I'd imagine is more useful if you have multiple machines backing up to the same borg repository though that is just a guess since I've never used that feature. I don't immediately see the harm in just leaving them as separate features.
I guess the downside to having it on by default is if you have multiple destinations it doesn't let you use borg to interact with one destination while the other one is running. Another downside is if the config file or what not is stored on a filesystem that doesn't support this locking mechanism, you would then have to kick out a warning or something. I'd be inclined to have it just be an additional option, since its only needed if you use backup scripts that have side effects I believe.
Cool. That makes it easier.
That makes me think of another edge case. Would you expect typically interactive commands like
borgmatic list
,borgmatic info
, etc. to take out the lock, just likeborgmatic create
would?I thought about that as well. You can't run
borg list
whenborg create
is running soborg
must be using a lock as well forborg list
.borg
may be using shared locks instead of exclusive ones forlist
andinfo
which means that those commands could run in parallel to each other but not whilecreate
orprune
is running. I think its probably fine to just take a lock on everything though since I don't think having list/info working every time that it possibly can when this option is enabled is so critical.Okay, that also makes it easier! Thanks.
witten referenced this issue2019-12-03 17:58:47 +00:00
FYI there's an ongoing discussion on the granularity of locking on the PR for this: witten/borgmatic#254
Anyone who's interested: Feel free to weigh in!
Closing due to inactivity. However, please feel free to reopen if you're still interested in this!