#250 Add a locking mechanism such that only one borgmatic instance can run simultaneously for a config

Open
opened 11 months ago by drewkett · 6 comments

What I’m trying to do and why

I’ve got pre/post backup scripts that run which do some things like take snapshots of filesystems/vms. Currently they are set up in a way where if two instances of borgmatic run simultaneously they will fight each other and then an inconsistent backup would be taken. My current solution to this is to create a script which wraps borgmatic and uses flock to take a lock on the filesystem which prevents a second borgmatic instance from running. The script is shown below. I renamed borgmatic to _borgmatic so that I don’t accidentally run it. (It would be cleaner to just named the script something else and just use a non default config file, but this works fine for now)

#!/bin/sh
                                                                                                                                                                                                                                             lockpath=/.backups
flock -xn -E201 $lockpath /root/.local/bin/_borgmatic "$@"
if [ $? -eq 201 ]; then
    echo "Failed to acquire lock for $lockpath"
    exit 201
fi

It would be cleaner if you borgmatic could just grab a lock itself for the length of its execution.

Other notes / implementation ideas

Python does have a flock module for filesystem locking. The nice thing about locks created with that mechanism is they disappear when the program closes. I believe its as simple as doing something like this.

pidfile = open("/etc/borgmatic/config.yaml", "r")
try:
    fcntl.flock(self.pidfile.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
     self.pidfile = None
     raise SystemExit("Already running according to " + self.path)

I’m not sure if you’d want to use the config file as the file to lock. The one thing I remember was that when the handle to the file gets garbage collected (pidfile in this case) the lock get dropped, so it needs to be to a file handle that is open for the length of the program.

If Windows needs to be supported, I’d imagine there is something similar, but I’m not sure what

#### What I'm trying to do and why I've got pre/post backup scripts that run which do some things like take snapshots of filesystems/vms. Currently they are set up in a way where if two instances of borgmatic run simultaneously they will fight each other and then an inconsistent backup would be taken. My current solution to this is to create a script which wraps borgmatic and uses flock to take a lock on the filesystem which prevents a second borgmatic instance from running. The script is shown below. I renamed borgmatic to _borgmatic so that I don't accidentally run it. (It would be cleaner to just named the script something else and just use a non default config file, but this works fine for now) ``` #!/bin/sh lockpath=/.backups flock -xn -E201 $lockpath /root/.local/bin/_borgmatic "$@" if [ $? -eq 201 ]; then echo "Failed to acquire lock for $lockpath" exit 201 fi ``` It would be cleaner if you borgmatic could just grab a lock itself for the length of its execution. #### Other notes / implementation ideas Python does have a flock module for filesystem locking. The nice thing about locks created with that mechanism is they disappear when the program closes. I believe its as simple as doing something like this. ``` pidfile = open("/etc/borgmatic/config.yaml", "r") try: fcntl.flock(self.pidfile.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB) except IOError: self.pidfile = None raise SystemExit("Already running according to " + self.path) ``` I'm not sure if you'd want to use the config file as the file to lock. The one thing I remember was that when the handle to the file gets garbage collected (`pidfile` in this case) the lock get dropped, so it needs to be to a file handle that is open for the length of the program. If Windows needs to be supported, I'd imagine there is something similar, but I'm not sure what
witten commented 11 months ago
Owner

Thank you for filing this and providing implementation ideas! Super helpful. Given that Borg only sort of supports Windows, it’d probably be okay to start this feature off as Unix-only.

So with the Python flock approach, there’s no risk of a “stale” lock getting left behind, e.g. if borgmatic errors? And so we wouldn’t need a remove stale lock feature?

Any thoughts on how this feature should interact with the traditional server-side Borg with-lock feature? Just support both, and make both into borgmatic configuration file options? Or are you thinking that the flock locking would be always-on and automatic?

Also including this quote from IRC in case someone sees this ticket and wonders why the server-side Borg with-lock feature isn’t enough:

The issue with the borg lock command for me is that I really want the lock to happen locally since I back up to two locations and the prebackup scripts only happens once

Thank you for filing this and providing implementation ideas! Super helpful. Given that Borg only sort of supports Windows, it'd probably be okay to start this feature off as Unix-only. So with the Python flock approach, there's no risk of a "stale" lock getting left behind, e.g. if borgmatic errors? And so we wouldn't need a remove stale lock feature? Any thoughts on how this feature should interact with the traditional server-side [Borg with-lock feature](https://borgbackup.readthedocs.io/en/stable/usage/lock.html)? Just support both, and make both into borgmatic configuration file options? Or are you thinking that the flock locking would be always-on and automatic? Also including this quote from IRC in case someone sees this ticket and wonders why the server-side Borg with-lock feature isn't enough: > The issue with the borg lock command for me is that I really want the lock to happen locally since I back up to two locations and the prebackup scripts only happens once
drewkett commented 11 months ago
Poster

My understanding is the OS handles the dropping of the locks when the the file handle is closed either by the program or the OS if the process dies. I’ve used this technique before and never ran into problems, but it doesn’t mean there aren’t edge cases to worry about.

As long as the file is on a filesystem that supports these locks, which I think is a reasonable expectation, I don’t see the need to implement a separate locking mechanism that could have stale locks. (At least that’s my understanding)

As I mentioned to you in IRC, the reason I wanted this instead of the borg with-lock feature is that that puts the lock on the remote system, which isn’t helpful if a single borgmatic config backs up to multiple destinations (and pre/post backup scripts are used). The borg with-lock I’d imagine is more useful if you have multiple machines backing up to the same borg repository though that is just a guess since I’ve never used that feature. I don’t immediately see the harm in just leaving them as separate features.

I guess the downside to having it on by default is if you have multiple destinations it doesn’t let you use borg to interact with one destination while the other one is running. Another downside is if the config file or what not is stored on a filesystem that doesn’t support this locking mechanism, you would then have to kick out a warning or something. I’d be inclined to have it just be an additional option, since its only needed if you use backup scripts that have side effects I believe.

My understanding is the OS handles the dropping of the locks when the the file handle is closed either by the program or the OS if the process dies. I've used this technique before and never ran into problems, but it doesn't mean there aren't edge cases to worry about. As long as the file is on a filesystem that supports these locks, which I think is a reasonable expectation, I don't see the need to implement a separate locking mechanism that could have stale locks. (At least that's my understanding) As I mentioned to you in IRC, the reason I wanted this instead of the borg with-lock feature is that that puts the lock on the remote system, which isn't helpful if a single borgmatic config backs up to multiple destinations (and pre/post backup scripts are used). The borg with-lock I'd imagine is more useful if you have multiple machines backing up to the same borg repository though that is just a guess since I've never used that feature. I don't immediately see the harm in just leaving them as separate features. I guess the downside to having it on by default is if you have multiple destinations it doesn't let you use borg to interact with one destination while the other one is running. Another downside is if the config file or what not is stored on a filesystem that doesn't support this locking mechanism, you would then have to kick out a warning or something. I'd be inclined to have it just be an additional option, since its only needed if you use backup scripts that have side effects I believe.
witten commented 11 months ago
Owner

As long as the file is on a filesystem that supports these locks, which I think is a reasonable expectation, I don’t see the need to implement a separate locking mechanism that could have stale locks. (At least that’s my understanding)

Cool. That makes it easier.

I guess the downside to having it on by default is if you have multiple destinations it doesn’t let you use borg to interact with one destination while the other one is running.

That makes me think of another edge case. Would you expect typically interactive commands like borgmatic list, borgmatic info, etc. to take out the lock, just like borgmatic create would?

> As long as the file is on a filesystem that supports these locks, which I think is a reasonable expectation, I don’t see the need to implement a separate locking mechanism that could have stale locks. (At least that’s my understanding) Cool. That makes it easier. > I guess the downside to having it on by default is if you have multiple destinations it doesn’t let you use borg to interact with one destination while the other one is running. That makes me think of another edge case. Would you expect typically interactive commands like `borgmatic list`, `borgmatic info`, etc. to take out the lock, just like `borgmatic create` would?
drewkett commented 11 months ago
Poster

I thought about that as well. You can’t run borg list when borg create is running so borg must be using a lock as well for borg list. borg may be using shared locks instead of exclusive ones for list and info which means that those commands could run in parallel to each other but not while create or prune is running. I think its probably fine to just take a lock on everything though since I don’t think having list/info working every time that it possibly can when this option is enabled is so critical.

I thought about that as well. You can't run `borg list` when `borg create` is running so `borg` must be using a lock as well for `borg list`. `borg` may be using shared locks instead of exclusive ones for `list` and `info` which means that those commands could run in parallel to each other but not while `create` or `prune` is running. I think its probably fine to just take a lock on everything though since I don't think having list/info working every time that it possibly can when this option is enabled is so critical.
witten commented 11 months ago
Owner

Okay, that also makes it easier! Thanks.

Okay, that also makes it easier! Thanks.
witten commented 10 months ago
Owner

FYI there’s an ongoing discussion on the granularity of locking on the PR for this: #254

Anyone who’s interested: Feel free to weigh in!

FYI there's an ongoing discussion on the granularity of locking on the PR for this: https://projects.torsion.org/witten/borgmatic/pulls/254 Anyone who's interested: Feel free to weigh in!
witten added the
waiting for response
label 10 months ago
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date

No due date set.

Dependencies

This issue currently doesn't have any dependencies.

Loading…
There is no content yet.