borgmatic fails to unmount and destroy zfs snapshot #1295
borgmatic-collective/borgmatic#1295
What I'm trying to do and why
I have a configuration that backs up the following paths:
/warehouse/music (zpool=rpool/DATA/music)
/warehouse/images (zpool=rpool/DATA/images)
/warehouse/videos (zpool=dpool1/DATA/videos)
/warehouse/documents (zpool=dpool1/DATA/documents)
Backup process starts just fine, snapshots are created for zfs datasets, snapshots are mounted, backups are created and then all snapshots are unmounted and destroyed with the exception of /warehouse/music. The error reported is that it can't destroy the snapshot because it is busy. From what I can tell it is because it never actually got unmounted, though if I manually umount and destroy the snapshot I can do so without any issues. Not sure where to look or how to fix.
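For readers following along, the lifecycle described above can be sketched as an ordered list of commands per dataset. This is only an illustration, not borgmatic's code: the function name, the snapshot name, and the mount path are hypothetical, and the exact flags borgmatic passes may differ. It shows why a skipped or failed unmount makes the final destroy fail as "busy": the snapshot is still mounted when `zfs destroy` runs.

```python
def snapshot_lifecycle(dataset, mount_path, snapshot_name="example-snapshot"):
    """Return the commands, in order, for one dataset (hypothetical helper).

    Nothing is executed here; the list only documents the expected order.
    """
    full_name = f"{dataset}@{snapshot_name}"
    return [
        ["zfs", "snapshot", full_name],
        ["mount", "-t", "zfs", "-o", "ro", full_name, mount_path],
        # ... the backup reads from mount_path at this point ...
        ["umount", mount_path],
        # Destroying a snapshot that is still mounted fails as "busy".
        ["zfs", "destroy", full_name],
    ]


# Hypothetical mount path, for illustration only.
for command in snapshot_lifecycle("rpool/DATA/music", "/run/example/warehouse/music"):
    print(" ".join(command))
```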
Steps to reproduce
No response
Actual behavior
No response
Expected behavior
all zfs snapshots are unmounted and all snapshots are destroyed
Other notes / implementation ideas
No response
borgmatic version
No response
borgmatic installation method
No response
Borg version
2.0.7
Python version
Python 3.13.12
Database version (if applicable)
No response
Operating system and version
NAME='Gentoo' ID='gentoo' PRETTY_NAME='Gentoo Linux' VERSION='2.18' VERSION_ID='2.18' HOME_URL='https://www.gentoo.org/' SUPPORT_URL='https://www.gentoo.org/support/' BUG_REPORT_URL='https://bugs.gentoo.org/' ANSI_COLOR='1;32'
Title changed from "borgmatic fails to unmount zfs snapshot and destroy snapshot" to "borgmatic fails to unmount and destroy zfs snapshot"
A few thoughts on this one:
@witten wrote in #1295 (comment):
I am using version 2.0.7. Gentoo does have version 2.1.4 available (still masked, though), so I could unmask it, upgrade, and see whether that version fixes my issue. Or I could try the standalone binary.
Command I am using: borgmatic -c /etc/borgmatic.d/dpool1.yaml
log from the backup job uploaded as well as the configuration
Thanks for your assistance with this.
Thanks for providing those details. Yeah, it looks like for /warehouse/music, borgmatic is attempting an unmount of the snapshot, but it's doing the unmount with the wrong path and therefore the unmount fails. Which then leads to the error around destroying the snapshot, as you discovered.
So if you can, please try an upgrade and then running borgmatic again. If the problem doesn't repro, great. But if it does repro, then please post an updated log with that version (and let me know what version it is). That should at least eliminate a number of previously fixed ZFS issues. Thanks!
Yup, I see where in the log you noticed that it seems to be referencing an incorrect mount path when it tries unmounting that one snapshot... weird that it only has an issue with this one mounted snapshot and not the others.
I upgraded to version 2.1.4 and tried again...no luck, exact same issue. Log attached.
Thanks for testing that out with the new version and including the log as well. At least now we've eliminated a number of previously fixed issues and can start "fresh." I'll dig into this and see if I can get a repro locally or at minimum an explanation of what might be going on with your machine.
As a test, I created a bogus text file in the /warehouse/music directory, as the only difference between this backup directory and the others is that it is an empty directory. I am currently rerunning the backup to see if anything different happens by chance. I did briefly look over zfs.py and did not see anything that stood out that would explain the incorrect path being used in the umount command, but I am also not familiar with the borgmatic code base at all...
Oh! So when borgmatic was run to produce that most recent log, /warehouse/music was a completely empty directory? If so, I think that might be triggering this code in zfs.py, which short-circuits unmounting and could explain what you're seeing:
Ha, I did see that piece of code and paused on it for a bit, suspecting it might be the cause, but quickly decided it wasn't, as it's not a shadow of a nested directory. Given that I have a reasonable use case of backing up a directory that is, at least at the moment, empty, do you see a way to accommodate this in the code base? As a workaround, I can simply keep an empty file in the directory for now. I will confirm when the backup job completes whether it fails or succeeds with the empty file there.
Yeah, this code is basically using a pretty bad heuristic: This snapshot mount path is an empty directory, and therefore it's probably a shadow of a nested dataset within the snapshot for a parent dataset, and therefore its unmount should be skipped. But I suspect that this code also happens to trigger for plain old datasets that happen to be empty, such as /warehouse/music in your case.
The underlying problem is that there's no good cross-platform way to probe for whether a directory is mounted, so you can see the code in this function sort of dancing around that by doing several tangentially related checks before the actual unmount. That includes this empty directory check.
Anyway, I'll have to think about whether I can modify this to accommodate your use case without breaking the nested dataset shadow use case.
And yes, as a workaround for now you should be able to put an empty file in the otherwise empty dataset directory.
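To make the misfire concrete, here is a minimal sketch of an empty-directory heuristic of the kind described above. It is not the actual code from zfs.py: the function names `looks_like_nested_shadow` and `plan_cleanup` are hypothetical, and a temporary directory stands in for a mounted snapshot path. It only shows that an empty dataset and a nested-dataset shadow look identical to such a check, and why dropping a file into the directory works around it.

```python
import os
import tempfile


def looks_like_nested_shadow(snapshot_mount_path):
    """Hypothetical stand-in for the heuristic: treat an empty directory
    as a shadow of a nested dataset."""
    return os.path.isdir(snapshot_mount_path) and not os.listdir(snapshot_mount_path)


def plan_cleanup(snapshot_mount_path):
    """Return the cleanup steps such a heuristic would choose."""
    if looks_like_nested_shadow(snapshot_mount_path):
        # Unmount is skipped, so a later destroy of a snapshot that is
        # actually mounted fails because the snapshot is busy.
        return ["destroy"]
    return ["unmount", "destroy"]


with tempfile.TemporaryDirectory() as music:
    print(plan_cleanup(music))  # empty directory: unmount is skipped

    # The workaround from this thread: keep one file in the directory.
    open(os.path.join(music, ".keep"), "w").close()
    print(plan_cleanup(music))  # non-empty directory: unmount happens
```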
I believe this subsequent backup run with the bogus text file worked as far as the unmounting and destroying of the snapshot, but I encountered another error and ultimately the backup still failed. Not sure if this new error is related, or whether it comes from borgmatic or borg itself...
I can confirm now that running the backup job after ensuring the problematic backup path is not empty does in fact complete successfully. So for now I will keep this empty file in the directory to work around this issue. The error I encountered in my last comment was an I/O issue raised by borg itself that seems to come from a corrupt backup repository, which I ended up just recreating to fix. I just need to figure out a smarter way to import/export the zfs pool that I use specifically for the backup repositories, so that concurrent jobs do not try to export/unmount it while another is still using it...
If you do make any enhancements to the zfs handling with respect to what I experienced here I would be more than willing to do some testing.
Glad to hear that you managed to get around the Borg I/O issue with a repo recreate.
I don't know if it's helpful here, but there is this: https://borgbackup.readthedocs.io/en/stable/usage/lock.html
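One possible way to coordinate concurrent jobs around a shared pool, sketched here under assumptions rather than taken from borgmatic or Borg: each job holds a shared advisory lock on a lock file while it uses the pool, and the export step only runs if it can take an exclusive lock without blocking. The names `pool_in_use` and `export_if_idle` and the lock file location are hypothetical, the `export` callback stands in for whatever actually runs the pool export, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pool_in_use(lock_path):
    """Hold a shared lock for as long as this job uses the pool."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def export_if_idle(lock_path, export):
    """Run export() only if no other job holds the shared lock.

    Returns True if the export ran, False if the pool was still in use.
    """
    with open(lock_path, "a") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # Another job still holds the shared lock.
        try:
            export()  # Runs while holding the exclusive lock.
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
        return True


with tempfile.TemporaryDirectory() as directory:
    lock_path = os.path.join(directory, "backups-pool.lock")
    with pool_in_use(lock_path):
        print(export_if_idle(lock_path, lambda: None))  # pool busy, export skipped
    print(export_if_idle(lock_path, lambda: None))  # pool idle, export runs
```

Running the export while the exclusive lock is held closes the window in which another job could start using the pool between the check and the export.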
Great, I'll let you know!
Okay, I believe this should be fixed in main! It'll be part of the next release. If you have an easy way to test this, please feel free and let me know how it works for you. The new zfs.py should be a drop-in replacement on top of the zfs.py in borgmatic 2.1.4.
I can confirm the fixes you made do not raise the errors now when a zfs dataset contains no files/directories, and that the snapshot created against such a dataset gets properly destroyed as well. Thanks for your efforts on this!
As a side note for others, below is what I ended up doing regarding dynamically importing (mounting)/exporting (unmounting) the 'backups' zfs pool prior to running backups that use repositories on that pool... This snippet of yaml is in a separate yaml file that is then included in any other configuration files that use this pool and the repositories that reside on it.
Note: I only need to worry about importing/exporting the pool and not any actual mounting/unmounting as the backup dataset is configured to auto mount/unmount when the pool is imported or exported.
Thanks for reporting back. I'm glad to hear the fixes are working for you! And also that you've got the dynamic import/export integrated as well.
Released in borgmatic 2.1.5!