LVM snapshots fail under systemd #1163
Reference: borgmatic-collective/borgmatic#1163
What I'm trying to do and why
I'm trying to snapshot two filesystems, /archive and /home, for backup. Both are on LVM logical volumes:
But borgmatic 2.0.7-2.0.9 reports:
Steps to reproduce
Here's the config:
Actual behavior
Here's the log file. The WARNING is about halfway down.
And here's the output of lsblk, with some irrelevant devices removed:
Expected behavior
No response
Other notes / implementation ideas
No response
borgmatic version
2.0.9 (but I've observed this since 2.0.7)
borgmatic installation method
Debian package (locally created)
Borg version
borg 1.2.8
Python version
Python 3.12.3
Database version (if applicable)
No response
Operating system and version
Ubuntu 24.04.3 LTS
Thanks for the detailed ticket! So you're saying this worked fine for you with 2.0.6 and stopped working with 2.0.7? Are you sure it didn't stop working with 2.0.8? EDIT: I see based on your comment on #1150 that it indeed stopped working with 2.0.8.
The error message you're getting ("The runtime directory /run/borgmatic overlaps with the configured excludes or patterns with excludes.") didn't actually exist for the non-database case until borgmatic 2.0.8 (specifically #1122), and it doesn't look like you have databases configured.
Also, can I get a look at your /etc/borgmatic.d/vm.patterns file? The - ** in the patterns output looks pretty suspect, and it could be causing more files to get excluded than you're intending.
It's possible that all of this runtime directory stuff is unrelated to the LVM issue you're experiencing, but it would be good to sort both out in case they're related.
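For intuition about why a bare - ** exclude is suspect, here is a rough approximation using Python's fnmatch module, in which * also matches the path separator. This is an assumption-laden sketch, not Borg's own pattern matcher (Borg's matching rules differ in detail), and approx_excluded is a hypothetical name. It shows the same outcome, though: a pattern consisting only of ** matches every path, including the runtime directory.

```python
import fnmatch


def approx_excluded(path: str, pattern: str) -> bool:
    """Rough stand-in for an exclude check (NOT Borg's matcher).

    Python's fnmatch lets '*' match '/', so '**' matches any path at all.
    """
    return fnmatch.fnmatchcase(path, pattern)


# Sample paths: borgmatic's runtime directory plus two hypothetical files.
for path in ["/run/borgmatic", "/archive/photos/2024.jpg", "/home/user/.bashrc"]:
    print(path, "excluded:", approx_excluded(path, "**"))
```

In this approximation, a narrower pattern such as /archive/cache/* only matches paths under that one directory.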
Possibly also related: #1150
I just tried adding - ** to my patterns, and I indeed get an error about my runtime directory overlapping with the configured excludes, likely because - ** is telling Borg to exclude absolutely everything!
Thanks for your quick response. I'm sorry if I seemed to confuse this issue with #1150. The issue here has nothing to do with the "runtime directory overlaps" error. It happens before that:
My backup roots are /archive and /home, as you can see in vm.patterns:
The issue is that both of those filesystems are mounted on LVM volumes, so borgmatic should be able to snapshot them. But instead it says "No LVM logical volumes found to snapshot".
I'm not sure whether this happened before version 2.0.7; my previous version was 1.9.11, and I don't think I was using lvm: with that.
To help simplify this ticket, let me drop back to version 2.0.7 again, which doesn't have #1150. Then I'll post a new clean log from that.
OK. Trying again, still in borgmatic 2.0.9, although I've seen this behavior since 2.0.7. I don't know if it was present before then.
Here is vm.yaml, same as before:
And vm.patterns, adjusted a little to work around #1150:
And here's the log output. On line 32 we still have WARNING: rsync.net-helium: No LVM logical volumes found to snapshot.
Thanks for including the logs from running with the work-around. I can't seem to repro here, even by using similar patterns. Do you feel comfortable replacing one or two of your borgmatic source files with altered versions instrumented to add a bunch of extra debugging logs? That might help us pinpoint the problem. If so, let me know and I can get you those source file(s) with extra logging.
Yes, I’m happy to do that.
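The instrumented files themselves aren't included in this thread. As a hypothetical illustration of the general technique, a small decorator like the one below logs a function's arguments and return value at DEBUG level, tagging each message with #1163 so it is easy to grep for. The name trace_1163 is made up for this sketch. Because the messages are DEBUG level, they only appear if the configured log level lets DEBUG records through.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def trace_1163(function):
    """Hypothetical helper: log calls to `function`, tagged for easy grepping."""

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        logger.debug("#1163 %s called with args=%r kwargs=%r", function.__name__, args, kwargs)
        result = function(*args, **kwargs)
        logger.debug("#1163 %s returned %r", function.__name__, result)
        return result

    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    @trace_1163
    def find_volumes(paths):
        # Stand-in for a real function under investigation.
        return [path for path in paths if path.startswith("/home")]

    find_volumes(["/archive", "/home"])
```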
Okay, I've attached an instrumented version of one source file with logging added to a single function. If this turns out not to pinpoint the problem, we can expand the logging to additional code. You should be able to drop this directly into the installed borgmatic source, specifically at borgmatic/hooks/data_source/snapshot.py. Let me know if you have trouble locating it. It sounds like you've built your own Debian package, so you could either modify the source for that or just temporarily overwrite the installed copy wherever your package puts it on your system.
I moved aside the installed /usr/lib/python3/dist-packages/borgmatic/hooks/data_source/snapshot.py, installed your instrumented version, and checked the diff to be sure the instrumented one is in place. Then I turned syslog_verbosity and logfile_verbosity up to 2. But so far I'm not getting any extra messages in the log files. I've reviewed them and grepped for 1163, but there's nothing. I'm scratching my head over this. Any other suggestions?
A couple of ideas:
…snapshot.py source file being used. Can you try temporarily deleting it entirely to see whether borgmatic errors on the missing file?
…lvm.py to help diagnose what's going on there.
Okay, I've attached an instrumented version of lvm.py, based on borgmatic 2.0.9. (If you're using a different version now, let me know.) You should be able to drop it in to temporarily overwrite the existing lvm.py and get some more debugging logs. And you can use it with or without the altered snapshot.py from before.
Thanks. On vacation this week. Will take another look on Monday.
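Another way to check that the instrumented snapshot.py is really the file being used is to ask Python which file it would import. This sketch uses importlib.util.find_spec from the standard library, and module_origin is a hypothetical helper name. It has to be run with the same Python interpreter that borgmatic runs under (given the /usr/lib/python3/dist-packages path above, presumably the system python3); otherwise it may find a different copy or none at all.

```python
import importlib.util
from typing import Optional


def module_origin(module_name: str) -> Optional[str]:
    """Return the file Python would load for module_name, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # A parent package (for example, borgmatic itself) is not importable here.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(module_origin("borgmatic.hooks.data_source.snapshot"))
```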
OK, I got a log file with the instrumented lvm.py and verbosity 2. There are lots of DEBUG statements with #1163 in them. The log is below. Note that I have two borgmatic profiles, all.yaml and vm.yaml. To simplify things, I removed the all.yaml output here: that profile doesn't include the lvm: directive, and its output runs to a few thousand lines.
I think I already see what the problem is, although I don't know why. In the log lines for devices_info['blockdevices'], there are no entries for /archive and /home. borgmatic says that it runs lsblk --output name,path,mountpoint,type --json --list, and when I run that I see /archive and /home in the list (at the end):
But for some reason, when borgmatic runs the same command, none of the devices in /dev/mapper are included in its list.
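The comparison described above can be reproduced outside borgmatic with a short script that runs the same lsblk command and keeps only entries whose type is lvm and that have a mount point. This is a hypothetical diagnostic sketch, not borgmatic's actual code. Running it once from a login shell and once inside the same sandbox as the borgmatic service should show whether the /dev/mapper devices disappear there.

```python
import json
import subprocess

# The same command line quoted in the thread.
LSBLK_COMMAND = ["lsblk", "--output", "name,path,mountpoint,type", "--json", "--list"]


def lvm_mountpoints(lsblk_json: str) -> dict:
    """Map each mounted LVM logical volume's mount point to its device path."""
    devices = json.loads(lsblk_json)["blockdevices"]
    return {
        device["mountpoint"]: device["path"]
        for device in devices
        if device.get("type") == "lvm" and device.get("mountpoint")
    }


if __name__ == "__main__":
    try:
        result = subprocess.run(LSBLK_COMMAND, capture_output=True, text=True, check=True)
        found = lvm_mountpoints(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError, KeyError) as error:
        print(f"Could not inspect block devices: {error}")
    else:
        if not found:
            print("No mounted LVM logical volumes visible to this process.")
        for mountpoint, path in sorted(found.items()):
            print(f"{mountpoint} -> {path}")
```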
Really interesting! Okay, here's my initial reaction to that: How are you running borgmatic here? Manually or via systemd for instance?
…PrivateDevices option in the systemd service file, which can interfere with access to /dev devices.
…lsblk and borgmatic as the same user or different users? One user could have more permissions here that would impact the lsblk output.
borgmatic is running under systemd. The service file is below, only trivially different from the sample one in the source code, per Ubuntu packaging.
As you can see, PrivateDevices=yes. Doh. And good news: when I override that to no, borgmatic now sees the device mapper devices.
So this is really just an RTFM bug about the warning in the documentation about LVM and systemd. Sorry about that. However, we're not quite done yet, because now I get another error creating the snapshot:
So I set ProtectKernelModules=no and will see how it goes tomorrow. It might be worth another comment in the sample systemd service file.
No worries! I'm glad to hear that was it. I think this could be better documented, so I'll take doing that as one of the work items for this ticket.
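The PrivateDevices effect can also be checked directly by listing /dev/mapper. According to the systemd.exec documentation, PrivateDevices=yes gives the service its own /dev containing only API pseudo devices such as /dev/null, so no device-mapper nodes are present. A minimal sketch, with visible_mapper_devices as a hypothetical helper name:

```python
import os


def visible_mapper_devices(mapper_dir: str = "/dev/mapper") -> list:
    """Return device-mapper node names visible to this process.

    The 'control' node is skipped because it is not a logical volume.
    """
    try:
        entries = os.listdir(mapper_dir)
    except FileNotFoundError:
        # For example, a private /dev set up by PrivateDevices=yes.
        return []
    return sorted(entry for entry in entries if entry != "control")


if __name__ == "__main__":
    devices = visible_mapper_devices()
    print(devices if devices else "No device-mapper devices visible.")
```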
Sounds like a likely candidate for causing that new error! And yes, this could also be better documented.
One alternate idea, though: if the relevant modules are set to be pre-loaded at boot time, then borgmatic + LVM may work even with ProtectKernelModules=yes.
Still working on this. I set PrivateDevices=no and ProtectKernelModules=no in the systemd service and verified that they're set, and preloaded the dm_snapshot module besides. In the log I still get "version ioctl failed" and "Required device-mapper target(s) not detected".
This seems like a permission problem with the device mapper module. I'm digging around in the systemd documentation and service configuration but haven't solved it yet. I'll try adding some debug statements to get more information.
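To confirm that preloading took effect, /proc/modules can be read; it lists the loadable kernel modules currently in the kernel, with the module name as the first field of each line. One caveat for this sketch: drivers built directly into the kernel do not appear there, so a name reported as missing may simply be built in. The helper names are hypothetical.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Parse /proc/modules content; the module name is the first field."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required, proc_modules_path: str = "/proc/modules") -> list:
    """Return required module names not listed in /proc/modules."""
    with open(proc_modules_path) as modules_file:
        loaded = loaded_modules(modules_file.read())
    return sorted(set(required) - loaded)


if __name__ == "__main__":
    try:
        print("Missing:", missing_modules(["dm_snapshot"]) or "none")
    except FileNotFoundError:
        print("/proc/modules is not available on this system.")
```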
Title changed from "No LVM logical volumes found to snapshot" to "LVM snapshots fail under systemd".
Okay. If digging through the documentation doesn't immediately yield results, one other thing you can do is try commenting out all of the systemd security-related options within the borgmatic service to see if that "fixes" the problem. Then try putting them back one at a time, or a few at a time, until you pinpoint the problem option.
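The put-them-back-a-few-at-a-time approach can be made more systematic by bisecting: disable half of the hardening options, test, then narrow down to whichever half matters. The sketch below only generates the text of a drop-in override of the kind systemctl edit opens. Its option list is an assumption drawn from the options named in this thread, it covers only options that accept a boolean, and list-valued settings such as a capability set need separate handling. The helper names are hypothetical.

```python
# Boolean hardening options mentioned in this thread (not the full sample service).
HARDENING_OPTIONS = [
    "PrivateDevices",
    "ProtectKernelModules",
    "LockPersonality",
    "NoNewPrivileges",
    "ProtectKernelTunables",
    "ProtectControlGroups",
    "RestrictRealtime",
    "RestrictNamespaces",
]


def override_disabling(options) -> str:
    """Build drop-in override text that turns the given boolean options off."""
    lines = ["[Service]"] + [f"{option}=no" for option in options]
    return "\n".join(lines) + "\n"


def split_in_half(options):
    """Split a candidate list for the next bisection step."""
    middle = len(options) // 2
    return options[:middle], options[middle:]


if __name__ == "__main__":
    first_half, second_half = split_in_half(HARDENING_OPTIONS)
    # Paste this into `systemctl edit borgmatic.service`, rerun, and observe.
    print(override_disabling(first_half))
```

Bisection finds one culprit at a time. As this thread shows, several settings can be involved, so repeat the search after each fix.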
I finally got this to work. In short, I had to run systemctl edit borgmatic.service and enter the following in the override file:
You also have to either add ProtectKernelModules=no above, or else preload the dm_snapshot module by adding dm_snapshot to /etc/modules-load.d/device-mapper.conf. All of this is on Ubuntu.
There doesn't seem to be any way around adding CAP_SYS_ADMIN, since it's the only capability that allows access to the device mapper ioctls, per capabilities(7). Because of that, I think that the same requirements may apply to the other filesystem snapshot methods too. Adding CAP_SYS_ADMIN is partly mitigated by other existing settings, such as LockPersonality, NoNewPrivileges, ProtectKernelModules, ProtectKernelTunables, ProtectControlGroups, RestrictRealtime, and RestrictNamespaces, all of which are already set to true.
With the above changes, LVM snapshots now work for me when borgmatic runs under systemd. This was already partly documented in the sample borgmatic.service file.
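To verify that an override really grants CAP_SYS_ADMIN to the running service, a process can inspect its own capability sets in /proc/self/status, whose CapEff (effective) and CapBnd (bounding) fields are hexadecimal bitmasks. CAP_SYS_ADMIN is capability number 21 in the Linux numbering. A minimal sketch, with has_capability as a hypothetical helper name; it reports on whatever process runs it, so it is only meaningful when started under the same service settings as borgmatic.

```python
CAP_SYS_ADMIN = 21  # Capability number from the Linux capability headers.


def has_capability(status_text: str, capability: int, field: str = "CapEff") -> bool:
    """Test one bit of a capability mask field from /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool((mask >> capability) & 1)
    raise ValueError(f"{field} not found in status text")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status_file:
            status = status_file.read()
    except OSError:
        print("/proc/self/status is not available on this system.")
    else:
        for field in ("CapEff", "CapBnd"):
            print(field, "includes CAP_SYS_ADMIN:", has_capability(status, CAP_SYS_ADMIN, field))
```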
Awesome, thanks for digging into this and sharing your findings! I've updated both the documentation and the sample systemd service file with this info. Changes should be live shortly.