Error using ZFS snapshotting #1001

Closed
opened 2025-02-18 02:45:01 +00:00 by kinetic5 · 12 comments

What I'm trying to do and why

I'm trying to use the ZFS snapshot feature of borgmatic to back up some files. borgmatic is attempting to snapshot and mount my root filesystem even though I haven't told it to. I am seeing a read-only filesystem error; it looks like borgmatic is trying to mount my actual dataset on top of the snapshotted root filesystem, which isn't possible since the snapshot is read-only.

image.png

truncated config file:

source_directories:
    - /dpool/shares/staticfiles

source_directories_must_exist: true

compression: zstd

upload_rate_limit: 5000

one_file_system: true

zfs:

Any idea how to fix this/what I'm doing wrong?

borgmatic version

1.9.10

borgmatic installation method

pip install

Borg version

1.4.0

Python version

Python 3.11.2

Operating system and version

Debian GNU/Linux 12 (Proxmox)

Owner

Thanks for reporting this... I don't think you're necessarily doing anything wrong, and you've identified at least two different issues manifesting here:

  1. borgmatic is attempting to snapshot your root filesystem when that filesystem isn't configured for backup. I'm not sure what's going wrong there, but there is logic that attempts to associate each configured source directory with one of the detected ZFS datasets, starting from the longest dataset path and then proceeding to the shortest (so, the root). In theory, if a source directory matches a longer path, it should never be associated with a shorter one. In theory.
  2. borgmatic is attempting to mount your actual dataset's snapshot over the root snapshot mount point. This one is clearer. Looking at your log output, I think what's going on is that, well, I never tested with a root ZFS filesystem during development because I don't have a machine with a root ZFS filesystem. And so there just happens to be a really inconvenient overlap between the root snapshot mount point and your staticfiles snapshot mount point (because there's no final path component for a root snapshot mount point). I think the fix here may have to be a special case for any snapshotted root datasets to avoid that overlap.
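
The longest-path-first association described in point 1 can be sketched roughly as follows. This is an illustration only, not borgmatic's actual code: the function name `associate_directories` and the list of mount points are hypothetical.

```python
import os


def associate_directories(source_directories, mount_points):
    """Map each dataset mount point to the source directories it contains.

    Hypothetical helper: mount points are tried longest first, and a
    directory claimed by a longer mount point is never offered to a
    shorter one such as the root.
    """
    remaining = set(source_directories)
    associations = {}

    for mount_point in sorted(mount_points, key=len, reverse=True):
        # Ensure "/dpool" does not match "/dpoolother"; "/" becomes "/".
        prefix = mount_point.rstrip(os.path.sep) + os.path.sep
        matched = {
            directory
            for directory in remaining
            if directory == mount_point or directory.startswith(prefix)
        }
        if matched:
            associations[mount_point] = sorted(matched)
            remaining -= matched

    return associations


mount_points = ['/', '/dpool', '/dpool/shares', '/dpool/shares/staticfiles']
result = associate_directories(['/dpool/shares/staticfiles'], mount_points)
print(result)
```

With only the configured source directory as input, the root mount point ends up with nothing associated, so under this scheme the root should only be snapshotted if some other path outside the longer mount points is also in play.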

Some things that might help debug the first point:

  • Could I see the output of the following? zfs list -H -t filesystem -o name,mountpoint,org.torsion.borgmatic:backup
  • Does the root still get snapshotted if you comment out source_directories?
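
For anyone reading that `zfs list` output programmatically: with `-H`, ZFS omits the header and separates fields with tabs, and an unset property is shown as `-`. A minimal parsing sketch follows; the sample output and the `Dataset` and `parse_zfs_list` names are made up for illustration and are not taken from this thread or from borgmatic.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    name: str
    mount_point: str
    backup_property: str


def parse_zfs_list(output):
    """Parse tab-separated `zfs list -H -o name,mountpoint,<property>` output."""
    datasets = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, mount_point, backup_property = line.split('\t')
        datasets.append(Dataset(name, mount_point, backup_property))
    return datasets


# Hypothetical sample output, not the reporter's real data.
SAMPLE_OUTPUT = (
    'rpool/ROOT/example\t/\t-\n'
    'dpool/shares/staticfiles\t/dpool/shares/staticfiles\tauto\n'
)

datasets = parse_zfs_list(SAMPLE_OUTPUT)
flagged = [dataset.name for dataset in datasets if dataset.backup_property == 'auto']
print(flagged)
```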
Author

Thanks for the quick response, here is the output from zfs list.

image.png

I currently have the org.torsion.borgmatic:backup property unset, but I have tried both approaches: first using only source_directories to specify the dataset, and then setting org.torsion.borgmatic:backup=auto on the dataset with source_directories commented out. Either way the result is the same: borgmatic snapshots my root pool and tries to mount it with the data pool on top.

Owner

Okay, I've got a system (well, a VM) with a ZFS root now, so I'll see if I can repro this. Looking at the code though, I'm a little mystified as to how the root is getting snapshotted on your system. If it comes to it, would you be comfortable installing a replacement borgmatic ZFS hook with added instrumentation/logging?

Author

Yes, I can do that if needed.

Owner

A quick update: Turns out, you won't need to install any additional instrumentation. I have a full repro of both problems here on my ZFS root VM! I added a little extra logging, and it looks like the root dataset is getting included in snapshotting because of a metadata directory that borgmatic sneaks into the archive to support the borgmatic config bootstrap action. Here's the added log:

local: Directory / contains patterns: /tmp/borgmatic-y08mzwgl/./borgmatic/bootstrap

So I'll probably need to come up with a way to exclude such non-user-specified directories from the snapshotting.
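
One way to picture that exclusion is to tag each pattern with where it came from and only consider user-configured ones for snapshotting. The sketch below is an assumption for illustration: `PatternSource`, `Pattern`, and `snapshot_candidates` are hypothetical names, not borgmatic's real data structures.

```python
from dataclasses import dataclass
from enum import Enum


class PatternSource(Enum):
    CONFIG = 'config'  # came from a configuration option such as source_directories
    HOOK = 'hook'      # injected by another hook, e.g. the bootstrap metadata


@dataclass(frozen=True)
class Pattern:
    path: str
    source: PatternSource


def snapshot_candidates(patterns):
    """Return only the paths the user explicitly configured."""
    return [pattern.path for pattern in patterns if pattern.source is PatternSource.CONFIG]


patterns = [
    Pattern('/dpool/shares/staticfiles', PatternSource.CONFIG),
    Pattern('/tmp/borgmatic-y08mzwgl/./borgmatic/bootstrap', PatternSource.HOOK),
]
print(snapshot_candidates(patterns))
```

Filtered this way, the bootstrap directory under /tmp no longer pulls the root dataset into the set of datasets to snapshot.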

witten added the bug label 2025-02-22 04:18:07 +00:00
Owner

Okay, this has been fixed in main and will be part of the next release. Here's the changelog:

  • For the ZFS, Btrfs, and LVM hooks, only make snapshots for root patterns that come from a borgmatic configuration option (e.g. "source_directories")—not from other hooks within borgmatic.
  • Fix a ZFS/LVM error due to colliding snapshot mount points for nested datasets or logical volumes.
  • Don't try to snapshot ZFS datasets that have the "canmount=off" property.
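
To illustrate the colliding mount points from the second changelog item: if every snapshot is mounted under one shared directory at its original path, the root dataset's mount point becomes the parent of every other snapshot mount point. The sketch below shows that overlap and one conceivable way to avoid it, a per-mount-point subdirectory named after a short digest. The digest scheme, the directory constant, and both function names are assumptions for illustration; the thread does not spell out how the actual fix works.

```python
import hashlib
import os

# Example directory only, borrowed from the log paths later in this thread.
SNAPSHOTS_DIRECTORY = '/run/user/0/borgmatic/zfs_snapshots'


def naive_mount_path(mount_point):
    """Mount every snapshot under one shared directory (collides for "/")."""
    return os.path.join(SNAPSHOTS_DIRECTORY, mount_point.lstrip(os.path.sep))


def separated_mount_path(mount_point):
    """Hypothetical alternative: give each mount point its own subdirectory."""
    digest = hashlib.sha256(mount_point.encode('utf-8')).hexdigest()[:20]
    return os.path.join(SNAPSHOTS_DIRECTORY, digest, mount_point.lstrip(os.path.sep))


root = naive_mount_path('/')
nested = naive_mount_path('/dpool/shares/staticfiles')
print(nested.startswith(root))  # the nested mount lands inside the root mount

separated_root = separated_mount_path('/')
separated_nested = separated_mount_path('/dpool/shares/staticfiles')
print(separated_nested.startswith(separated_root))
```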

Thanks again for the bug report!

Owner

Released in borgmatic 1.9.11!

Author

Thank you for working out this issue so quickly! After testing the new update, it looks like everything is working with the zfs snapshot and borg archive creation. However, I am seeing an error during the cleanup phase of the process:

image.png

It looks like zfs.py is trying to remove the temporary snapshot directory twice (lines 395 and 414). I think the issue stems from my root path '/' showing up when get_all_dataset_mount_points is run. This causes it to be picked up by snapshot_mount_path on line 383 and removed on line 395 in addition to the removal of snapshots_directory on line 414.

I added some additional debug messages prefixed by '***' to find this:

local: Calling bootstrap hook function remove_data_source_dumps
local: Looking for bootstrap manifest files to remove in /run/user/0/borgmatic/bootstrap
local: Removing bootstrap manifest at /run/user/0/borgmatic/bootstrap/manifest.json
local: Calling btrfs hook function remove_data_source_dumps
local: Calling lvm hook function remove_data_source_dumps
local: Calling mariadb hook function remove_data_source_dumps
local: Removing MariaDB data source dumps
local: Calling mongodb hook function remove_data_source_dumps
local: Removing MongoDB data source dumps
local: Calling mysql hook function remove_data_source_dumps
local: Removing MySQL data source dumps
local: Calling postgresql hook function remove_data_source_dumps
local: Removing PostgreSQL data source dumps
local: Calling sqlite hook function remove_data_source_dumps
local: Removing SQLite data source dumps
local: Calling zfs hook function remove_data_source_dumps
local: zfs list -H -t filesystem -o mountpoint
local: Looking for snapshots to remove in /run/user/0/borgmatic/zfs_snapshots/*
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /var/lib/vz -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/var/lib/vz
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /rpool/data -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/rpool/data
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /rpool/ROOT -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/rpool/ROOT
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /rpool -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/rpool
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /dpool/shares/staticfiles -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares/staticfiles
local: *** shutil.rmtree (zfs.py, line 395): /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares/staticfiles
local: Unmounting ZFS snapshot at /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares/staticfiles
local: umount /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares/staticfiles
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /dpool/shares -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares
local: *** shutil.rmtree (zfs.py, line 395): /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool/shares
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): /dpool -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/dpool
local: *** mount_point -> snapshot_mount_path (zfs.py, line 383): / -> /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/
local: *** shutil.rmtree (zfs.py, line 395): /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d/
local: *** shutil.rmtree (zfs.py, line 414): /run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d
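
The last three lines of that trace can be reproduced in isolation. Assuming the snapshot mount path is built by stripping the leading separator from the mount point and joining it onto the snapshots directory (the helper below is a hypothetical stand-in, not the code in zfs.py), a mount point of `/` maps back to the snapshots directory itself, with only a trailing slash as the difference:

```python
import os

snapshots_directory = '/run/user/0/borgmatic/zfs_snapshots/fe64c169b096a4ed742d'


def snapshot_mount_path(directory, mount_point):
    """Hypothetical stand-in for how the per-dataset path might be built."""
    return os.path.join(directory, mount_point.lstrip(os.path.sep))


root_path = snapshot_mount_path(snapshots_directory, '/')
print(root_path)

# Once normalized, the root's "snapshot mount path" is the parent directory.
print(os.path.normpath(root_path) == snapshots_directory)
```

So removing the root's snapshot mount path also removes the directory that the later cleanup step expects to find.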
kinetic5 reopened this issue 2025-02-23 19:26:11 +00:00
Owner

Thanks for reopening this and providing the additional details. Unfortunately I don't have a repro here even with my ZFS VM, but here's something to try since you're already in the code. If you add ignore_errors=True as an extra argument to the second rmtree() call so that it matches the first one, does that "fix" the problem? If so, I can make the change to main.
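
For reference, here is a small standalone sketch of the behavior in question, using a throwaway temporary directory rather than a real snapshot directory. It assumes the failure was the second `shutil.rmtree()` call finding the directory already gone; the exact error from the screenshot is not reproduced here.

```python
import os
import shutil
import tempfile

# Stand-in for the snapshots directory; created only for this demonstration.
snapshots_directory = tempfile.mkdtemp(prefix='zfs_snapshots_demo_')

# First removal, via the path with a trailing separator (as for the "/" mount point).
shutil.rmtree(snapshots_directory + os.path.sep, ignore_errors=True)

# Second removal of the same directory without ignore_errors raises.
try:
    shutil.rmtree(snapshots_directory)
    second_removal = 'succeeded'
except FileNotFoundError:
    second_removal = 'failed'

# With ignore_errors=True, the missing directory is silently tolerated.
shutil.rmtree(snapshots_directory, ignore_errors=True)
print(second_removal)
```

The trade-off is that `ignore_errors=True` also hides other removal failures, such as permission errors, which is presumably why "fix" is in quotes above.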

Author

Yes, adding ignore_errors=True to the second rmtree call fixes the issue for me.

Owner

Okay, thanks for verifying! I've made the change to main and it'll be part of the next release.

Owner

Released in borgmatic 1.9.13!

Reference: borgmatic-collective/borgmatic#1001