borgmatic prune unexpectedly selects all hostnames on default settings #753

Closed
opened 2023-09-06 10:01:27 +00:00 by gemarcano · 4 comments

What I'm trying to do and why

I noticed a lot of my backups were pruned automatically recently, including from devices I haven't updated recently. The recent deprecation of prefix in the configuration file indicates that match_archives and archive_name_format should be used instead. The default example config states:

# Name of the archive. Borg placeholders can be used. See the output
# of "borg help placeholders" for details. Defaults to
# "{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f}". When running actions like
# rlist, info, or check, borgmatic automatically tries to match only
# archives created with this name format.

Moreover, it also states:

# If match_archives is not specified, borgmatic defaults to deriving
# the match_archives value from archive_name_format.

However, in the implementation of make_prune_flags the following happens:

    return tuple(element for pair in flag_pairs for element in pair) + (
        (
            ('--match-archives', f'sh:{prefix}*')
            if feature.available(feature.Feature.MATCH_ARCHIVES, local_borg_version)
            else ('--glob-archives', f'{prefix}*')
        )
        if prefix
        else (
            flags.make_match_archives_flags(
                config.get('match_archives'),
                config.get('archive_name_format'),
                local_borg_version,
            )
        )
    )

From debugging this with pudb, in that else branch config.get('archive_name_format') returns an empty string, which goes against my understanding of what the configuration file claims. This then leads the make_match_archives_flags function to do:

    if not archive_name_format:
        return ()

before it tries to derive match_archives. As a result, all archives, regardless of their hostname, are considered for pruning.
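To make the failure mode concrete, here is a minimal sketch of the control flow as I understand it. The function name match_flags_as_reported and the placeholder return value are made up for illustration; this is a simplified stand-in, not borgmatic's actual code.

```python
# Hypothetical, simplified stand-in for the control flow described above;
# this is not borgmatic's actual source.
def match_flags_as_reported(match_archives, archive_name_format):
    if match_archives:
        return ('--match-archives', match_archives)
    if not archive_name_format:
        # Early return: no archive filter is passed to Borg at all.
        return ()
    # The real code derives a pattern from the name format here; this sketch
    # just marks the spot with a placeholder value.
    return ('--match-archives', '<derived from archive_name_format>')


config = {}  # neither match_archives nor archive_name_format is set
flags = match_flags_as_reported(
    config.get('match_archives'), config.get('archive_name_format')
)
print(flags)  # ()
```

With an empty tuple of flags, nothing restricts the prune to the current host's archives.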

This may also affect other callers of make_match_archives_flags, as I don't see any other functions implement a default for 'archive_name_format', but I did not investigate in detail.

Steps to reproduce

  1. Have a backup repository with different hostname archives, with both match_archives and archive_name_format left unspecified on the different hosts.
  2. borgmatic prune --list

Actual behavior

# borgmatic prune --list
Keeping archive (rule: daily #1):        mothra-2023-09-05T21:48:03.832526    Tue, 2023-09-05 21:48:03 [6d4432619e4e1d208129ab27807b6f5548eacd9af942751a62efe7ae815710d6]
Keeping archive (rule: daily #2):        fenix-2023-09-02T01:44:25.351229     Fri, 2023-09-01 18:44:27 [28db7c07f52c38ee305ca8564a13971942ea665fa8db5f63b6a6128bfa982f02]
Keeping archive (rule: daily #3):        fenix-2023-08-30T00:33:55.078882     Tue, 2023-08-29 17:33:56 [a9309fc2812591e6329ec23ae8b978a0822a0192f7794c67f7513390b6a33fc9]
Keeping archive (rule: daily #4):        fenix-2023-08-29T02:55:29.419836     Mon, 2023-08-28 19:55:31 [6dbd382b4f455d709b5e59fa5f893308ee4bc0cec4beee5c0f855a0027a61857]
Keeping archive (rule: daily #5):        fenix-2023-08-28T01:02:34.926881     Sun, 2023-08-27 18:02:36 [c23b4d39a92c253618612f73bfb356d9d5d10ddf794cd2a853f6c3887a45bce9]
Keeping archive (rule: daily #6):        fenix-2023-08-26T02:43:43.488500     Fri, 2023-08-25 19:43:45 [fccdecc5a2ab93a0820c44058703814c49109b8cf2b523523fce09879063683b]
Keeping archive (rule: daily #7):        fenix-2023-08-25T01:38:58.971417     Thu, 2023-08-24 18:39:00 [0779fdca4e354d166168d707a0849e88de688102aad118be6a1e7ad6bd868568]
Keeping archive (rule: weekly #1):       fenix-2023-08-19T02:57:40.756150     Fri, 2023-08-18 19:57:42 [b2555178a7b3befd00d4feb74b7408471774c0df78e6031727c7180012bc894c]
Keeping archive (rule: weekly #2):       fenix-2023-08-09T02:23:39.579639     Tue, 2023-08-08 19:23:41 [73e98371665a13a95f3951c76de1fc4e0c72df673bfaa1fa47ce3c476935b652]
Keeping archive (rule: weekly #3):       fenix-2023-08-03T01:25:36.480172     Wed, 2023-08-02 18:25:38 [77f3ea1c6012da803c3050f9bb7bdbd6bc5472bd522792a2f981369cef1c275b]
Keeping archive (rule: weekly #4):       fenix-2023-07-27T01:46:29.356684     Wed, 2023-07-26 18:46:30 [3db7305c97d0d343a54ab2460f3f71ecc5c04589a3a577ad73a6ed3a141fbea5]
Keeping archive (rule: monthly #1):      fenix-2023-06-01T01:45:37.261723     Wed, 2023-05-31 18:45:38 [5aa72c312bad779e18e24735a1390dde0191c9717dd2d7ebad026e82dfcf9801]
Keeping archive (rule: monthly #2):      mothra-2023-04-28T12:00:43.295330    Fri, 2023-04-28 12:00:43 [0d289c07d77f182b50871f99ac55b87c5146309e8a3a51ca88c553c33af6e0a9]
Keeping archive (rule: monthly #3):      mothra-2023-03-31T10:42:08.744537    Fri, 2023-03-31 10:42:08 [e56b27e070a0ea19e5d76cd288af90ec42b6db06ce77b624ffcdb8d26cf70a67]
Keeping archive (rule: monthly #4):      mothra-2023-02-24T05:49:05.001552    Fri, 2023-02-24 05:49:05 [8c63ce470ad41d9269acaf3dd7c85642383637b568ff8e2fe0cd916b2650688a]
Keeping archive (rule: monthly #5):      mothra-2022-11-28T23:49:47.998068    Mon, 2022-11-28 23:49:48 [0e85cb35df51a1b05de361960c51c0b7e6d05a6b08551b0ee5f441a7abecc362]
Keeping archive (rule: monthly #6):      mothra-2021-12-31T03:01:29.082153    Fri, 2021-12-31 03:01:29 [2e405d5e863e44fd40e337b4d092cbe94488f9bd063e444d43edf7b17fa808a5]

Expected behavior

# borgmatic prune --list
Keeping archive (rule: daily #1):        mothra-2023-09-05T21:48:03.832526    Tue, 2023-09-05 21:48:03 [6d4432619e4e1d208129ab27807b6f5548eacd9af942751a62efe7ae815710d6]
Keeping archive (rule: daily #2):        mothra-2023-04-28T12:00:43.295330    Fri, 2023-04-28 12:00:43 [0d289c07d77f182b50871f99ac55b87c5146309e8a3a51ca88c553c33af6e0a9]
Keeping archive (rule: daily #3):        mothra-2023-03-31T10:42:08.744537    Fri, 2023-03-31 10:42:08 [e56b27e070a0ea19e5d76cd288af90ec42b6db06ce77b624ffcdb8d26cf70a67]
Keeping archive (rule: daily #4):        mothra-2023-02-24T05:49:05.001552    Fri, 2023-02-24 05:49:05 [8c63ce470ad41d9269acaf3dd7c85642383637b568ff8e2fe0cd916b2650688a]
Keeping archive (rule: daily #5):        mothra-2022-11-28T23:49:47.998068    Mon, 2022-11-28 23:49:48 [0e85cb35df51a1b05de361960c51c0b7e6d05a6b08551b0ee5f441a7abecc362]
Keeping archive (rule: daily #6):        mothra-2021-12-31T03:01:29.082153    Fri, 2021-12-31 03:01:29 [2e405d5e863e44fd40e337b4d092cbe94488f9bd063e444d43edf7b17fa808a5]

Other notes / implementation ideas

A workaround is to specify match_archives to be sh:{hostname}-* explicitly. Looking at the source code, it might also work if archive_name_format is explicitly defined, but I did not try this.
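As a rough illustration of what that workaround pattern selects, the sketch below filters a few archive names from the listing above. It assumes that Python's fnmatch is a close enough approximation of Borg's sh: shell-style patterns for a simple trailing wildcard, and it expands {hostname} by hand; neither is how Borg itself evaluates the pattern.

```python
from fnmatch import fnmatchcase

# Archive names taken from the "Actual behavior" listing.
archives = [
    'mothra-2023-09-05T21:48:03.832526',
    'fenix-2023-09-02T01:44:25.351229',
    'mothra-2023-04-28T12:00:43.295330',
]

hostname = 'mothra'          # what {hostname} would expand to on this host
pattern = f'{hostname}-*'    # "sh:{hostname}-*" without the "sh:" style prefix

# Keep only the archives whose names match the host-specific pattern.
selected = [name for name in archives if fnmatchcase(name, pattern)]
print(selected)
```

Only the two mothra archives end up in selected, which is the set prune should be considering on that host.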

This does not happen in 1.7.7, but I think that's because prefix hadn't been deprecated yet.

The fix might be to make sure that a default is assigned if archive_name_format is empty.
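A minimal sketch of that idea, assuming the default is the one documented in the example config and that the time placeholders are simply replaced with a wildcard. The names DEFAULT_ARCHIVE_NAME_FORMAT and derive_match_pattern are hypothetical, and the actual fix may look different.

```python
import re

# Default taken from the example config comment quoted above.
DEFAULT_ARCHIVE_NAME_FORMAT = '{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f}'


def derive_match_pattern(archive_name_format):
    # Fall back to the documented default when the option is unset or empty.
    name_format = archive_name_format or DEFAULT_ARCHIVE_NAME_FORMAT
    # Replace the {now...} / {utcnow...} placeholders with a wildcard so that
    # archives from any point in time match; {hostname} is left for Borg.
    return 'sh:' + re.sub(r'\{(?:utc)?now[^}]*\}', '*', name_format)


print(derive_match_pattern(''))  # sh:{hostname}-*
```

With the fallback in place, an unset archive_name_format yields the same pattern as the explicit workaround.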

borgmatic version

1.8.2

borgmatic installation method

emerge borgmatic

Borg version

borg 1.2.6

Python version

Python 3.11.5

Database version (if applicable)

No response

Operating system and version

NAME=Gentoo
ID=gentoo
PRETTY_NAME="Gentoo Linux"
ANSI_COLOR="1;32"
HOME_URL="https://www.gentoo.org/"
SUPPORT_URL="https://www.gentoo.org/support/"
BUG_REPORT_URL="https://bugs.gentoo.org/"
VERSION_ID="2.14"

Author

And yes, apparently the grand purge of all of my older backups took place sometime around March of this year; I just didn't notice it until now (at least according to data from my off-site backup location). From my understanding of borg, those old prunes are gone now, right? I haven't suffered data loss so it's not a huge issue, but boy am I glad I caught this issue now and not after actually needing all of the archives intact.

I did a bit more testing, and I can confirm that setting archive_name_format explicitly to what's supposed to be the default value also leads to correct behavior.

Owner

Thanks for taking the time to file this! Right now the documented archive_name_format default is only used to name the archive during archive creation (borgmatic create), and the documentation / schema comments are misleading in suggesting that the default applies to other actions as well. I agree with you that it would make more sense to also apply the default for prune and other actions, specifically when deriving the match archives flags.

> From my understanding of borg, those old prunes are gone now, right?

That's unfortunately correct!

Owner

This is implemented in main and will be part of the next release. Thanks again for highlighting the need.

You should be aware though of this ticket: #748. If implemented as currently described, it would probably break your use case because it would start treating {hostname} in the archive name format as * for purposes of filtering archives. I'm open to ideas on that ticket for how to solve the ask without breaking your use case!
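To illustrate the concern, here is a rough sketch comparing the two derivations. It assumes Python's fnmatch approximates Borg's shell-style matching for these simple patterns, expands {hostname} by hand, and uses made-up variable names; it does not reflect how #748 would actually be implemented.

```python
import re
from fnmatch import fnmatchcase

name_format = '{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f}'

# Derivation that only wildcards the timestamp: "{hostname}-*".
time_only = re.sub(r'\{(?:utc)?now[^}]*\}', '*', name_format)
# Derivation that would also wildcard the hostname: "*-*".
hostname_too = time_only.replace('{hostname}', '*')

# An archive created by a different host, seen from the host "mothra".
other_host_archive = 'fenix-2023-09-02T01:44:25.351229'
on_mothra = time_only.replace('{hostname}', 'mothra')  # "mothra-*"

print(fnmatchcase(other_host_archive, on_mothra))     # False
print(fnmatchcase(other_host_archive, hostname_too))  # True
```

Under the second derivation, archives from every host in a shared repository would match again, which is the behavior this ticket was filed about.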

Owner

This has been released as part of borgmatic 1.8.3!

Reference: borgmatic-collective/borgmatic#753