Why prune first? #304

Closed
opened 2020-04-29 05:54:28 +00:00 by kaysond · 10 comments

What's the point of pruning before creating the archive? Is it just to reduce storage space? To me it seems like it just wastes performance because you're potentially removing chunks that you would just transfer back during the creation phase.

I realize that I can run `borgmatic create` then `borgmatic prune`, but I'm curious if I'm missing the point...

Owner

Yup, the `prune` first is to free up space for the subsequent `create`. If instead you `create` first, and then `prune` after, you could in theory end up in a situation where you've got a full disk and the `create` fails, despite the fact that your retention policy dictates there's space that can be reclaimed.
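
To make the trade-off concrete, here is a minimal sketch under a toy model: sizes are plain numbers in one unit, and `reclaimable` is the space the retention policy would free. The function name `create_fits` is hypothetical, and the model assumes pruned space is available immediately; with Borg 1.2 and later, prune only marks data for deletion and the space actually comes back after a `compact`.

```python
def create_fits(capacity, used, reclaimable, new_data, prune_first):
    """Toy check: does the create step fit on the disk?

    All sizes share one unit (e.g. GB). `reclaimable` is the space the
    retention policy would free; this model assumes it is freed at once.
    """
    if prune_first:
        # Prune runs before create, so the freed space is available to create.
        used -= reclaimable
    # With create first, new data must fit alongside everything not yet pruned.
    return used + new_data <= capacity


# 95 GB used on a 100 GB disk, 20 GB reclaimable, 10 GB of new unique data.
print(create_fits(100, 95, 20, 10, prune_first=True))   # True
print(create_fits(100, 95, 20, 10, prune_first=False))  # False
```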

> To me it seems like it just wastes performance because you're potentially removing chunks that you would just transfer back during the creation phase.

This generally shouldn't happen. If your retention is set up to remove only the oldest stuff, then you won't be pruning any chunks that you're going to immediately sync back because you'll still have all the chunks for the most recent archives.
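
The reasoning can be sketched with a toy model of a deduplicating, reference-counted chunk store: each archive is a set of chunk IDs, and deleting an archive frees only chunks that no remaining archive references. This is a simplification for illustration, not Borg's on-disk format, and `prune_archives` is a hypothetical helper.

```python
from collections import Counter


def prune_archives(archives, to_delete):
    """Delete archives (in place) and return the chunk IDs that become unreferenced.

    `archives` maps an archive name to the set of chunk IDs it references.
    """
    refcount = Counter()
    for chunks in archives.values():
        refcount.update(chunks)
    freed = set()
    for name in to_delete:
        for chunk in archives.pop(name):
            refcount[chunk] -= 1
            if refcount[chunk] == 0:
                freed.add(chunk)
    return freed


archives = {
    "monday": {"a", "b", "c"},
    "tuesday": {"b", "c", "d"},
    "wednesday": {"c", "d", "e"},
}
freed = prune_archives(archives, ["monday"])
print(sorted(freed))  # ['a']: "b" and "c" are still used by newer archives

# Thursday's backup only uploads chunks the repository no longer has.
present = set().union(*archives.values())
thursday = {"d", "e", "f"}
print(sorted(thursday - present))  # ['f']
```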

If I'm missing something here though, I'm open to other ideas!

witten added the question / support label 2020-04-29 06:01:06 +00:00
Author

I think prune->create makes sense in a storage-constrained repo. However, in my case, I have plenty of storage, but not so much bandwidth. I'd prefer to leave every chunk in the repo so that the minimum amount of data is transferred. create->prune makes more sense for my setup.

> This generally shouldn't happen. If your retention is set up to remove only the oldest stuff, then you won't be pruning any chunks that you're going to immediately sync back because you'll still have all the chunks for the most recent archives.

I think for something like backing up a computer's drive full of plain files, sure. But there are definitely scenarios where you would be pruning chunks that you're immediately syncing back. One would be if you delete a large file, say a video, but restore it or re-download it later.

The other is if you have effectively random data. In my case, I'm using borg for offsite copies of compressed and/or encrypted backups. Rsync doesn't work well for this because my backup file names change every week, but borg is looking at content. So in this case, the more chunks in the repo in general, the less likely it is that data will have to be transferred.
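
The deleted-and-restored video case can be checked with a similar toy model. Here `run_once` is a hypothetical helper that simulates one run under a simplified count-based retention (keep the newest `keep_last` archives) and counts uploaded chunks. Starting from the same three archives, the oldest of which still contains a since-deleted video, the video is uploaded again only when prune runs first.

```python
def run_once(archives, new_backup, keep_last, prune_first):
    """Simulate one run and return the number of chunks that must be uploaded.

    `archives` is a list of chunk-ID sets, oldest first; it is not modified.
    Retention is simplified to "keep the newest `keep_last` archives" (keep_last >= 1).
    """
    repo = list(archives)
    if prune_first:
        del repo[:-keep_last]  # drop all but the newest keep_last archives
    present = set().union(*repo)  # chunks available while create runs
    uploaded = len(new_backup - present)
    repo.append(new_backup)
    if not prune_first:
        del repo[:-keep_last]
    return uploaded


history = [
    {"video-1", "video-2", "doc"},  # oldest archive still holds the video
    {"doc"},                        # video deleted locally
    {"doc"},
]
restored = {"video-1", "video-2", "doc"}  # video re-downloaded

print(run_once(history, restored, keep_last=2, prune_first=True))   # 2
print(run_once(history, restored, keep_last=2, prune_first=False))  # 0
```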

I don't think there's a compelling reason to change the current behavior, but a config file or command line option to reverse the order would be nice. Related - `borgmatic create prune` doesn't obey the order of the subcommands. Fixing that would be another way of addressing this, though it might mess with some people's cron commands.

Unrelated - I like your avatar. That guy was creeepy

Owner

Interesting use case! I never would've thought of Borg for syncing compressed/encrypted data, but it makes sense.

> but a config file or command line option to reverse the order would be nice.

Got it. Yeah, either one of those could work.

> Related - `borgmatic create prune` doesn't obey the order of the subcommands.

Yeah, borgmatic's command-line parsing completely ignores order. The reason is kind of obscure; it's actually hoovering up common flags (that can be specified anywhere) so it can apply them to multiple actions at once. But I could conceive of a change that at least respects the order of actions themselves.
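
One way to picture the kind of change described here: scan the arguments once, pull out flags that may appear anywhere, and record action names in the order they appear. This is a hypothetical sketch, not borgmatic's actual parser; the flag set is illustrative only, and action-specific flags are simply skipped.

```python
ACTIONS = {"create", "prune", "compact", "check"}
# Illustrative global flags that take one value and may appear anywhere.
GLOBAL_FLAGS_WITH_VALUE = {"--verbosity", "--config"}


def parse(argv):
    """Return (actions in command-line order, dict of global flag values)."""
    actions = []
    global_flags = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in GLOBAL_FLAGS_WITH_VALUE:
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a value")
            global_flags[arg] = argv[i + 1]
            i += 2
            continue
        if arg in ACTIONS and arg not in actions:
            actions.append(arg)  # first mention wins; order is preserved
        # Anything else (e.g. action-specific flags) is ignored in this sketch.
        i += 1
    return actions, global_flags


print(parse(["prune", "--verbosity", "1", "create"]))
# (['prune', 'create'], {'--verbosity': '1'})
```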

> Unrelated - I like your avatar. That guy was creeepy

Hah, indeed. He always looked to me like a super-creepy version of those default avatar heads.

Author

So I'm gonna resurrect this because I just noticed another issue with pruning first: it actually causes you to retain one more backup than you want.

Suppose I have `keep_daily: 7`, and I'm starting from scratch. The first 7 backups will not prune anything, obviously. On the 8th day, you will run prune, but there are only 7 backups in the repo, so it does nothing. Then you create another backup, and you have 8. On the 9th day, prune removes the oldest of those 8, then create adds another one, and you stay at 8.

Similarly, I just set up a new repo with `keep_daily: 1`. I created my first backup, then made a second. The second run does a prune first, sees 1 backup, does nothing, then creates a second backup. Now I've got two!
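
The off-by-one can be reproduced with a small simulation. It assumes exactly one backup per day, so `keep_daily: N` amounts to keeping the newest N archives; `archive_count` is a hypothetical helper that returns the number of archives left after each day's run.

```python
def archive_count(days, keep_daily, prune_first):
    """Archive count at the end of each daily run, assuming one backup per day."""
    archives = 0
    counts = []
    for _ in range(days):
        if prune_first:
            archives = min(archives, keep_daily)  # prune sees only existing backups
        archives += 1  # create
        if not prune_first:
            archives = min(archives, keep_daily)  # prune also sees today's backup
        counts.append(archives)
    return counts


print(archive_count(9, 7, prune_first=True))   # [1, 2, 3, 4, 5, 6, 7, 8, 8]
print(archive_count(9, 7, prune_first=False))  # [1, 2, 3, 4, 5, 6, 7, 7, 7]
print(archive_count(2, 1, prune_first=True))   # [1, 2]
print(archive_count(2, 1, prune_first=False))  # [1, 1]
```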

In my case, I've got some full backups that pop up every once in a while, so this extra backup was causing me to run out of space faster than I expected because it was hanging on to two full backups!

Owner

You make a good point here, although I think whether there are always "extra" backups retained depends on how frequently you run borgmatic. For instance, with the current prune-first behavior, setting `keep_weekly: 1` and running borgmatic daily would indeed create two weekly backups after the first week, the first of which would get pruned the following day. So it is self-correcting in that case.

Having said that, I am totally open to changing the default behavior to prune after create or making it configurable. I've personally never run into the low-disk case where prune-first would be helpful.

Author

> setting `keep_weekly: 1` and running borgmatic daily would indeed create two weekly backups after the first week, the first of which would get pruned the following day. So it is self-correcting in that case.

I just double checked, and I don't think you'd get two weekly backups after the first week, because if a backup is covered by the daily rule, it gets skipped by the weekly rule.

So following my example again, but now adding `keep_weekly: 1`: on the 9th day, you have 8 existing backups, which don't get pruned (7 daily, 1 weekly), then you create a 9th, and have 9 backups! On the 10th day, you prune first and go down to 8 backups, then create a 9th again.

But suppose you do prune-last, with `keep_daily: 7` and `keep_weekly: 1`. On your first 8 backups, nothing gets pruned. On the 9th day, you create a 9th, but prune afterwards, removing the oldest. Now you're down to 8 again.

> I've personally never run into the low-disk case where prune-first would be helpful.

If I'm not mistaken, prune-first doesn't actually help with disk space, because you reach 9 backups in both cases. But with prune-last, you're left with one fewer backup when you're done, which would, for example, give you more room for other repos/backups.
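
The daily-plus-weekly case can be sketched too, under stated simplifications: one backup per day, weeks modeled as `(day - 1) // 7` rather than calendar weeks, and a weekly rule that skips anything the daily rule already kept and then keeps the newest remaining archive from each of the newest `keep_weekly` weeks. This follows the rule as described in this comment, not Borg's exact prune algorithm, whose results can differ with week boundaries and Borg version. `retained` and `simulate` are hypothetical helpers.

```python
def retained(days, keep_daily, keep_weekly):
    """Return the set of backup days kept by a simplified daily+weekly policy."""
    newest_first = sorted(days, reverse=True)
    kept = set(newest_first[:keep_daily])
    weeks_kept = set()
    for day in newest_first:
        if len(weeks_kept) >= keep_weekly:
            break
        week = (day - 1) // 7
        # The weekly rule skips days the daily rule already covers.
        if day not in kept and week not in weeks_kept:
            weeks_kept.add(week)
            kept.add(day)
    return kept


def simulate(total_days, keep_daily, keep_weekly, prune_first):
    """Return (peak, final) archive counts over total_days daily runs."""
    archives = []
    peak = 0
    for today in range(1, total_days + 1):
        if prune_first:
            archives = sorted(retained(archives, keep_daily, keep_weekly))
        archives.append(today)  # create
        peak = max(peak, len(archives))
        if not prune_first:
            archives = sorted(retained(archives, keep_daily, keep_weekly))
    return peak, len(archives)


print(simulate(10, 7, 1, prune_first=True))   # (9, 9)
print(simulate(10, 7, 1, prune_first=False))  # (9, 8)
```

In this model both orders peak at 9 archives while `create` runs; the difference is how many remain once the run finishes.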


Add me to the list of users who think this is the wrong order. :)

> > Related - `borgmatic create prune` doesn't obey the order of the subcommands.
>
> Yeah, borgmatic's command-line parsing completely ignores order. The reason is kind of obscure; it's actually hoovering up common flags (that can be specified anywhere) so it can apply them to multiple actions at once. But I could conceive of a change that at least respects the order of actions themselves.

Ah, but *that* is interesting. It's really counter-intuitive, and I understand that flag parsing is kind of a pain in the back, but it would totally make sense to respect that order.

In fact, in #636 I almost suggested allowing such a thing so that the order could be picked by the user, but I realized I didn't know whether it was already implemented or not...

This seems like a good place to make this customizable, if that's how you want to solve this... Otherwise I think pruning second would be the right thing to do here, although I haven't delved into the (sometimes hard to figure out) retention policy math...

Owner

I just pushed code to master such that borgmatic now respects command-line action ordering. This will be part of the next release. There's still more to do for this ticket, however.

Owner

Okay, I've also gone ahead and changed the default action order to: `create`, `prune`, `compact`, `check`.

I was going to add a configuration option to override the default order, but that information is needed in the borgmatic code at the time arguments are being parsed—before the configuration files are even loaded. So I omitted that enhancement for now.

I'm calling this feature done, but please let me know if you have any further feedback. This will be part of the next release (1.7.9).

Owner

Just released in borgmatic 1.7.9!

No Milestone
No Assignees
3 Participants
Reference: borgmatic-collective/borgmatic#304