Run before_backup once (with multiple repositories) #790

Open
opened 2023-11-14 23:22:07 +00:00 by shadow7412 · 7 comments

What I'd like to do and why

I have a local and remote borg repo.

For a couple of applications I am backing up, there is a command that I need to run. This exports all of the databases/artefacts/etc to a specific directory. This is not a small operation (~10-25 minutes depending on the application). Getting around this is not possible (or at least not desirable, as it would complicate the restore process).

Using before_backup does work, but I only need this to run once, rather than multiplying my backup's preparation time by the number of repos I have defined.

Current behaviour:

  • before_backups
  • backup to repo1
  • after_backups
  • before_backups
  • backup to repo2
  • after_backups

Desired behaviour:

  • before_backups
  • backup to repo1
  • backup to repo2
  • after_backups
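
For illustration, a minimal sketch of a configuration that would produce the current behaviour. It assumes the flat option layout of recent borgmatic releases (older releases nest hooks under a `hooks:` section), and every path and script name here is a hypothetical placeholder rather than something from the actual setup:

```yaml
# Hypothetical paths and script names, for illustration only.
source_directories:
    - /srv/app-export

repositories:
    - path: /mnt/backup/app.borg
      label: local
    - path: ssh://backup@example.org/./app.borg
      label: remote

# Runs once per repository, so the slow export happens twice here.
before_backup:
    - /usr/local/bin/export-app /srv/app-export

after_backup:
    - echo "Backup finished."
```

With two repositories defined, the export command in this sketch runs before each repository's backup, which is the repetition described above.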

Other notes / implementation ideas

Perhaps we could consider introducing a before_backup_once, to avoid breaking backwards compatibility for those who rely on this behaviour.

This would run once, just before running the backup command associated with the first repository in the list. If borg handles the before_backup config, perhaps the contents of before_backup_once could be prepended to before_backup for the first repo or something.

(Perhaps a matching after_backup_once is appropriate, not that I need it)

It should be noted that I would still expect multiple configurations to have each of their relevant before_backup_once handled individually.

Author

Actually, re-reading the documentation... is this what before_everything is supposed to do? I only just noticed the caveat of it only running when create is among the operations.

Author

> These are collected from all configuration files and then run once after all of them (after any action).

Does this mean that if any of the pre-commands fail, the entire backup doesn't run?

Owner

Yup, before_everything is the closest to what you're looking for. And it does run only when there's an implicit or explicit create action.

> These are collected from all configuration files and then run once after all of them (after any action).
>
> Does this mean that if any of the pre-commands fail, the entire backup doesn't run?

That's correct!

witten added the question / support label 2023-11-14 23:40:35 +00:00
Author

Hmm. It probably would be nicer to have that single config file fail than all of them...

Owner

before_backup actually used to work exactly as you've described above. Prior to borgmatic 1.6.0 and #473, command hooks ran per-configuration rather than per-repository. The change to per-repository was made, though, to better support timing-sensitive tasks like pausing containers, by keeping each hook temporally closer to the action(s) it's wrapping.

So short of downgrading borgmatic (which I don't recommend), your best bet for now might be to:

  1. Use before_everything and just deal with any failures that affect all configuration files.
  2. Or use before_everything but invoke borgmatic separately for each configuration file that you want to succeed/fail independently. (See the --config flag.)
  3. Or use before_backup and deal with the repetition. (In fact, the built-in borgmatic database dumping feature also repeats dumping per-repository because it streams dumps directly to Borg without hitting disk.)

Hope this helps some.
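
As a rough sketch of the first option, a configuration file could move the slow export into before_everything. This assumes the flat option layout of recent borgmatic releases (older releases nest hooks under a `hooks:` section); the repository paths and the export script are hypothetical placeholders:

```yaml
# Hypothetical paths and script names, for illustration only.
source_directories:
    - /srv/app-export

repositories:
    - path: /mnt/backup/app.borg
      label: local
    - path: ssh://backup@example.org/./app.borg
      label: remote

# Collected from all configuration files and run once per borgmatic
# invocation, and only when a create action is involved.
# If this command fails, none of the backups in that invocation run.
before_everything:
    - /usr/local/bin/export-app /srv/app-export
```

For the second option, the same file would be passed to its own borgmatic invocation with the `--config` flag, so that a failing export only affects that one configuration file.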

Author

Taking a service down temporarily is a pretty reasonable use case, so I understand the change.

But do you see value in implementing the old behaviour but as a different option?

For now, I think I'll have to take the before_everything route... but it would be nice to not have all backups fail because of a specific config's prerun.

Actually, if we're talking optimisation, in my situation nothing really prevents running the different configs' before steps in parallel. I daresay that'd be a complexity jump that doesn't attract huge interest, though...

Owner

I see the value in the old behavior, but I'm hesitant to build directly atop the existing command hook syntax to support it. That's because that approach would end up encoding so much detail into the option name itself, which gets kind of awkward IMO especially as more use cases are added. A more flexible (if less backwards-compatible) approach would be to come up with a new schema for command hooks that better supports these kinds of use cases. Here's a made-up example:

commands:
    # Before the prune action for each repository, run the command. Same as before_prune.
    - before: action
      when:
        - prune
      run:
        - echo test

    # Before all the actions for each repository, run the command. Same as before_actions.
    - before: repository
      run:
        - echo test

    # Before all the actions for all the repositories in the current configuration file, run the command...
    # but only when there is a create or check action. No analog today. Should satisfy your use case.
    - before: configuration
      when:
        - create
        - check
      run:
        - echo test

    # Before all configuration files, run the command... but only when there is a create or prune action.
    # Similar to before_everything but with the ability to choose the actions.
    - before: everything
      when:
        - create
        - prune
      run:
        - echo test

The big downside is verbosity and complexity. The upside is this is way more flexible for a variety of use cases including yours. I'm not 100% sold that the trade-off is worth it, but I thought it's worth continuing the discussion. (This is not the first time this kind of approach has been discussed.)

> Actually, if we're talking optimisation, in my situation nothing really prevents running the different configs' before steps in parallel. I daresay that'd be a complexity jump that doesn't attract huge interest, though...

Yeah, parallelism has come up in other contexts as well, but thus far nobody has worked on it.

witten removed the question / support label 2024-01-09 21:54:16 +00:00
Reference: borgmatic-collective/borgmatic#790