WIP: Add container volume backups with podman cli #895

Closed
IBims1NicerTobi wants to merge 4 commits from IBims1NicerTobi/borgmatic:main into main
Contributor
No description provided.
IBims1NicerTobi added 2 commits 2024-07-04 21:24:42 +00:00
Author
Contributor

So as of right now I have no Idea how exactly the restore function would have to look like and I would really appreciate your input. The restore command should look like this: borg extract [ARCHIVE] [PATH TO DUMP]. I know that borgmatic will probably provide me the archive and the path in the function should be built from the volume name but I'm not sure how to put it all together. I also think it would make sense if the restore had two modes for now: By default the restore should create the volume and fail if the volume exists already, if the user provides something line --force we delete the volume and overwrite it with the contents of the archive.

So as of right now I have no Idea how exactly the restore function would have to look like and I would really appreciate your input. The restore command should look like this: `borg extract [ARCHIVE] [PATH TO DUMP]`. I know that borgmatic will probably provide me the archive and the path in the function should be built from the volume name but I'm not sure how to put it all together. I also think it would make sense if the restore had two modes for now: By default the restore should create the volume and fail if the volume exists already, if the user provides something line --force we delete the volume and overwrite it with the contents of the archive.
Owner

So as of right now I have no Idea how exactly the restore function would have to look like and I would really appreciate your input. The restore command should look like this: borg extract [ARCHIVE] [PATH TO DUMP]. I know that borgmatic will probably provide me the archive and the path in the function should be built from the volume name but I'm not sure how to put it all together.

borgmatic actually does much more than that for the existing database hooks, which I suggest modeling your hook after. Take a look at, for instance, borgmatic/hooks/mariadb.py:restore_data_source_dump(). That function is handed an extract_process value that represents the borg extract command that borgmatic has already started running to extract the correct path from the selected archive. Then you'll notice that at the very end of the function, extract_process.stdout is passed as the input_file to the MariaDB restore command. In other words, the borg extract stdout gets streamed directly to the stdin of the MariaDB restore.

So you could do the same thing in a Podman hook—the stdout of the extract_process that borgmatic hands you could get passed as input to your podman volume import command.

I also think it would make sense if the restore had two modes for now: By default the restore should create the volume and fail if the volume exists already, if the user provides something line --force we delete the volume and overwrite it with the contents of the archive.

That makes sense to me. (I might suggest calling it --overwrite instead of --force though just because that's a little more explicit.)

> So as of right now I have no Idea how exactly the restore function would have to look like and I would really appreciate your input. The restore command should look like this: borg extract [ARCHIVE] [PATH TO DUMP]. I know that borgmatic will probably provide me the archive and the path in the function should be built from the volume name but I'm not sure how to put it all together. borgmatic actually does much more than that for the existing database hooks, which I suggest modeling your hook after. Take a look at, for instance, `borgmatic/hooks/mariadb.py:restore_data_source_dump()`. That function is handed an `extract_process` value that represents the `borg extract` command that borgmatic has already started running to extract the correct path from the selected archive. Then you'll notice that at the very end of the function, `extract_process.stdout` is passed as the `input_file` to the MariaDB restore command. In other words, the `borg extract` stdout gets streamed directly to the stdin of the MariaDB restore. So you could do the same thing in a Podman hook—the stdout of the `extract_process` that borgmatic hands you could get passed as input to your `podman volume import` command. > I also think it would make sense if the restore had two modes for now: By default the restore should create the volume and fail if the volume exists already, if the user provides something line --force we delete the volume and overwrite it with the contents of the archive. That makes sense to me. (I might suggest calling it `--overwrite` instead of `--force` though just because that's a little more explicit.)
witten reviewed 2024-07-04 22:58:20 +00:00
witten left a comment
Owner

Just a few comments on the schema since this is still WIP..

Just a few comments on the schema since this is still WIP..
@ -1873,3 +1873,21 @@ properties:
can send the logs to a self-hosted instance or create an account at
https://grafana.com/auth/sign-up/create-user. See borgmatic
monitoring documentation for details.
podman_databases:
Owner

This could just be podman: since there's nothing database-specific about this.

This could just be `podman:` since there's nothing database-specific about this.
Author
Contributor

Fixed

Fixed
IBims1NicerTobi marked this conversation as resolved
@ -1876,0 +1878,4 @@
required: ['volumes']
additionalProperties: false
properties:
volumes:
Owner

Is the rationale with having the volumes option nested under here that you might additional Podman-specific options in the future? containers, etc?

Is the rationale with having the `volumes` option nested under here that you might additional Podman-specific options in the future? `containers`, etc?
Author
Contributor

Yeah other than volumes I would also like to support labels.

Yeah other than volumes I would also like to support labels.
IBims1NicerTobi marked this conversation as resolved
IBims1NicerTobi added 1 commit 2024-07-05 12:23:01 +00:00
Author
Contributor

Ok we maybe need some docs saying that the hook dump and restore layout of the schema is related to the restore code. The code in actions/restore.py checks for the database name on a specified point in the schema and assumes it is there (which is the case for all the database hooks but does not really make sense in my case since there might be a lot of volumes and I would prefer to not model my schema the way the database hooks have so far).

Ok we maybe need some docs saying that the hook dump and restore layout of the schema is related to the restore code. The code in actions/restore.py checks for the database name on a specified point in the schema and assumes it is there (which is the case for all the database hooks but does not really make sense in my case since there might be a lot of volumes and I would prefer to not model my schema the way the database hooks have so far).
Author
Contributor

Can we maybe add a function to every database hook to the name of the configured databases instead of assuming their position in the schema?

Can we maybe add a function to every database hook to the name of the configured databases instead of assuming their position in the schema?
IBims1NicerTobi added 1 commit 2024-07-05 13:07:33 +00:00
Author
Contributor

@witten Any opinion on this?

@witten Any opinion on this?
Author
Contributor

Also it looks like I cannot just call the hook podman because the _databases part of the path is hardcoded in borgmatic/actions/restore.py in line 166.

Also it looks like I cannot just call the hook `podman` because the `_databases` part of the path is hardcoded in `borgmatic/actions/restore.py` in line `166`.
Owner

Can we maybe add a function to every database hook to the name of the configured databases instead of assuming their position in the schema?

Yes, adding new functions to the internal data source "API" is of course possible. However it might not be sufficient to just add a function to resolve data source names, because it looks like there's also code in actions/restore.py that assumes the hook configuration itself is just a list. So one way to address both problems might be to change the code such that only the volumes: portion of your configuration gets passed to these various restore.py functions. That way: 1. It'd still be dealing in a plain list (of mappings) like the other data sources, and 2. Each element of the list would have a name like the other data sources.

EDIT: Although I see that in your existing schema, the volumes names are just a list of strings, so that still doesn't directly line up with the name-style schema. But you could change to a format like the following, which would line up, at least the volume: level:

podman:
    volumes:
        - name: db
        - name: data
        - name: config

That'd also have the benefit of allowing future options on a per-volume basis. Although maybe that's just unnecessary future-proofing/rationalization.

I'm open to other approaches of course; that's just one approach that occurred to me. I haven't worked through all the details of how to make it happen.

Also it looks like I cannot just call the hook podman because the _databases part of the path is hardcoded in borgmatic/actions/restore.py in line 166.

Excellent point. But that code might still work for the Podman hook if another glob path was added, e.g.:

            for pattern in ('*_databases/*/*', '*_volumes/*/*')
> Can we maybe add a function to every database hook to the name of the configured databases instead of assuming their position in the schema? Yes, adding new functions to the internal data source "API" is of course possible. However it might not be sufficient to just add a function to resolve data source names, because it looks like there's also code in `actions/restore.py` that assumes the hook configuration itself is just a list. So one way to address both problems might be to change the code such that only the `volumes:` portion of your configuration gets passed to these various `restore.py` functions. That way: 1. It'd still be dealing in a plain list (of mappings) like the other data sources, and 2. Each element of the list would have a `name` like the other data sources. EDIT: Although I see that in your existing schema, the volumes names are just a list of strings, so that still doesn't directly line up with the `name`-style schema. But you could change to a format like the following, which would line up, at least the `volume:` level: ```yaml podman: volumes: - name: db - name: data - name: config ``` That'd also have the benefit of allowing future options on a per-volume basis. Although maybe that's just unnecessary future-proofing/rationalization. I'm open to other approaches of course; that's just one approach that occurred to me. I haven't worked through all the details of how to make it happen. > Also it looks like I cannot just call the hook podman because the _databases part of the path is hardcoded in borgmatic/actions/restore.py in line 166. Excellent point. But that code might still work for the Podman hook if another glob path was added, e.g.: ```python for pattern in ('*_databases/*/*', '*_volumes/*/*') ```
Author
Contributor

Yeah but tbh that just feels hacky. I really dislike the use of magical names here and I think it would be much clearer if every hook just provided the name it used for the database dump directory directly instead of changing it for every new hook that comes along. I would prefer it if the hook api was just better documented and integrated less into the rest of the tool. Imho writing a hook should not require reading all of borgmatics source code.

Yeah but tbh that just feels hacky. I really dislike the use of magical names here and I think it would be much clearer if every hook just provided the name it used for the database dump directory directly instead of changing it for every new hook that comes along. I would prefer it if the hook api was just better documented and integrated less into the rest of the tool. Imho writing a hook should not require reading all of borgmatics source code.
Author
Contributor

I would seriously propose writing a clear api for the database dumps since I would assume there a a lot more coming soon. If you were up for that change I would propose writing something like an abstract base class for the database hooks and documenting it very well.

I would seriously propose writing a clear api for the database dumps since I would assume there a a lot more coming soon. If you were up for that change I would propose writing something like an abstract base class for the database hooks and documenting it very well.
Owner

I agree that a more well-defined API would be preferable to the status quo. No API though is going to be perfectly prescient at designing for all future needs.. I think the trick is just getting started with an initial API that can meet the needs of current database hooks and your current Podman hook.

Something like a well-documented module with a bunch of empty functions could maybe serve that need, as borgmatic's code base doesn't really use classes (except where necessary to interface with the Python standard library / third-party libraries). I'd be happy to tackle that unless you wanted to take a stab at it as part of this Podman work.

I agree that a more well-defined API would be preferable to the status quo. No API though is going to be perfectly prescient at designing for all future needs.. I think the trick is just getting started with an initial API that can meet the needs of current database hooks and your current Podman hook. Something like a well-documented module with a bunch of empty functions could maybe serve that need, as borgmatic's code base doesn't really use classes (except where necessary to interface with the Python standard library / third-party libraries). I'd be happy to tackle that unless you wanted to take a stab at it as part of this Podman work.
Author
Contributor

Yeah I wouldn't mind working that out too. Kinda seems like wasted effort to try and hack in support for this one hook only to change the API afterwards.

Yeah I wouldn't mind working that out too. Kinda seems like wasted effort to try and hack in support for this one hook only to change the API afterwards.
Owner

Sounds good. Let me know if I can help with the API design—or if you just get sick of it and want to hand it over at any point!

Sounds good. Let me know if I can help with the API design—or if you just get sick of it and want to hand it over at any point!
Author
Contributor

Ok I have run into an issue I don't really know how to fix: If I create an empty volume using podman and dump it into a fifo podman creates an empty backup and following that instantly exits. borg does however try to read from the fifo and gets stuck since there is nothing to read. Can we handle that somehow?

Ok I have run into an issue I don't really know how to fix: If I create an empty volume using podman and dump it into a fifo podman creates an empty backup and following that instantly exits. `borg` does however try to read from the fifo and gets stuck since there is nothing to read. Can we handle that somehow?
Owner

Could you provide some more details about how you're dumping the volume? For instance, if you're exporting it with podman volume export, I would expect that to produce a tarball. And any tarball has a non-zero size even if its contents are empty.

Could you provide some more details about how you're dumping the volume? For instance, if you're exporting it with `podman volume export`, I would expect that to produce a tarball. And any tarball has a non-zero size even if its contents are empty.
Author
Contributor

Yeah no I really can't. I'm unfortunately really busy right now but I think the issue must be with my dev env since the podman volume export command does indeed not create an empty archive out of an empty volume.

Yeah no I really can't. I'm unfortunately really busy right now but I think the issue must be with my dev env since the podman volume export command does indeed not create an empty archive out of an empty volume.
Owner

Okay, I'm happy to help once you get some time to look at this. But if you aren't going to have time for the foreseeable, I totally understand.. Just let me know.

Okay, I'm happy to help once you get some time to look at this. But if you aren't going to have time for the foreseeable, I totally understand.. Just let me know.
Owner

I'll close this for now due to inactivity (and branch conflicts), but I'd be happy to revisit in the future. Thanks!

I'll close this for now due to inactivity (and branch conflicts), but I'd be happy to revisit in the future. Thanks!
witten closed this pull request 2025-02-11 06:38:22 +00:00

Pull request closed

Sign in to join this conversation.
No Reviewers
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#895
No description provided.