New hook: podman volume export #671

Open
opened 2023-04-09 09:20:05 +00:00 by hydrargyrum · 14 comments

What I'm trying to do and why

I'm trying to back up (as a user) podman volumes.
Borg can't access the dir where volumes are stored. But it's possible to export volumes with podman volume export VOLNAME.

There is a hook like before_create where I could export the volume to a place that borg can read.

But doesn't borgmatic have a built-in feature where it can read data from a script's stdout, without consuming additional disk space on the source host (as with borgmatic's database handling)? It seems that generic hooks can't do that though, and specific "database" dumps have to be implemented in order to use stdin/stdout.

So the idea would be to add support for reading podman volume export VOLNAME stdout and back it up.
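
A minimal sketch of the streaming idea, assuming Borg's documented ability to store data read from stdin (`borg create ... -` together with `--stdin-name`) and that `podman volume export` writes its tar stream to stdout when no output file is given. The function names, repository path, and archive name are hypothetical; this is not how borgmatic itself implements its database hooks.

```python
import subprocess


def export_command(volume):
    # Without an output file, `podman volume export` writes the tar stream to stdout.
    return ["podman", "volume", "export", volume]


def borg_stdin_command(repository, archive, stdin_name):
    # The trailing "-" tells `borg create` to read the data to store from stdin;
    # --stdin-name sets the file name that data gets inside the archive.
    return ["borg", "create", "--stdin-name", stdin_name, f"{repository}::{archive}", "-"]


def pipe(producer_command, consumer_command):
    """Stream the producer's stdout into the consumer's stdin without a temporary file."""
    producer = subprocess.Popen(producer_command, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_command, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe if the consumer exits early.
    producer.stdout.close()
    consumer_code = consumer.wait()
    producer_code = producer.wait()
    return producer_code, consumer_code
```

With those helpers, a backup of one volume would look something like `pipe(export_command("VOLNAME"), borg_stdin_command("/path/to/repo", "volumes", "VOLNAME.tar"))`, where the repository path and archive name are placeholders.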

Environment

borgmatic version: 1.5.12
Use sudo borgmatic --version or sudo pip show borgmatic | grep ^Version

borgmatic installation method: Debian package

Borg version: 1.11.16

Use sudo borg --version

Python version: 3.9.2

Use python3 --version

operating system and version: Debian stable

Owner

Thank you for filing this. This sounds like a valuable feature for Podman users. How are you thinking that borgmatic would find out the volumes to export though? A couple of different ideas off the top of my head:

  • Like the existing database hooks, a theoretical Podman hook could be configured with something like:
hooks:
    podman_volumes:
        - name: yourvolume
  • Or would you expect borgmatic to introspect the needed volumes, perhaps from a list of configured containers?
  • Or did you have something else in mind?

A question though about the need for this: With Docker, volumes are typically mounted from paths on the host, so it makes sense to point borgmatic directly at those host paths rather than going through any Docker volume machinery. (But if borgmatic itself is running in a container, the host paths in question can get mounted into the borgmatic container as volumes and read from the container's filesystem.)

So does that approach not work with Podman? Is there a use case where it's easier to access volumes with podman volume export rather than just reading the paths off the filesystem on either the host or in a container? You mentioned the ability to stream data directly from podman volume export so as to avoid a temporary copy. But if the volume data already exists on a filesystem, then no temporary copy is needed in order to read it. I feel like I must be missing something important here.

Author

> Like the existing database hooks, a theoretical Podman hook could be configured with something like:
>
>     hooks:
>         podman_volumes:
>             - name: yourvolume

That's how I was expecting it: giving the names of volumes. IMHO simple enough, less opaque/random than having to introspect the containers, and not too low-level.
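
As a rough illustration of how such a name-based configuration could drive the export, here is a small sketch. The `podman_volumes` option is only the proposal from this thread, not an existing borgmatic option, and `configured_volume_names` is a hypothetical helper. The dict stands in for what a YAML parser would produce from the snippet above.

```python
def configured_volume_names(config):
    """Return the volume names listed under hooks -> podman_volumes (proposed option)."""
    hooks = config.get("hooks") or {}
    volumes = hooks.get("podman_volumes") or []
    return [volume["name"] for volume in volumes]


# Equivalent of the parsed YAML configuration proposed in this thread.
config = {"hooks": {"podman_volumes": [{"name": "yourvolume"}]}}

for name in configured_volume_names(config):
    # One export command per configured volume.
    print(["podman", "volume", "export", name])
```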

> A question though about the need for this: With Docker, volumes are typically mounted from paths on the host, so it makes sense to point borgmatic directly at those host paths rather than going through any Docker volume machinery. (But if borgmatic itself is running in a container, the host paths in question can get mounted into the borgmatic container as volumes and read from the container's filesystem.)

I'm not very familiar with low-level details about docker and podman, but my naive understanding about docker is that the volume directories/files are owned by the docker daemon user, not the final user running docker.

About podman, for reasons that are beyond me, borg can't open the volume directory; even a simple ls fails with EACCES.

% podman inspect gitolite_git
[
     {
          "Name": "gitolite_git",
          "Driver": "local",
          "Mountpoint": "/xxx/podman/volumes/gitolite_git/_data",
[...]
% ls /xxx/podman/volumes/gitolite_git/_data
ls: cannot open directory '/xxx/podman/volumes/gitolite_git/_data': Permission denied

I'm running podman containers and ls under the same user (not root). The uid/gid of the directory are not mine, it might have to do with /etc/subuid but this is black magic to me. At least, podman volume export exports data as a tar file without error.
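
Since the export is an ordinary tar stream, its contents can be checked without ever touching the volume directory. The sketch below assumes only Python's standard `tarfile` module; the in-memory archive is a stand-in for real `podman volume export` output, and the member name in it is made up.

```python
import io
import tarfile


def list_tar_stream(stream):
    """Return member names from a non-seekable tar stream such as a pipe."""
    # Mode "r|" reads the archive sequentially, so the stream never has to be seekable.
    with tarfile.open(fileobj=stream, mode="r|") as archive:
        return [member.name for member in archive]


def sample_tar_bytes():
    # Build a tiny tar archive in memory to stand in for an exported volume.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        data = b"hello"
        info = tarfile.TarInfo(name="repositories/README")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


print(list_tar_stream(io.BytesIO(sample_tar_bytes())))
```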

I've read that podman unshare could help to access the direct volume directory but I'm not sure it can be done in the middle of a backup.

Of course, running borgmatic as root would probably solve the problem, but I think it's better not to run it as root.

Owner

Got it. Thanks for the explanation. So when you run your container to begin with, are you running it with --volume /some/host/path:/container/path? Or does it look more like --volume some-volume:/container/path?

Owner

One other question on this: What would you expect the restore story to look like for Podman volumes? podman volume import? Would you do that manually, e.g. borgmatic extract or borgmatic mount to retrieve a volume tarball to import it yourself, or would you want borgmatic to do all of it including the import? Maybe something like this database restore process (https://torsion.org/borgmatic/docs/how-to/backup-your-databases/#database-restoration)?
witten added this to the container backups milestone 2023-05-23 15:41:29 +00:00
witten added the new feature area label 2023-06-28 18:37:31 +00:00
Contributor

I'm interested in adding this for podman. I think it's in the spirit of the podman project not to use the podman socket (mainly because that's a large part of why podman exists: it's daemonless). I also think there is a lot of integration with the podman CLI to be done, but that should work quite well, since all of the needed CLI commands support JSON output as far as I know. I am also interested in adding support for stopping/starting the containers (podman export is a plain tar dump and not atomic), using labels on containers (https://docs.podman.io/en/latest/markdown/podman-run.1.html#label-l-key-value) or labels on volumes (https://docs.podman.io/en/latest/markdown/podman-volume-create.1.html#label-l-label).

TODO for backing up data:

  • Parse podman volume inspect and podman inspect
  • Figure out which containers use which volumes
  • Stop the containers that use the volume
  • Run podman volume export and pipe that into borg like we do with database dumps

TODO for restoring data:
Either really simple:

  • Just run podman volume import

Or add option to replace existing volume:

  • Parse podman volume inspect and podman inspect
  • Figure out which containers use which volumes
  • Stop the containers that use the volume
  • Import the volume with podman volume import
  • Start the containers
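
The "figure out which containers use which volumes" step from both lists could be sketched roughly as follows. This assumes that container inspect output is a JSON array whose entries carry a `Mounts` list with `Type` and `Name` fields; the sample data is hand-written rather than captured from a real system, and `containers_by_volume` is a hypothetical helper.

```python
import json


def containers_by_volume(inspect_json):
    """Map each named volume to the names of the containers that mount it."""
    usage = {}
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            # Bind mounts have Type "bind"; only named volumes matter here.
            if mount.get("Type") == "volume" and mount.get("Name"):
                usage.setdefault(mount["Name"], []).append(container["Name"])
    return usage


# Hand-written sample shaped like container inspect output (field names are assumed).
sample = json.dumps([
    {"Name": "gitolite", "Mounts": [{"Type": "volume", "Name": "gitolite_git"}]},
    {"Name": "web", "Mounts": [{"Type": "bind", "Source": "/srv/www"}]},
])

print(containers_by_volume(sample))
```

The resulting mapping gives the list of containers to stop before exporting or importing a given volume, and to start again afterwards.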
Contributor

I have started to read the docker docs and it looks like there is no way to do this via the socket. This (https://docs.docker.com/engine/api/v1.43/#tag/Volume) is everything the current volume API supports over the socket.
Owner

I also haven't had a chance to fully look at this yet, but some initial thoughts:

  • This comment (https://projects.torsion.org/borgmatic-collective/borgmatic/issues/685#issuecomment-6370) touches on a rationale for using the socket rather than CLI.
  • Yeah, the socket API won't do volume/container dumping directly. But you could spin up a temporary container to dump a volume or a container to stdout (https://projects.torsion.org/borgmatic-collective/borgmatic/issues/685), ideally with the socket rather than the CLI. I think this is actually similar to the officially recommended Docker mechanism for backups (https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes).
  • Be aware of these poll results (https://fosstodon.org/@borgmatic/110946967464871569).
Contributor

Yes, we could spin up an ad hoc container, dump the contents to stdout and back that up like a database dump, but that adds a lot of complexity that is really unneeded. We would basically roll our own podman volume export instead of just using what already exists. For docker, using the socket would make sense, I agree, but I feel like not every feature has to work inside a container. I don't see any harm in shipping the feature in a way that supports both the podman CLI and the docker socket, if someone is willing to implement all that on top of the docker socket API.
Contributor

Would you be opposed to having a CLI and a socket implementation, @witten? I think we should look at something like containers/podman-py (https://github.com/containers/podman-py) for the socket implementation and not roll our own interface, since that should be way more stable.
Owner

Okay, I've had a chance to look at this in a little more detail. Thanks for your patience.

Here's the challenge I'm facing. We've got at least four distinct configurations:

  • borgmatic on the host running Podman
  • borgmatic in a Podman container
  • borgmatic on the host running Docker
  • borgmatic in a Docker container

My initial intent in suggesting that borgmatic talk to a socket is that, in theory, the same code could support all four configurations. If borgmatic is running on the host, it can talk to the Podman or Docker socket REST API, probably using the Docker compatibility API for Podman (although not necessarily). If borgmatic is running in a container, it can do the exact same thing, trusting that the socket has been mounted into it.
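
To make the socket idea concrete, here is a minimal standard-library sketch of calling a Docker-compatible REST API over a Unix socket. It assumes the `GET /volumes` endpoint of the Docker Engine API, whose response wraps the volume list in a `Volumes` key; whether a given Podman socket serves the same unversioned path is an assumption to verify. The socket path is supplied by the caller, and all names here are hypothetical.

```python
import http.client
import json
import socket


class UnixSocketConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def names_from_volume_list(payload):
    # The Docker-compatible volume list response looks like {"Volumes": [...], ...}.
    return [volume["Name"] for volume in payload.get("Volumes") or []]


def list_volumes(socket_path):
    """Ask the container engine behind socket_path for its volume names."""
    connection = UnixSocketConnection(socket_path)
    try:
        connection.request("GET", "/volumes")
        return names_from_volume_list(json.loads(connection.getresponse().read()))
    finally:
        connection.close()
```

The same `list_volumes` call would be used on the host or inside a container; only the socket path passed in differs, which is the appeal of the socket approach described above.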

The problem with shelling out to a Podman/Docker binary is that it eliminates support for two of the configurations (running in a container). And the problem with using something like podman-py, while it does sound nice and convenient, is that it also eliminates support for two of the configurations (anything Docker). What do you think of using something like docker-py (https://github.com/docker/docker-py), which can in theory address both?

Having said all that, I'm not necessarily averse to having separate Podman and Docker hooks if that really makes the most sense in terms of implementation. Or separate container CLI and socket hooks (that each support both Podman and Docker). I think I'd probably draw the line at four different hooks for the four different configurations. (And once you add in containers and volumes it's almost like eight configurations!)

Other thoughts: Reading optional labels on containers to decide whether to stop/start containers before/after backup makes sense to me, although I'd almost consider that something that can be layered on as an enhancement after the initial work.

Hope this helps!

Contributor

No, we really don't have 4 setups, we have two:

  • Inside a container
  • Outside a container

I think docker/podman should not make a lot of difference; at least the CLI tools are compatible (at least for the part we care about: volume import/export). I am unsure how compatible the socket APIs are, but even if they are not 100% compatible it would still be OK, since we can just have the user configure whether it's a docker or podman API socket we call.

I am interested in writing the hook for the CLI, but I would rather not commit to a hook that supports both CLI and socket, as Advent of Code will probably keep me busy alongside uni and work, if that's fine by you. I would also start by writing a really primitive version and add more features later (mainly label-based autostart/stop and label-based volume backups), to keep the code manageable and not drop more than 1k LOC in a pull request.
Contributor

OK, the plan has changed a bit. Docker does not support this via the CLI at all, so I think the CLI hook can reasonably be renamed to be podman-only. (We should simply put a label on it saying: THIS IS NOT SUPPORTED INSIDE CONTAINERS, PLEASE USE THE SOCKET API IF YOU NEED TO RUN BORGMATIC IN A CONTAINER.) This means we now only have to figure out how to do the socket calls in a docker/podman API-compatible fashion for the socket option. The podman volume export/import hook is done but still needs some tests, docs, etc.
Contributor

It looks like the compatible API should not be too hard, as the podman docs (https://docs.podman.io/en/latest/_static/api.html) state: "This documentation describes the Podman v2.x+ RESTful API. It consists of a Docker-compatible API (...)". So as long as we can restrict the scope of the API calls to those calls, we should be fine.
Owner

Yeah, Docker doesn't have built-in volume export at the CLI (or at all, AFAIK).

> I am interested in writing the hook for the CLI, but I would rather not commit to a hook that supports both CLI and socket, as Advent of Code will probably keep me busy alongside uni and work, if that's fine by you.

That's fine to start with IMO. The two (CLI and socket) may ultimately make sense as separate hooks.

> I would also start by writing a really primitive version and add more features later (mainly label-based autostart/stop and label-based volume backups), to keep the code manageable and not drop more than 1k LOC in a pull request.

Makes sense to me. Start small with the minimum shippable unit of feature and then iterate later on. But will this initial effort only support backing up Podman volumes? Or will it support specifying Podman containers too?

> It looks like the compatible API should not be too hard, as the podman docs state: "This documentation describes the Podman v2.x+ RESTful API. It consists of a Docker-compatible API (...)". So as long as we can restrict the scope of the API calls to those calls, we should be fine.

Yeah, that Docker-compatible API is what I was referring to. Hopefully it has the functionality needed to manually export volume contents. Likely though that would require the trick where you spin up a temporary container and run tar or something.
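
For reference, the temporary-container trick could look something like the sketch below when expressed as a command line. The image name, the `/volume` mount point, and the function name are assumptions chosen for illustration; a socket-based hook would perform the equivalent create, start, and attach steps through the API instead of a CLI call.

```python
def temporary_container_tar_command(volume, runtime="docker", image="alpine"):
    """Build a command that tars a volume's contents to stdout from a throwaway container."""
    return [
        runtime, "run", "--rm",
        # Mount the volume read-only so the dump cannot modify it.
        "--volume", f"{volume}:/volume:ro",
        image,
        # "-f -" writes the archive to stdout; "-C /volume ." archives the volume root.
        "tar", "-c", "-f", "-", "-C", "/volume", ".",
    ]


print(temporary_container_tar_command("yourvolume"))
```

The command's stdout can then be streamed into Borg in the same way as a database dump, so no temporary tarball is written on the host.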

Reference: borgmatic-collective/borgmatic#671