New hook: podman volume export #671
What I'm trying to do and why
I'm trying to back up podman volumes as a regular (non-root) user.
Borg can't access the directory where volumes are stored, but it's possible to export volumes with `podman volume export VOLNAME`.
There is a hook like `before_create` where I could export the volume to a place that borg can read. But does borgmatic have a built-in feature where it can read data from a script's stdout, without consuming additional disk space on the source host (as with borgmatic's database handling)? It seems that generic hooks can't do that, and specific "database" dumps have to be implemented in order to use stdin/stdout.
So the idea would be to add support for reading the stdout of `podman volume export VOLNAME` and backing it up.
Environment
borgmatic version: 1.5.12 (use `sudo borgmatic --version` or `sudo pip show borgmatic | grep ^Version`)
borgmatic installation method: Debian package
Borg version: 1.11.16 (use `sudo borg --version`)
Python version: 3.9.2 (use `python3 --version`)
operating system and version: Debian stable
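The streaming idea in this request can be sketched in Python as follows. This is only an illustration, not how borgmatic implements its database hooks: the helper names `export_command`, `borg_stdin_command`, and `pipe` are hypothetical. It relies on `podman volume export` writing a tar stream to stdout when no output file is given, and on `borg create` accepting `-` as a path to read from stdin.

```python
import subprocess
import sys


def export_command(volume_name):
    # Without --output, `podman volume export` writes the tar stream to stdout.
    return ["podman", "volume", "export", volume_name]


def borg_stdin_command(repository, archive, stdin_name):
    # A path of "-" makes `borg create` read from stdin; --stdin-name sets the
    # file name under which that data is stored in the archive.
    return ["borg", "create", "--stdin-name", stdin_name, f"{repository}::{archive}", "-"]


def pipe(producer, consumer):
    """Stream producer's stdout into consumer's stdin with no temporary file."""
    source = subprocess.Popen(producer, stdout=subprocess.PIPE)
    sink = subprocess.Popen(consumer, stdin=source.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer sees SIGPIPE if the consumer exits.
    source.stdout.close()
    output, _ = sink.communicate()
    source.wait()
    return source.returncode, sink.returncode, output


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without podman or borg installed.
    producer = [sys.executable, "-c", "print('volume data')"]
    consumer = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(pipe(producer, consumer))
```

With real tools, the call would look like `pipe(export_command("VOLNAME"), borg_stdin_command(repo, archive, "VOLNAME.tar"))`, where `repo` and `archive` are placeholders for your own repository and archive name.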
Thank you for filing this. This sounds like a valuable feature for Podman users. How are you thinking that borgmatic would find out the volumes to export though? A couple of different ideas off the top of my head:
A question though about the need for this: With Docker, volumes are typically mounted from paths on the host, so it makes sense to point borgmatic directly at those host paths rather than going through any Docker volume machinery. (But if borgmatic itself is running in a container, the host paths in question can get mounted into the borgmatic container as volumes and read from the container's filesystem.)
So does that approach not work with Podman? Is there a use case where it's easier to access volumes with `podman volume export` rather than just reading the paths off the filesystem on either the host or in a container? You mentioned the ability to stream data directly from `podman volume export` so as to avoid a temporary copy. But if the volume data already exists on a filesystem, then no temporary copy is needed in order to read it. I feel like I must be missing something important here.
That's how I was expecting it, giving the names of volumes. IMHO simple enough, less opaque/random than having to introspect the containers, and not too low-level.
I'm not very familiar with low-level details about docker and podman, but my naive understanding about docker is that the volume directories/files are owned by the docker daemon user, not the final user running docker.
About podman, for reasons that are beyond me, borg can't open the volume directory; even a simple `ls` fails with EACCES. I'm running the podman containers and `ls` under the same user (not root). The uid/gid of the directory are not mine; it might have to do with `/etc/subuid`, but this is black magic to me. At least `podman volume export` exports the data as a tar file without error.
I've read that `podman unshare` could help to access the volume directory directly, but I'm not sure it can be done in the middle of a backup.
Of course, running borgmatic as root would probably solve the problem, but I think it's better not to run it as root.
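As a possible explanation of the foreign uid/gid mentioned above: rootless podman normally maps container UID 0 to the invoking user, and container UIDs 1 and up onto that user's subordinate range from `/etc/subuid` (lines of the form `name:start:count`). The sketch below assumes that default mapping. The functions `parse_subuid` and `host_uid` are illustrative names, not part of podman or borgmatic.

```python
def parse_subuid(text, user):
    """Return the (start, count) subordinate ranges listed for `user`."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def host_uid(container_uid, user_uid, ranges):
    """Map a container UID to a host UID, assuming the default rootless mapping."""
    if container_uid == 0:
        # Container root is the invoking user on the host.
        return user_uid
    offset = container_uid - 1
    for start, count in ranges:
        if offset < count:
            return start + offset
        offset -= count
    raise ValueError(f"container UID {container_uid} is outside the mapped ranges")


# Example with made-up values: user "alice" (UID 1000) and one subordinate range.
ranges = parse_subuid("alice:100000:65536\n", "alice")
print(host_uid(0, 1000, ranges))   # container root
print(host_uid(33, 1000, ranges))  # a non-root service user inside the container
```

Files written by a non-root user inside the container would then be owned on the host by a UID from the subordinate range, which the regular user cannot read without entering the user namespace (for example via `podman unshare`).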
Got it. Thanks for the explanation. So when you run your container to begin with, are you running it with `--volume /some/host/path:/container/path`? Or does it look more like `--volume some-volume:/container/path`?
One other question on this: What would you expect the restore story to look like for Podman volumes? `podman volume import`? Would you do that manually, e.g. `borgmatic extract` or `borgmatic mount` to retrieve a volume tarball to import it yourself, or would you want borgmatic to do all of it including the import? Maybe something like this database restore process?
I'm interested in adding this for podman. I think it's in the spirit of the podman project to not use the podman socket (mainly because that's a large part of why podman exists: it's daemonless). I also think that there is a lot of integration with the podman CLI to be done, but that should work quite well since all of the needed CLI commands support JSON output as far as I know. I am also interested in adding support for stopping/starting the containers (podman export is a dumb tar and not atomic), driven by labels on containers or labels on volumes.
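To make the label idea concrete, here is a minimal sketch that picks volumes to back up from the JSON printed by `podman volume inspect --all`. The label key `borgmatic.backup` is an assumption made up for illustration, not an agreed convention, and `volumes_with_label` is a hypothetical helper.

```python
import json


def volumes_with_label(inspect_json, label, value="true"):
    """Return the names of volumes whose labels contain label=value.

    `inspect_json` is the JSON array printed by `podman volume inspect --all`.
    """
    volumes = json.loads(inspect_json)
    return [
        volume["Name"]
        for volume in volumes
        # "Labels" may be missing or null when a volume has no labels.
        if (volume.get("Labels") or {}).get(label) == value
    ]


# Example input shaped like inspect output, trimmed to the fields used here.
sample = """
[
  {"Name": "pgdata", "Labels": {"borgmatic.backup": "true"}},
  {"Name": "cache", "Labels": {}},
  {"Name": "scratch", "Labels": null}
]
"""
print(volumes_with_label(sample, "borgmatic.backup"))
```

The same filtering could be applied to `podman inspect` output for containers in order to decide which ones to stop before the export and restart afterwards.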
TODO for backing up data:
- `podman volume inspect` and `podman inspect`
- `podman volume export` and pipe that into borg like we do with database dumps
TODO for restoring data:
- Either really simple: `podman volume import`
- Or add an option to replace an existing volume: `podman volume inspect` and `podman inspect`, then `podman volume import`
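The simple restore path could look roughly like the sketch below. It assumes the volume was stored in the archive as a single tar file, that `borg extract --stdout` is used to stream it back out, and that `podman volume import VOLUME -` reads the tarball from stdin. The function name `restore_commands` and all argument values are hypothetical.

```python
def restore_commands(repository, archive, stored_name, volume_name):
    """Build the two commands whose pipeline restores one volume.

    Equivalent shell pipeline:
        borg extract --stdout REPO::ARCHIVE STORED_NAME | podman volume import VOLUME -
    """
    extract = ["borg", "extract", "--stdout", f"{repository}::{archive}", stored_name]
    # "-" tells `podman volume import` to read the tarball from stdin.
    load = ["podman", "volume", "import", volume_name, "-"]
    return extract, load


extract, load = restore_commands("/path/to/repo", "myarchive", "pgdata.tar", "pgdata")
print(" ".join(extract), "|", " ".join(load))
```

The target volume has to exist before the import, so the "replace existing volume" variant would need an extra inspect/create step in front of this pipeline.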
I have started to read the docker docs and it looks like there is no way to do this via the socket. This is everything the current volume api supports over the socket.
I also haven't had a chance to fully look at this yet, but some initial thoughts:
Yes, we could spin up an ad hoc container, dump the contents to stdout, and back that up like a database dump, but that adds a lot of complexity that is really unneeded. We would basically be rolling our own `podman volume export` instead of just using what already exists. For Docker, using the socket would make sense, I agree, but I feel like not every feature has to work inside a container. I don't see any harm in shipping the feature in a way that supports both the podman CLI and the Docker socket, if someone is willing to implement all that on top of the Docker socket API.
Would you be opposed to having a cli and a socket implementation @witten ? I think we should look at something like containers/podman-py for the socket implementation and not roll our own interface since that should be way more stable.
Okay, I've had a chance to look at this in a little more detail. Thanks for your patience.
Here's the challenge I'm facing. We've got at least four distinct configurations:
My initial intent in suggesting that borgmatic talk to a socket is that, in theory, the same code could support all four configurations. If borgmatic is running on the host, it can talk to the Podman or Docker socket REST API, probably using the Docker compatibility API for Podman (although not necessarily). If borgmatic is running in a container, it can do the exact same thing, trusting that the socket has been mounted into it.
The problem with shelling out to a Podman/Docker binary is that it eliminates support for two of the configurations (running in a container). And the problem with using something like podman-py, while it does sound nice and convenient, is that it also eliminates support for two of the configurations (anything Docker). What do you think of using something like docker-py which can in theory address both?
Having said all that, I'm not necessarily averse to having separate Podman and Docker hooks if that really makes the most sense in terms of implementation. Or separate container CLI and socket hooks (that each support both Podman and Docker). I think I'd probably draw the line at four different hooks for the four different configurations. (And once you add in containers and volumes it's almost like eight configurations!)
Other thoughts: Reading optional labels on containers to decide whether to stop/start containers before/after backup makes sense to me, although I'd almost consider that something that can be layered on as an enhancement after the initial work.
Hope this helps!
No we really don't have 4 setups, we have two:
I think docker/podman should not make a lot of difference, at least the cli tools are compatible (at least for the part we care about: volume import/export). I am unsure how compatible the socket api is but even if they are not 100% compatible it would still be ok since we can just have the user configure if it's a docker or podman api socket we call.
I am interested in writing the hook for the CLI, but I would rather not commit to a hook that supports both CLI and socket, as Advent of Code will probably keep me busy next to uni and work, if that's fine by you. I would also start by writing a really primitive version and add more features later on (mainly label-based autostart/stop and label-based volume backups), to keep the code manageable and not drop more than 1k LOC in a single pull request.
OK, the plan has changed a bit. Docker does not support this via the CLI at all, so I think the CLI hook can reasonably be renamed to be podman-only (we should simply put a label on it saying: THIS IS NOT SUPPORTED INSIDE CONTAINERS, PLEASE USE THE SOCKET API IF YOU NEED TO RUN BORGMATIC IN A CONTAINER). This means we now only have to figure out how to do the socket calls in a Docker/podman API compatible fashion for the socket option. The podman volume export/import hook is done but still needs tests, docs, etc.
It looks like the compatible API should not be too hard, as the podman docs state "This documentation describes the Podman v2.x+ RESTful API. It consists of a Docker-compatible API (...)". So as long as we can restrict the scope of the API calls to those calls, we should be fine.
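As a rough illustration of what a call restricted to the Docker-compatible API might look like, the sketch below lists volumes via `GET /volumes` over a Unix socket using only the Python standard library. The class and function names are hypothetical, and the socket path must be supplied by the user (commonly `/var/run/docker.sock` for Docker, or `$XDG_RUNTIME_DIR/podman/podman.sock` for rootless podman). This only enumerates volumes; it does not export their contents.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def volume_names(body):
    """Extract volume names from a Docker-compatible `GET /volumes` response body."""
    # "Volumes" can be null when there are no volumes.
    return [volume["Name"] for volume in (json.loads(body).get("Volumes") or [])]


def list_volumes(socket_path):
    """Ask the daemon or podman service at socket_path for its volume names."""
    connection = UnixHTTPConnection(socket_path)
    try:
        connection.request("GET", "/volumes")
        response = connection.getresponse()
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return volume_names(response.read())
    finally:
        connection.close()
```

Calling `list_volumes(path)` requires a running Docker daemon or `podman system service` listening on that socket. `volume_names` can be exercised on its own with a captured response body.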
Yeah, Docker doesn't have built-in volume export at the CLI (or at all, AFAIK).
That's fine to start with IMO. The two (CLI and socket) may ultimately make sense as separate hooks.
Makes sense to me. Start small with the minimum shippable unit of feature and then iterate later on. But will this initial effort only support backing up Podman volumes? Or will it support specifying Podman containers too?
Yeah, that Docker-compatible API is what I was referring to. Hopefully it has the functionality needed to manually export volume contents. Likely though that would require the trick where you spin up a temporary container and run `tar` or something.