New hook: podman volume export #671
What I'm trying to do and why
I'm trying to back up podman volumes as a regular (non-root) user.
Borg can't access the directory where volumes are stored. But it's possible to export volumes with `podman volume export VOLNAME`.
There is a hook like `before_create` where I could export the volume to a place that borg can read. But borgmatic has a built-in feature where it can read data from a script's stdout without consuming additional disk space on the source host (as with borgmatic's database handling). It seems that generic hooks can't do that though, and specific "database" dumps have to be implemented in order to use stdin/stdout.
So the idea would be to add support for reading the stdout of `podman volume export VOLNAME` and backing it up.
Environment
borgmatic version: 1.5.12 (use `sudo borgmatic --version` or `sudo pip show borgmatic | grep ^Version`)
borgmatic installation method: Debian package
Borg version: 1.11.16 (use `sudo borg --version`)
Python version: 3.9.2 (use `python3 --version`)
operating system and version: Debian stable
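The streaming idea in this request can be sketched roughly as follows. This is only an illustration, not borgmatic's actual hook code: the function names and the `podman_volumes/` path prefix are made up for the example. It relies on `podman volume export` writing a tar stream to stdout and on `borg create` accepting `-` to read from stdin.

```python
import subprocess


def build_export_command(volume_name):
    # Without --output, `podman volume export` writes the tar stream to stdout.
    return ["podman", "volume", "export", volume_name]


def build_borg_command(repository, archive, volume_name):
    # The trailing "-" makes `borg create` archive whatever arrives on stdin.
    # --stdin-name sets the path stored in the archive; the "podman_volumes/"
    # prefix is a hypothetical naming choice for this sketch.
    return [
        "borg", "create",
        "--stdin-name", f"podman_volumes/{volume_name}.tar",
        f"{repository}::{archive}",
        "-",
    ]


def stream_volume_to_borg(repository, archive, volume_name):
    # Connect the export's stdout directly to borg's stdin so that no
    # temporary copy of the volume is written to disk.
    export = subprocess.Popen(build_export_command(volume_name), stdout=subprocess.PIPE)
    try:
        borg = subprocess.run(
            build_borg_command(repository, archive, volume_name),
            stdin=export.stdout,
        )
    finally:
        export.stdout.close()
    export_code = export.wait()
    return export_code == 0 and borg.returncode == 0
```

Calling `stream_volume_to_borg("/path/to/repo", "volumes-today", "VOLNAME")` would need both `podman` and `borg` on the `PATH` and returns `True` only if both processes exit cleanly.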
Thank you for filing this. This sounds like a valuable feature for Podman users. How are you thinking that borgmatic would find out the volumes to export though? A couple of different ideas off the top of my head:
A question though about the need for this: With Docker, volumes are typically mounted from paths on the host, so it makes sense to point borgmatic directly at those host paths rather than going through any Docker volume machinery. (But if borgmatic itself is running in a container, the host paths in question can get mounted into the borgmatic container as volumes and read from the container's filesystem.)
So does that approach not work with Podman? Is there a use case where it's easier to access volumes with `podman volume export` rather than just reading the paths off the filesystem on either the host or in a container? You mentioned the ability to stream data directly from `podman volume export` so as to avoid a temporary copy. But if the volume data already exists on a filesystem, then no temporary copy is needed in order to read it. I feel like I must be missing something important here.
That's how I was expecting it, giving the names of volumes. IMHO simple enough, less opaque/random than having to introspect the containers, and not too low-level.
I'm not very familiar with low-level details about docker and podman, but my naive understanding about docker is that the volume directories/files are owned by the docker daemon user, not the final user running docker.
About podman, for reasons that are beyond me, borg can't open the volume directory; even a simple `ls` fails with EACCES.
I'm running podman containers and `ls` under the same user (not root). The uid/gid of the directory are not mine; it might have to do with `/etc/subuid`, but this is black magic to me. At least `podman volume export` exports data as a tar file without error.
I've read that `podman unshare` could help to access the volume directory directly, but I'm not sure it can be done in the middle of a backup.
Of course, running borgmatic as root would probably solve the problem, but I think it's better not to run it as root.
Got it. Thanks for the explanation. So when you run your container to begin with, are you running it with `--volume /some/host/path:/container/path`? Or does it look more like `--volume some-volume:/container/path`?
One other question on this: What would you expect the restore story to look like for Podman volumes? `podman volume import`? Would you do that manually, e.g. `borgmatic extract` or `borgmatic mount` to retrieve a volume tarball to import it yourself, or would you want borgmatic to do all of it including the import? Maybe something like this database restore process?
I'm interested in adding this for podman. I think it's in the spirit of the podman project to not use the podman socket (mainly because that's a large part of why podman exists; it's daemonless). I also think that there is a lot of integration with the podman cli to be done, but that should work quite well since all of the needed cli commands support json output as far as I know. I am also interested in adding support for stopping/starting the containers (podman export is a dumb tar and not atomic), using labels on containers or labels on volumes.
TODO for backing up data:
- `podman volume inspect` and `podman inspect`
- `podman volume export` and pipe that into borg like we do with database dumps
TODO for restoring data:
- Either really simple:
  - `podman volume import`
- Or add option to replace existing volume:
  - `podman volume inspect` and `podman inspect`
  - `podman volume import`
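As a rough sketch of the inspection step in the TODO list above, the JSON printed by `podman volume inspect --all` could be filtered by label like this. The label key `borgmatic.backup` and the function name are assumptions for illustration only; nothing in this thread fixes a label name.

```python
import json


def select_volumes(inspect_output, label_key, label_value="true"):
    """Return the names of volumes whose labels contain label_key=label_value.

    inspect_output is the JSON text printed by `podman volume inspect --all`,
    a list of objects with (among others) "Name" and "Labels" fields.
    """
    volumes = json.loads(inspect_output)
    return [
        volume["Name"]
        for volume in volumes
        # "Labels" may be missing or null when a volume has no labels.
        if (volume.get("Labels") or {}).get(label_key) == label_value
    ]


# Hand-written sample data shaped like the inspect output (not real output).
sample = json.dumps([
    {"Name": "app-data", "Labels": {"borgmatic.backup": "true"}},
    {"Name": "cache", "Labels": {}},
    {"Name": "scratch", "Labels": None},
])

print(select_volumes(sample, "borgmatic.backup"))  # ['app-data']
```

Selecting by explicit volume name would skip this step entirely, which is one reason to start with names and layer labels on later.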
I have started to read the docker docs and it looks like there is no way to do this via the socket. This is everything the current volume api supports over the socket.
I also haven't had a chance to fully look at this yet, but some initial thoughts:
Yes, we could spin up an ad hoc container, dump the contents to stdout and back that up like a database dump, but that adds a lot of complexity that is really unneeded. We would basically roll our own podman volume export instead of just using what already exists. For docker, using the socket would make sense, I agree, but I feel like not every feature has to work inside a container. I don't see any harm in shipping the feature in a way that supports both the podman cli and the docker socket, if someone is willing to implement all that on top of the docker socket api.
Would you be opposed to having a cli and a socket implementation @witten ? I think we should look at something like containers/podman-py for the socket implementation and not roll our own interface since that should be way more stable.
Okay, I've had a chance to look at this in a little more detail. Thanks for your patience.
Here's the challenge I'm facing. We've got at least four distinct configurations:
My initial intent in suggesting that borgmatic talk to a socket is that, in theory, the same code could support all four configurations. If borgmatic is running on the host, it can talk to the Podman or Docker socket REST API, probably using the Docker compatibility API for Podman (although not necessarily). If borgmatic is running in a container, it can do the exact same thing, trusting that the socket has been mounted into it.
The problem with shelling out to a Podman/Docker binary is that it eliminates support for two of the configurations (running in a container). And the problem with using something like podman-py, while it does sound nice and convenient, is that it also eliminates support for two of the configurations (anything Docker). What do you think of using something like docker-py which can in theory address both?
Having said all that, I'm not necessarily averse to having separate Podman and Docker hooks if that really makes the most sense in terms of implementation. Or separate container CLI and socket hooks (that each support both Podman and Docker). I think I'd probably draw the line at four different hooks for the four different configurations. (And once you add in containers and volumes it's almost like eight configurations!)
Other thoughts: Reading optional labels on containers to decide whether to stop/start containers before/after backup makes sense to me, although I'd almost consider that something that can be layered on as an enhancement after the initial work.
Hope this helps!
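The stop/start idea discussed above could look something like the following sketch. How containers get selected (labels or otherwise) is left out, and the `run` parameter is a hypothetical hook added only so the example can be exercised without Podman installed.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def stopped_containers(container_names, run=subprocess.check_call):
    """Stop the given containers for the duration of the block, then restart them."""
    stopped = []
    try:
        for name in container_names:
            run(["podman", "stop", name])
            stopped.append(name)
        yield
    finally:
        # Restart only what was actually stopped, in reverse order,
        # even if the backup inside the block raised an exception.
        for name in reversed(stopped):
            run(["podman", "start", name])


# Dry run: record the commands instead of executing them.
calls = []
with stopped_containers(["web", "db"], run=calls.append):
    calls.append(["podman", "volume", "export", "app-data"])

for call in calls:
    print(" ".join(call))
```

Because the restart happens in a `finally` clause, a failed export would still leave the containers running again afterwards.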
No we really don't have 4 setups, we have two:
I think docker/podman should not make a lot of difference, at least the cli tools are compatible (at least for the part we care about: volume import/export). I am unsure how compatible the socket api is but even if they are not 100% compatible it would still be ok since we can just have the user configure if it's a docker or podman api socket we call.
I am interested in writing the hook for the cli, but I would rather not commit to a hook that supports both cli and socket, as Advent of Code will probably keep me busy next to uni and work, if that's fine by you. I would also start by writing a really primitive version and add more features later on (mainly label-based autostart/stop and label-based volume backups), to keep the code manageable and not drop more than 1k loc in a pull request.
Ok, the plan has changed a bit. Docker does not support this via the cli at all, so I think the cli hook can reasonably be renamed to be podman only (We should simply put a label on it saying: THIS IS NOT SUPPORTED INSIDE CONTAINERS, PLEASE USE THE SOCKET API IF YOU NEED TO RUN BORGMATIC IN A CONTAINER). This means we now only have to figure out how to do the socket calls in a docker/podman api compatible fashion for the socket option. The podman volume export/import hook is done but still needs some tests, docs, etc.
It looks like the compatible api should not be too hard, as the podman docs state "This documentation describes the Podman v2.x+ RESTful API. It consists of a Docker-compatible API (...)". So as long as we can restrict the scope of the api calls to those calls, we should be fine.
Yeah, Docker doesn't have built-in volume export at the CLI (or at all, AFAIK).
That's fine to start with IMO. The two (CLI and socket) may ultimately make sense as separate hooks.
Makes sense to me. Start small with the minimum shippable unit of feature and then iterate later on. But will this initial effort only support backing up Podman volumes? Or will it support specifying Podman containers too?
Yeah, that Docker-compatible API is what I was referring to. Hopefully it has the functionality needed to manually export volume contents. Likely though that would require the trick where you spin up a temporary container and run `tar` or something.
I'm starting to work on this right now. My current approach is to (for now) support volume names and volume labels. I will start out by doing volume names first, but labels should not be much different since we should be able to get all the relevant names from this command.
Cool!
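For reference, the temporary-container trick mentioned earlier in the thread might look like the sketch below. It is shown with the CLI only for brevity (the thread's interest is in doing the equivalent through the socket API), and the `busybox` image, the `/volume` mount point, and the function name are all assumptions chosen for the example.

```python
def build_tar_container_command(volume_name, engine="docker", image="busybox"):
    """Build a command that streams a volume's contents as tar on stdout.

    A throwaway container mounts the volume read-only and runs tar inside it.
    """
    return [
        engine, "run", "--rm",
        # Mount the named volume read-only at /volume inside the container.
        "--volume", f"{volume_name}:/volume:ro",
        image,
        # "-f -" writes the archive to stdout; "-C /volume ." archives the
        # volume contents with paths relative to the volume root.
        "tar", "-c", "-f", "-", "-C", "/volume", ".",
    ]


print(" ".join(build_tar_container_command("app-data")))
```

The resulting tar stream could then be fed to Borg's stdin in the same way as a `podman volume export` stream.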
@witten would you mind if the restore command just dumped the volume to stdout? Podman volume exports are just normal tar archives, and I think it would be very flexible to just run `borgmatic ... | podman volume import` rather than baking all of that into borgmatic directly. That would also allow exporting to a tar archive directly.
My personal preference is to bake volume restoration into the existing `restore` action, as that way seems more symmetric/complete than leaving it entirely up to the user. Having said that, if you only feel like tackling the `podman volume export` side in a PR, I'd happily accept that piece and then I could do the import side.
As for extracting archived volume tarballs to stdout or anywhere else: the existing `export-tar` action should handle that use case, possibly without any changes! Example of extracting to stdout:
(Although this might make a tarball of a tarball depending on how the volume export is stored.)
That's not what I meant. Podman volume export generates the entire volume contents as a tar archive.
Yeah, so if you store Podman's tarball as-is inside the Borg archive, then `export-tar` would give you a double-tarred dump, which wouldn't be too useful. On the other hand, if you auto-untarred Podman's exported tarball and stored it as individual files in the Borg archive, `export-tar` would give you a standard tar archive you could in theory import right back into Podman.
Basically I would like the restore command to just act exactly like `podman volume export {volume_name}`.
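The double-tarring concern raised above can be illustrated with Python's standard `tarfile` module, without involving Borg or Podman at all. The file names are invented; the inner archive stands in for a volume export and the outer one for what a tar export of the Borg archive would produce if the volume tarball were stored as a single file.

```python
import io
import tarfile


def make_tar(members):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(tar_bytes):
    """Return the member names of an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return archive.getnames()


# Stand-in for the output of a volume export: the volume's files.
volume_export = make_tar({"config.yaml": b"key: value\n", "data.db": b"\x00\x01"})
# Stand-in for a tar export of an archive that stored the tarball as one file.
double_tarred = make_tar({"podman_volumes/app-data.tar": volume_export})

print(list_members(volume_export))  # ['config.yaml', 'data.db']
print(list_members(double_tarred))  # ['podman_volumes/app-data.tar']
```

The outer archive contains only the inner tarball, so it could not be handed to `podman volume import` directly without unwrapping one layer first.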
Another option: `borg extract` has a `--stdout` flag (currently unsupported by borgmatic, but that could be added pretty easily). So you could in theory store the Podman export tarball as-is inside the Borg archive, and then:
Then you wouldn't end up with double-tarred output and could import it directly to Podman.
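A minimal sketch of that restore path, driving `borg extract --stdout` and `podman volume import` directly rather than through borgmatic, might look as follows. The function names and the stored path `podman_volumes/app-data.tar` are assumptions for the example; the trailing `-` tells `podman volume import` to read the tarball from stdin.

```python
import subprocess


def build_extract_command(repository, archive, stored_path):
    # --stdout writes the contents of the extracted file to stdout
    # instead of creating it on disk.
    return ["borg", "extract", "--stdout", f"{repository}::{archive}", stored_path]


def build_import_command(volume_name):
    # "-" as the source makes `podman volume import` read the tar from stdin.
    return ["podman", "volume", "import", volume_name, "-"]


def restore_volume(repository, archive, stored_path, volume_name):
    # Pipe the stored tarball straight from Borg into Podman.
    extract = subprocess.Popen(
        build_extract_command(repository, archive, stored_path),
        stdout=subprocess.PIPE,
    )
    try:
        imported = subprocess.run(build_import_command(volume_name), stdin=extract.stdout)
    finally:
        extract.stdout.close()
    extract_code = extract.wait()
    return extract_code == 0 and imported.returncode == 0
```

For example, `restore_volume("/path/to/repo", "volumes-today", "podman_volumes/app-data.tar", "app-data")` would stream a single stored tarball back into the `app-data` volume.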
Yeah, the plan is to store it as a tarball, since we can just dump that to a named pipe and save storage that way.
You could make a variant of the `restore` action that "restores" to stdout instead of directly to Podman. E.g. with a `--stdout` flag or similar. But under the hood it would just be a `borg extract --stdout` on the relevant archive and path.
Yeah, but `podman volume import` is kinda odd. You import a volume and the contents get merged with the current content of the volume. So either restore deletes the podman volume, creates it again and imports afterwards, or restore takes on the odd behavior of `podman volume import`.
Interesting... I could see an argument for supporting both use cases ultimately. But it wouldn't necessarily have to for an initial version.
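The two restore behaviors discussed above (merge into the existing volume versus replace it) could be expressed as a small command plan, sketched below. The `replace_existing` option and the function name are hypothetical and not part of borgmatic.

```python
def build_restore_steps(volume_name, replace_existing=False):
    """Return the podman commands for restoring a volume, in order.

    With replace_existing=False the tarball is merged into whatever the
    volume already contains, which is what `podman volume import` does.
    """
    steps = []
    if replace_existing:
        # Removing fails while a container still uses the volume, and a
        # plain `volume create` does not carry over labels or options,
        # so a real implementation would have to handle both.
        steps.append(["podman", "volume", "rm", volume_name])
        steps.append(["podman", "volume", "create", volume_name])
    # The final step reads the tarball from stdin.
    steps.append(["podman", "volume", "import", volume_name, "-"])
    return steps


for step in build_restore_steps("app-data", replace_existing=True):
    print(" ".join(step))
```

Keeping the merge behavior as the default mirrors `podman volume import` itself, with replacement as an explicit opt-in.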