Feature request: automatic configuration based on docker labels or environment #685

Open
opened 2023-04-21 12:18:18 +00:00 by acul009 · 5 comments

Disclaimer: I'm not sure whether this is possible, but I'm happy to help with a PR once I've gotten some feedback on whether this is even wanted.

#### Where I got the idea

Hi, I'm currently searching for a backup solution for my remote Docker deployments. borgmatic looks like a good fit, but I think it would greatly benefit from borrowing a few good ideas from other tools.

For that reason, I'd like to point to two other pieces of software:

  1. camptocamp/bivac is a tool based on restic. It can read docker volumes and detect databases using a config file. If a database is found, bivac can dump it and back up that dump. Bivac can also automatically restore a dump and replay it into the database.

  2. traefik is a reverse proxy which can read its configuration from docker labels.
    This way you can add your proxy configuration directly to your docker compose file.

#### The Goal

My goal would be to back up all my compose projects by just adding a bit more information to the compose file. This way the backup configuration and the deployment would always be one and the same and could be found in the same place.

#### How this could look

##### All in One

This approach would add the functionality directly to borgmatic.
The software would scan all docker containers for labels. When a label requesting a backup is detected, it would add that container to its configuration.

In the case of a database, borgmatic could read the IP of the container and use that information to connect.
Additional information could either be extracted from the labels or from the container environment.
Detecting the database could be done either by inspecting the image or by running a command inside the container, like bivac does.
The dump itself could also be made using a docker exec command, similar to bivac.
Files could be read by spinning up an SFTP container with the volumes mounted.
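
A rough sketch of that flow, assuming a hypothetical `borgmatic.backup=true` label and using the presence of `pg_dumpall` as a crude database-detection heuristic (neither is an existing borgmatic convention):

```bash
# Find containers carrying the (hypothetical) backup label and dump any
# PostgreSQL databases found inside them via docker exec.
mkdir -p /tmp/borgmatic-dumps

for container in $(docker ps --quiet --filter "label=borgmatic.backup=true"); do
    # Crude detection: does the container ship pg_dumpall?
    if docker exec "$container" sh -c 'command -v pg_dumpall' > /dev/null 2>&1; then
        # POSTGRES_USER is expanded inside the container's environment.
        docker exec "$container" sh -c 'pg_dumpall --clean -U "$POSTGRES_USER"' \
            > "/tmp/borgmatic-dumps/$container.sql"
    fi
done
```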

##### The orchestrator

As an alternative, I could see creating an additional program which would scan the docker labels and generate a standard borgmatic config from them. That configuration would then be fed into a borgmatic container with access to the volumes you want to back up for the application and to the required database.
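
As a minimal sketch of such a generator, assuming the hypothetical `borgmatic.backup=true` label again and borgmatic's `location`/`source_directories` options (the generated paths are only meaningful if the borgmatic container mounts the same volumes at the same locations):

```bash
# Build a borgmatic source_directories list from the mount points of every
# container carrying the (hypothetical) backup label.
{
    echo 'location:'
    echo '    source_directories:'
    docker ps --quiet --filter "label=borgmatic.backup=true" \
        | xargs --no-run-if-empty docker inspect \
            --format '{{range .Mounts}}        - {{.Destination}}{{"\n"}}{{end}}'
} > /etc/borgmatic.d/generated.yaml
```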

##### The DIY solution

This might already be possible, but you could allow specifying the configuration entirely via environment variables.
The Go framework Viper allows this for applications by using a delimiter.

The config part

```yaml
location:
    repositories:
        - path: common.borg
```

could then look like this:

```
BORGMATIC_LOCATION__REPOSITORIES__0__PATH=common.borg
```

This would probably be the simplest solution and would make deployments inside containers a lot easier and faster.
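
One way to approximate this today might be a hypothetical container entrypoint that maps variables of the proposed form onto borgmatic's existing `--override section.option=value` flags; this is only a sketch, and the `__0__` list-index case from the example above is not handled:

```bash
# Hypothetical entrypoint: translate BORGMATIC_SECTION__OPTION=value
# variables into borgmatic --override flags.
overrides=()
while IFS='=' read -r name value; do
    # Drop the prefix, lowercase, and turn the double-underscore delimiter
    # into the dotted key syntax that --override expects.
    key=$(printf '%s' "${name#BORGMATIC_}" | tr '[:upper:]' '[:lower:]' | sed 's/__/./g')
    overrides+=(--override "${key}=${value}")
done < <(env | grep '^BORGMATIC_')

exec borgmatic create --verbosity 1 "${overrides[@]}"
```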

Owner

Thanks for taking the time to file this and explain the background. I'm familiar with Traefik and its use of labels, and I'm a Compose user myself, but I haven't seen camptocamp/bivac before.

I think your general goal makes sense to me, and I could see borgmatic being expanded to have functionality like this, although it wouldn't be without its challenges.

> ##### All in One
> This approach would add the functionality directly to borgmatic. The software would scan all docker containers for labels. When a label requesting a backup is detected, it would add that container to its configuration.

Okay, so it would need access to the Docker socket.

> In the case of a database, borgmatic could read the IP of the container and use that information to connect.
>
> Additional information could either be extracted from the labels or from the container environment.
>
> Detecting the database could be done either by inspecting the image or by running a command inside the container, like bivac does.
>
> The dump itself could also be made using a docker exec command, similar to bivac.

Interesting. So bivac docker execs into the container and runs `pg_dump` or equivalent directly there? How does it get the credentials to connect? Directly introspecting `pg_hba.conf` or whatever? Or does it rely on connecting as the `postgres` superuser? I'm just trying to imagine the mechanics.

> Files could be read by spinning up an SFTP container with the volumes mounted.

Ah, so there's no way with Docker to export or read the contents of a container's volumes otherwise? Like Podman has a `volume export` command. Related: #671.

> ##### The orchestrator
> As an alternative, I could see creating an additional program which would scan the docker labels and generate a standard borgmatic config from them. That configuration would then be fed into a borgmatic container with access to the volumes you want to back up for the application and to the required database.

This might work, although I could see it being tough to marry the more "dynamic" generated configuration with configuration that you never want to change. Maybe includes could smooth that over. I'm not sure how I feel about auto-generated configuration though. It might be just as easy to not persist it and act on it directly—basically your previous approach.

> ##### The DIY solution
> This might already be possible, but you could allow specifying the configuration entirely via environment variables. The Go framework Viper allows this for applications by using a delimiter.

Yeah, it's not entirely possible now, although you can do a certain amount with [environment variables](https://torsion.org/borgmatic/docs/how-to/provide-your-passwords/) and [command-line overrides](https://torsion.org/borgmatic/docs/how-to/make-per-application-backups/#configuration-overrides). Doing an entire configuration like that seems pretty cumbersome to me though!
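
For illustration, a single option can already be overridden on the command line today; the key and value here are only an example of the mechanism:

```bash
# Overriding one option per the existing --override flag; whether an entire
# configuration could reasonably be expressed this way is another matter.
borgmatic create --override location.repositories="[common.borg]"
```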

I will say that with any Docker-based approach, there are implications for downstream projects like [borgmatic Docker images](https://github.com/borgmatic-collective/docker-borgmatic) which may need to include Docker CLI binaries (and document mounting of the Docker socket) if they're to support this borgmatic functionality. Or even Podman CLI binaries... CC: @b3vis

In terms of a place to start, it might make sense to initially just bite off Docker volumes support and then worry about databases after.

Owner

> > Files could be read by spinning up an SFTP container with the volumes mounted.
>
> Ah, so there's no way with Docker to export or read the contents of a container's volumes otherwise? Like Podman has a `volume export` command. Related: #671.

This looks like the "official" way to do it: https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes

So maybe what you're getting at is dynamically running something like:

```bash
docker run --rm --volumes-from container-to-backup:ro some-sftp-image
```

EDIT: I might even suggest ditching sftp and doing something like this instead:

```bash
docker run -it --rm --volumes-from container-to-backup:ro alpine tar cv /mounted/volume/path
```

(Looks like volumes can kinda-sorta be detected within the container by looking for mounts matching `^/dev/` that are read-only.)

Then that'll stream the volume contents tarball to stdout, which borgmatic can redirect to a named pipe that Borg can consume from the filesystem at its leisure. This is very similar to the approach borgmatic uses with database dumps.

It also makes me wonder what the restore story is like with an approach like this. Not just the underlying mechanics (could just be the reverse of above), but the UX for restoring a container or a file in a container from backups....
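
A rough sketch of that named pipe wiring, reusing the tar command above; the paths, container name, and repository are all illustrative, and Borg needs `--read-special` to consume a FIFO as a regular file:

```bash
# Stream a container's volume contents into a named pipe, then have Borg
# read the pipe with --read-special, much like borgmatic's database dumps.
mkdir -p ~/.borgmatic/container_volumes
mkfifo ~/.borgmatic/container_volumes/container-to-backup

docker run --rm --volumes-from container-to-backup:ro alpine \
    tar c /mounted/volume/path \
    > ~/.borgmatic/container_volumes/container-to-backup &

borg create --read-special \
    /path/to/repo::container-volumes-{now} \
    ~/.borgmatic/container_volumes/container-to-backup
```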

Author

Your tar container is a much better and easier solution than what I had in mind.

As for the database credentials, these are usually included within the container, most of the time in the environment. A dump command could look like this:

```
pg_dumpall --clean -U $POSTGRES_USER > $volume/backups/all.sql
```

I've just noticed that I seem to have misunderstood Borg and borgmatic a bit. My current storage provider only exposes S3, which Borg doesn't seem to support.

Am I correct, or is there a stable way to get my backups onto my S3 storage?

Owner

> As for the database credentials, these are usually included within the container, most of the time in the environment.

Interesting!

> Am I correct, or is there a stable way to get my backups onto my S3 storage?

Borg [does not support S3 storage directly](https://github.com/borgbackup/borg/issues/102). You can, however, back up with Borg to either a local repository or a remote repository via SSH, and then [rclone](https://rclone.org/) that repository to the S3-compatible storage provider of your choice.
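
For example, something along these lines (the rclone remote and bucket names are placeholders):

```bash
# Back up locally with borgmatic, then mirror the Borg repository to
# S3-compatible storage with rclone.
borgmatic create --verbosity 1
rclone sync /path/to/borg-repository remote:my-bucket/borg-repository
```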

witten added this to the container backups milestone 2023-05-23 15:41:19 +00:00
Owner

Some design/implementation ideas:

### hook API

The existing database dump/restore internal hook API has been generalized to support dumping and restoring "data sources" so that they can conceptually support this ticket (among others).

### configuration

Consider a configuration approach something like this:

```yaml
container_volumes:
    - name: my-volume
      socket: /var/run/docker.sock  # optional
```

And this:

```yaml
container_volumes:
    - name: auto
```

This `auto` value would tell borgmatic to introspect running Podman/Docker containers, gathering those with a particular label (`org.torsion.borgmatic.enable=true` or whatever) and backing up any volumes mounted on those containers.

There's an open question about whether a container-based configuration approach would make more sense, especially since `--volumes-from` in Docker is container-oriented rather than volume-oriented. What this might look like:

```yaml
containers:
    - name: my-container
```

That would back up all the volumes from the configured containers. And `auto` would apply here as well, looking for all containers with the expected label.
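
For a rough idea of what the label-based discovery could look like against the Docker socket (see the socket discussion below), using `curl` and `jq` and the example label above; the `filters` parameter is the URL-encoded JSON `{"label":["org.torsion.borgmatic.enable=true"]}`:

```bash
# List named volumes mounted on containers carrying the example label,
# talking to the Docker API over the socket rather than via the docker CLI.
curl --silent --unix-socket /var/run/docker.sock \
    "http://localhost/v1.41/containers/json?filters=%7B%22label%22%3A%5B%22org.torsion.borgmatic.enable%3Dtrue%22%5D%7D" \
    | jq -r '.[].Mounts[] | select(.Type == "volume") | .Name'
```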

### backup

Like with database backups, borgmatic could automatically dump configured container volumes upon the `create` action and include those volumes in the backup, perhaps using the named pipe streaming trick. For Podman, `podman volume export` makes this pretty easy. For Docker, it's a little trickier. See #685 for some ideas.

When `auto` is set (see configuration above), borgmatic would have to list containers with the right label and collect their volumes.

If borgmatic is using the named pipe trick, then borgmatic picks the path at which volume dumps get stored, presumably somewhere in `~/.borgmatic/container_volumes/` or whatever.
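
On Podman, the per-volume dump side of that could be as simple as the following sketch (volume and pipe names illustrative), with Borg then consuming the pipe exactly as in the earlier tar example:

```bash
# Export a named volume's contents as a tarball into a named pipe.
mkdir -p ~/.borgmatic/container_volumes
mkfifo ~/.borgmatic/container_volumes/my-volume
podman volume export my-volume \
    > ~/.borgmatic/container_volumes/my-volume &
```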

### restore

For restore, perhaps the existing database-specific `borgmatic restore` could be generalized to support container volumes, too.

```shell
$ borgmatic restore --volume foo
```

Under the hood, this would `podman volume import` for Podman. Note that the volume has to already exist for this to work. For Docker, borgmatic could do the `--volumes-from` trick to get all the mounted volumes from an existing container and then extract into them. (That doesn't work though if a volume _isn't_ mounted into a container and/or borgmatic doesn't know the container it's mounted into.)
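
Rough restore sketches for both runtimes, assuming an already-extracted tarball (volume, container, and path names are illustrative):

```bash
# Podman: import a previously exported tarball into an existing volume.
podman volume import my-volume /path/to/restored/my-volume.tar

# Docker: extract into whatever volumes an existing container mounts,
# reusing the --volumes-from trick (no :ro this time).
docker run --rm --interactive --volumes-from my-container alpine \
    tar x -C / < /path/to/restored/my-container.tar
```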

### socket vs. running commands

I propose that all of the above use the Docker/Podman socket rather than running `docker`/`podman` commands directly. That's primarily because borgmatic often runs in a container (and some of the #685 ideas require spawning containers), and trying to get Docker-in-Docker or Podman-in-Podman working reliably is not something I want to inflict on borgmatic users. Instead, users can "simply" mount the Docker/Podman socket into the borgmatic container (or run borgmatic on the host with appropriate permissions), and then borgmatic can talk to the socket API directly. This is also one less thing for [container maintainers](https://github.com/borgmatic-collective/docker-borgmatic) to install and get working.
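
From the user's side, that setup might look roughly like this; the image name, config mount, and entrypoint are illustrative rather than the documented invocation of the downstream image:

```bash
# Mount the Docker socket into a borgmatic container so it can talk to the
# Docker API directly, without a Docker CLI installed inside the image.
docker run --rm \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    --volume /etc/borgmatic.d:/etc/borgmatic.d:ro \
    ghcr.io/borgmatic-collective/borgmatic \
    borgmatic create --verbosity 1
```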

witten added the new feature area label 2023-06-28 18:37:14 +00:00
witten self-assigned this 2023-08-27 17:38:35 +00:00