borg backup in pull mode #346

Closed
opened 2020-07-31 02:34:41 +00:00 by networkjanitor · 6 comments
Contributor

What I'm trying to do and why

Based on #99, I was trying to back up a remote server by mounting it locally via sshfs. #99 appears to contain old steps from the docs, which do not account for varying UIDs/GIDs between the remote and the backup server.

The current pull-backup docs recommend the following steps (pre-backup):

```
# Mount client root file system.
mkdir /tmp/sshfs
sshfs root@host:/ /tmp/sshfs
# Mount BorgBackup repository inside it.
mkdir /tmp/sshfs/borgrepo
mount --bind /path/to/repo /tmp/sshfs/borgrepo
# Make borg executable available.
cp /usr/local/bin/borg /tmp/sshfs/usr/local/bin/borg
# Mount important system directories and enter chroot.
cd /tmp/sshfs
for i in dev proc sys; do mount --bind /$i $i; done
chroot /tmp/sshfs
```

https://borgbackup.readthedocs.io/en/stable/deployment/pull-backup.html#creating-a-backup
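After the backup, those steps have to be undone again. A rough sketch of the cleanup, assuming the same /tmp/sshfs mount point as above and that the chroot shell has already been exited:

```
# Tear down in reverse order, from outside the chroot.
cd /tmp/sshfs
for i in dev proc sys; do umount $i; done   # undo the bind mounts
umount /tmp/sshfs/borgrepo                  # release the repository bind mount
rm /tmp/sshfs/usr/local/bin/borg            # remove the copied borg binary
cd /
fusermount -u /tmp/sshfs                    # finally unmount the sshfs
```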

This, however, does not currently seem to be configurable in borgmatic, since each command in a hook is executed in a separate shell, so it's not possible to chroot into the remote server's sshfs mount before borg is run (it just hangs at chroot, spawning a shell).

Other notes / implementation ideas

A quick glance at the Python docs and some experimentation seem to indicate that there is no way to chroot inside a subprocess and keep the chroot active within that subprocess shell for later commands.

Calling os.chroot() and then spawning the subprocess has the desired effect, but would require at least one more configuration variable (the chroot path) in the borgmatic config. It would probably also qualify as a "can of worms", since there is no configuration option for when the chroot should be activated (before or after hooks?).

The easiest way would probably be to skip the chroot and enable numeric_owner, so that the remote /etc/group and /etc/passwd would not have to be available.
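A rough sketch of that chroot-less variant, using borg directly for illustration (paths and archive name are made up; --numeric-owner is the Borg 1.1 flag that stores numeric UIDs/GIDs instead of looking up names):

```
# Mount the client root via sshfs and back it up without chrooting.
mkdir -p /tmp/sshfs
sshfs root@host:/ /tmp/sshfs
borg create --numeric-owner /path/to/repo::backup-{now} /tmp/sshfs
fusermount -u /tmp/sshfs
```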

Environment

borgmatic version: 1.5.4

borgmatic installation method: Arch Package

Borg version: borg 1.1.11

Python version: Python 3.8.3

Owner

I agree that figuring out how to make a chroot active sounds like a can of worms. :) But your comment here is interesting:

> The easiest way would probably be to skip the chroot and enable numeric_owner, so that the remote /etc/group and /etc/passwd would not have to be available.

borgmatic does support a numeric_owner option (in the location section). So would that work for your use case? Or is there still a gap?

Also, could you say a little more about your pull-based backup use case? Is it not convenient to run Borg directly on the server you need backed up? Or does the server to backup not have direct access to the destination backup server?

Thanks!

Author
Contributor

Enabling numeric_owner mostly solves the issue, if one can ensure that the remote server is always provisioned with the same uids/gids and that the uids/gids don't change between backups (both only relevant for partial restores; in a full restore, /etc/passwd and /etc/group are overwritten anyway). This is not a problem for me.

Another gap between chroot and numeric_owner is that the borg repository includes the leading path from where the sshfs is mounted (e.g. /tmp/sshfs). This shouldn't be much of a problem, since the path should always be the same on the backup server. If the path changes, it will become a problem, which might be fixed by additionally bind-mounting the sshfs to the old location as well. And in the unlikely case of restoring directly to the server (without sshfs/borgmatic), borg extract has --strip-components NUMBER, which removes the specified number of leading path elements.
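As an illustration (hypothetical repo and archive names): Borg stores paths without the leading slash, so an archive made from /tmp/sshfs contains paths like tmp/sshfs/etc/passwd, and stripping two components drops that prefix on extract:

```
# Extracts tmp/sshfs/etc/passwd as etc/passwd under the current directory.
borg extract --strip-components 2 /path/to/repo::archive-name
```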

Unrelated to chroot: not having before_restore and after_restore hooks in borgmatic is a problem, since there is then no hook that could mount and unmount the sshfs before and after extracting.

Regarding the pull-based backup use case: The servers to back up do not have access to the destination backup server. The servers to back up are VPSes, and the destination backup server runs on my home network (behind NAT, a firewall, and a daily-changing IP address).

Creating the backups directly on the servers and then rsync-ing them away is not possible, because there is not enough storage available on the servers. Backing up to a cloud storage provider is not economically feasible given the amount of data.

Author
Contributor

That being said, a reverse SSH tunnel might be a better idea, as it would also allow for database backups. I'll experiment with that in the upcoming days; the extract hooks should still be useful, though.

Author
Contributor

Using reverse SSH tunnels is definitely easier for my use case.

On the backup server, create a script which opens an SSH tunnel to the server to back up and calls borgmatic:

```
$ cat borgmatic-remote-servertobackup.sh
ssh -R 22222:localhost:22 user@servertobackup -p 22 borgmatic -c borgmatic.yml $*
```

On the server to back up, install borgmatic and borg and deploy the borgmatic config. If you have only one repo in the config file, you can get away with putting the port and key spec into the borgmatic config:

```
$ cat test.yml
location:
    source_directories:
        - /home/
    repositories:
        - borguser@localhost:borg-repo-test
storage:
    ssh_command: ssh -i ~/.ssh/id_ed25519 -p 22222
[...]
```

If not, then you need to put the different ports and keys into ~/.ssh/config (and probably also add soft-fails to check which tunnels to which repos exist; the docs definitely have something about that already).
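For the multi-repo case, a hypothetical per-tunnel entry in ~/.ssh/config could look like this (host alias and key path made up for illustration):

```
Host borg-tunnel-test
    HostName localhost
    Port 22222
    User borguser
    IdentityFile ~/.ssh/id_ed25519
```

The repositories list could then reference borg-tunnel-test:borg-repo-test without needing a custom ssh_command.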

On the backup server you can then call the created script (borgmatic-remote-servertobackup.sh) via a systemd timer or cron and let it create backups for you. All of the scheduling needs to happen on the backup server, since it is the one that initiates the connection.

Since the script just relays all of its parameters to the remote borgmatic, you can quite easily create the usual timers for create + prune and a separate check with the same script. This also helps if you want to work with the repo manually (info/extract).
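For example, a pair of hypothetical cron entries on the backup server reusing the same wrapper script (times, script path, and action choices are made up; adjust to taste):

```
# Nightly create + prune, weekly check, all driven from the backup server.
30 3 * * *   /root/borgmatic-remote-servertobackup.sh create prune
0  5 * * 0   /root/borgmatic-remote-servertobackup.sh check
```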

It also helps to create a dedicated user on the server to back up who can execute borgmatic via passwordless sudo (if the sources require that).
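A hypothetical sudoers rule for such a dedicated user (user name and borgmatic path assumed; adjust for your system):

```
# /etc/sudoers.d/borgmatic: let the backup user run borgmatic as root without a password
backup ALL=(root) NOPASSWD: /usr/bin/borgmatic
```

The wrapper script on the backup server would then invoke sudo borgmatic instead of plain borgmatic.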

Are there any cases in which reverse SSH tunnels would not be the preferred way of doing "pull-based" backups? Possibly only for servers/devices on which you cannot run borg and/or borgmatic but which you can mount via sshfs.

Owner

This looks like a great way to do pull-based backups without having to mess around with chroot! Thank you for writing it up.

A couple of thoughts:

> If you have only one repo in the config file, then you can get away with putting the port and key spec into the borgmatic config:
> ...
> If not, then you need to put the different ports and keys into ~/.ssh/config (and probably also add soft-fails to check which tunnels to which repos exist; the docs definitely have something about that already)

Another option would be to use separate borgmatic configuration files. But using ~/.ssh/config would probably be easier, and also more in keeping with how SSH is meant to be used.

> Are there any cases in which reverse SSH tunnels would not be the preferred way of doing "pull-based" backups? Possibly only for servers/devices on which you cannot run borg and/or borgmatic but which you can mount via sshfs.

Yeah, it seems like an SSH tunnel would work pretty well for a variety of use cases. Good find!

Owner

Since it sounds like you have a solution, I'll close this ticket for now. Please feel free to add any other improvements though in comments!

witten changed title from borg backup in pull mode with chroot to borg backup in pull mode 2020-08-13 16:29:05 +00:00
Reference: borgmatic-collective/borgmatic#346