Does BorgMatic support a write-through setup of 3 hosts with client and storage being remote? #584
Reference: borgmatic-collective/borgmatic#584
What I'm trying to do and why
I'm currently backing up multiple hosts and VMs remotely using a PULL approach and SSHFS, pretty much as officially documented by BorgBackup. The good thing about this setup is that I don't need any Borg* deployment on the remote hosts and have one server coordinating all the backup times, with all configs in one place; that backup server is also the only one with access to the external storage. The bad thing is that SSHFS isn't maintained anymore and will most likely vanish from distributions at some point. Additionally, there are some restrictions regarding inodes, ACLs, extended attributes and the like.
So in my opinion the next best thing would be to have Borg itself deployed on the remote hosts, while still having one backup server taking care of everything else. Something like this is described in the official docs as well, using SOCAT. Maybe I'm missing something, but in my opinion the role SOCAT plays in that example could be better filled by BorgMatic:
BorgMatic knows what to back up on the client, knows about the backup storage and supports hooks for preparing the client to be backed up. I already use those to create file-system-level snapshots on the client and mount them using SSHFS. This could easily be kept, only that mounting SSHFS is abandoned in favour of BorgMatic executing a Borg client instance on the remote host to be backed up. From my understanding it already executes that Borg client instance locally; one "only" needs some SSH access to the remote machine instead.
Additionally, the Borg client MUST NOT connect to the storage on its own, and especially must not run borg serve or the like. The important thing in the end is that the client shouldn't have access to that storage. Instead, it should communicate through SSH with BorgMatic, which executes borg serve itself and forwards all messages between the Borg client and the Borg server instance. This way it simply replaces what SOCAT does, but it's BorgMatic, so it handles all of the hooks, error handling, pruning etc. like it already did before. :-)
How do Borg client and server instances communicate?
This is one important thing I'm not sure I've understood correctly yet: Looking at the SOCAT example and the docs about borg serve, I have the feeling that Borg client instances simply write to STDOUT to send messages to instances of borg serve?
Who starts borg serve at all?
Another thing I'm not sure of is what BorgMatic executes on its own and what is executed by Borg client instances. Looking at the log, I find the following command lines which I think are executed by BorgMatic itself:
What I'm not sure about is the following statement in the logs:
Who executes that, BorgMatic or the started Borg client? As remote-path is available in the previously mentioned command lines, I guess the Borg client starts borg serve on its own. My YAML config contains the following line, so I'm somewhat sure that BorgMatic at least provides some additional setting to the Borg client instance.
How does borg serve know about its end?
At some point the Borg client has finished. How does borg serve recognize that? If BorgMatic maintains that process, it needs to be able to properly stop it as well. Easiest would be if the Borg client simply told the process to shut down on its own.
Other notes / implementation ideas
It looks to me like there are two problems: The first is starting the Borg client remotely; currently BorgMatic doesn't seem to know anything about remote clients. OTOH, it has support for local_path, which might be abused to point at some shell script doing SSH and argument handling to actually start a remote Borg instance.
The second problem is handling borg serve. If that is started by the Borg client, this needs to be replaced somehow so that BorgMatic starts it and only handles communication between the client and server instance.
Any thoughts? Do you think something like that is useful, of interest, somewhat easy to implement? Maybe I'm wrong with some assumptions?
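To make the local_path idea above a bit more concrete, here is a minimal sketch of such a wrapper, written in Python rather than shell. It assumes that local_path may point at any executable and that the wrapper receives the usual Borg arguments. The host name and the function name are hypothetical, and the sketch only prints the command instead of running it. It addresses the first problem only: the remote Borg would still start borg serve by itself.

```python
import shlex
import sys

# Hypothetical SSH target for the machine to be backed up.
REMOTE_HOST = "backup@vm1.example.org"


def build_remote_borg_argv(host, borg_args):
    """Return an ssh argv that runs `borg <borg_args>` on `host`.

    ssh hands the remote command to a shell as a single string, so the
    arguments are quoted with shlex to survive that extra parsing step.
    """
    remote_command = shlex.join(["borg", *borg_args])
    return ["ssh", host, remote_command]


if __name__ == "__main__":
    # Print instead of executing, so the sketch has no side effects.
    print(shlex.join(build_remote_borg_argv(REMOTE_HOST, sys.argv[1:])))
```

A real wrapper would replace the print with something like os.execvp("ssh", argv) so that exit codes and STDIO pass through unchanged.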
Thanks!
Environment
borgmatic version: 1.6.1
borgmatic installation method: PIP, system wide
Borg version: 1.2.1
Python version: 3.8.10
operating system and version: Ubuntu 20.04
First of all, thanks for putting so much thought and detail into this ticket!
So I'm just restating your proposal to make sure I understand:
There is a central backup server with Borg repositories and a bunch of separate machines to be backed up. (I'll try to avoid the term "client", since "client" and "server" are kind of reversed here.)
borgmatic runs borg serve on the central backup server and (remotely via SSH) runs borg create on a machine to be backed up, giving a repository target of a socat socket so that the remote create sub-command communicates with the central backup server's borg serve process, thereby writing to the repository on the central server.
Let me know if my understanding is incorrect here.
I don't believe this is the case, assuming you're talking about, for instance, the borg create instance. Anything that goes to stdout is logging information. My understanding is that borg create uses the SSH (or BORG_RSH) command to communicate with any remote server.
That's correct. borgmatic has nothing to do with that currently.
This just causes borgmatic to pass this value as the BORG_RSH environment variable, and Borg does the rest.
Normally, borg create starts and stops the borg serve process. And yeah, with the proposed setup, borgmatic would have to start and stop it.
Local path is just a path to a binary; I don't believe you can stuff a whole shell script in there. The way to do it today would probably be by using the before_actions hook to run whatever setup shell commands you'd like to run. But then there's still the "problem" of borgmatic running borg create locally...
Yeah, I'm not sure that's possible. At least, not supported by Borg. It's possible it can be "faked out" via a clever use of --remote-path.
I think what you described is quite clever and might actually work if implemented, but it sounds potentially brittle and special-case to my eye. I generally try to use Borg "as designed" as much as possible (e.g. not running borg serve myself), although I realize it may not always support all use cases out of the box. In terms of interest, I can think of at least one ticket on the subject of "pull mode": #346. In that ticket, they ended up using a different approach:
[...] borg serve on the central backup server and resulting in the repository getting written to the central backup server.
Downsides versus your proposal: 1. This requires not just Borg but also borgmatic to be installed and configured on each server, 2. Technically, the remote borg create would be running and communicating with borg serve on the central backup server, although you can optionally lock down SSH so that's all it can possibly run, and 3. Running the reverse SSH tunnels and kicking off borgmatic would happen "outside" of borgmatic, presumably in some separate script.
Upsides: 1. Already works without borgmatic changes, 2. I think the security model is similar, in that the central backup server is responsible for bringing up and tearing down the SSH tunnel (analogous to socat), 3. All scheduling/initiating is still done on the central backup server, and 4. More of a "supported by Borg" approach, since borg create, for instance, would still be responsible for running and communicating with borg serve.
Interested in your thoughts!
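As a rough illustration of the "separate script" mentioned in the downsides above, the following sketch builds an SSH command that opens a reverse tunnel from the central backup server and then kicks off borgmatic on the machine to be backed up. This is an assumption about how such a script could look, not the exact setup from #346. The host name, the tunnel port and the function name are hypothetical, and the remote borgmatic config is assumed to point its repository at the tunnelled port.

```python
import shlex


def build_pull_command(machine, tunnel_port=2222, backup_server_ssh_port=22,
                       remote_command=("borgmatic",)):
    """Return an ssh argv that opens a reverse tunnel and runs a command.

    -R makes `tunnel_port` on `machine` lead back to the SSH daemon of the
    host running this script (the central backup server), so the remote
    Borg can reach the repository without credentials for direct access.
    """
    return [
        "ssh",
        "-R", f"{tunnel_port}:localhost:{backup_server_ssh_port}",
        machine,
        shlex.join(remote_command),
    ]


if __name__ == "__main__":
    # Print instead of executing, so the sketch has no side effects.
    print(shlex.join(build_pull_command("vm1.example.org")))
```

The tunnel only exists while the SSH session is open, which is what keeps scheduling and teardown on the central backup server.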
Thanks for taking your time and discussing this with me! :-)
Almost. It's important that I have 3 hosts: the ones to be backed up (VMs or whatever), the backup server itself, which in my case is a Proxmox host mostly running VMs, and the storage for the backups. The latter is important because it might be a totally different system, and in my case it really is: I'm hosting at Hetzner, which provides something called Storage Box with their dedicated servers, which conceptually is simply a NAS. But that thing is already pre-configured to run borg serve on its own when accessed using SSH. So I have something like this:
And the cool thing about this is that for pruning, checking archives etc. the clients are NOT involved; all of that maintenance is already done by BorgMatic. In theory I could use your hooks to set up something like SOCAT, and I am already using hooks and named pipes to back up database dumps over SSH, but SOCAT feels somewhat unnecessary with BorgMatic already in place.
I wasn't sure either, but the docs explicitly say the following about logging:
https://borgbackup.readthedocs.io/en/stable/usage/general.html#logging
borg serve is documented to handle its communication using STDIO only as well:
https://borgbackup.readthedocs.io/en/stable/usage/serve.html
Therefore I asked on the mailing list for clarification as well.
It does, but the important question is what Borg internally does with those commands. Does it additionally write to the STDIN of the started process, or does it simply forward its own STDOUT to the STDIN of that process? This is the part I'm unsure about, especially after reading that logging goes to STDERR only, which frees STDOUT for other purposes, and that STDIO of borg serve is fully forwarded to SSHD.
SOCAT is documented to use STDIO with borg create as well, but from my understanding it's the STDIO of the SOCAT instance itself, not necessarily that of borg create. OTOH, that doesn't mean that things are STDOUT (borg create) -> STDIN (SOCAT) -> STDOUT (SOCAT) -> STDIN (borg serve) etc.
So BorgMatic would need a custom config supporting such a use-case.
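Whatever the exact wiring turns out to be, the forwarding part of such a write-through setup boils down to copying bytes in both directions between two pairs of streams. A minimal sketch of that idea follows. It is generic stream plumbing, not how Borg or BorgMatic actually work internally, and the function names are made up. In a real setup the four streams would be the pipes of an SSH session to the client and of a borg serve process.

```python
import io
import threading


def pump(src, dst, chunk_size=65536):
    """Copy bytes from src to dst until src reports end-of-file."""
    while True:
        # read1() returns as soon as some data is available, so a
        # request/response protocol is not stalled waiting for a full chunk.
        data = src.read1(chunk_size)
        if not data:
            break
        dst.write(data)
        dst.flush()


def relay(client_out, server_in, server_out, client_in):
    """Forward client -> server and server -> client concurrently."""
    threads = [
        threading.Thread(target=pump, args=(client_out, server_in)),
        threading.Thread(target=pump, args=(server_out, client_in)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    # In-memory stand-ins for the four pipe ends.
    request, response = io.BytesIO(b"request"), io.BytesIO(b"response")
    to_server, to_client = io.BytesIO(), io.BytesIO()
    relay(request, to_server, response, to_client)
    print(to_server.getvalue(), to_client.getvalue())
```

A real relay would also have to close the destination once its source hits end-of-file, so that the other side notices the session is over.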
It's possible already using BORG_RSH; that's exactly what is done in the documented SOCAT scenario. In that, borg serve is NOT created by borg create, and BORG_RSH instead tells borg create to use SOCAT for communication.
So from my current understanding, you would only need some additional config to start borg serve on your own, e.g. using the already available ssh_command. With the counterpart of local_path to start borg create remotely you would take care of that, and the only question left is how to tell borg create not to start borg serve on its own using BORG_RSH. We only need some BORG_RSH telling borg create to communicate back to your process, so that you can simply forward all traffic.
The latter part would be easiest if borg create simply needed STDOUT to be redirected over SSH, because that would already be the default case if your process started it using SSH. Even if not, BORG_RSH could become a shell process simply forwarding its own STDIN to STDOUT of the SSH session?! It depends on what borg create uses STDOUT for.
You see, I still need some more answers... :-) But I'm somewhat sure that such a write-through thing would fit perfectly into BorgMatic's use-cases.
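For reference, the BORG_RSH part of the SOCAT scenario can be sketched as a small helper that prepares the environment for the remote borg create. The command string follows the pattern of the socat pull-mode example in the Borg docs as far as I understand it, so it should be checked against the docs before use. The socket path and the function names are hypothetical.

```python
import os
import shlex


def socat_borg_rsh(socket_path):
    """Build a BORG_RSH value that talks to a Unix socket instead of SSH.

    Borg appends its usual host and command arguments to BORG_RSH; wrapping
    socat in `sh -c '...'` makes the shell ignore those extra arguments.
    """
    return "sh -c " + shlex.quote(f"exec socat STDIO UNIX-CONNECT:{socket_path}")


def borg_environment(socket_path, base_env=None):
    """Return a copy of the environment with BORG_RSH set for the socket."""
    env = dict(os.environ if base_env is None else base_env)
    env["BORG_RSH"] = socat_borg_rsh(socket_path)
    return env


if __name__ == "__main__":
    print(borg_environment("/run/borg/reponame.sock", base_env={})["BORG_RSH"])
```

With this in place borg create no longer starts borg serve through SSH; whoever listens on the other end of the socket has to provide it.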
I apologize for the lengthy delay in getting back to this.
Ah, I didn't realize you were talking about borg serve rather than Borg's other sub-commands. Yeah, I'm not really familiar with what Borg does internally. But if the docs say borg serve uses stdin/stdout/stderr, I'm inclined to believe it! 😄 However I wouldn't expect borg serve's stdin/stdout/stderr to make its way to borg create input/output.
In any case, here's my high-level take on a feature like this: I'd be fine with borgmatic offering formal support for one of the documented methods for Borg in pull mode (such as the socat method), especially if I had a working pull request in hand for it. But my instinct is that borgmatic itself being expanded to replace socat might be going a step too far. borgmatic, as a glorified shell script responsible for orchestrating Borg invocations, probably shouldn't be responsible for intermediating Borg internals. At least that's where I'm mentally drawing the line for general borgmatic responsibilities as of today. I hope that makes sense.
I'm closing this one for now due to inactivity, but I'd be happy to open it up again if you have further thoughts. Thank you!
Edit: Potentially relevant blog post on the topic: https://blog.ollien.com/posts/pull-borgmatic/