Nicer structure for auto-added files #838

Open
opened 2024-03-06 21:45:14 +00:00 by rovo89 · 4 comments

What I'd like to do and why

I'm fascinated by the power and simplicity that borg(matic) bring to backups. There are just some minor things that would make it even better. One of them is regarding the files that get added automatically. Currently it looks like this:

etc/borgmatic.d/server.yaml
root/.borgmatic/postgresql_databases/localhost/all
root/.borgmatic/bootstrap/manifest.json
root/.borgmatic/checks/4398fb73fd5553394778b601d81b8d7d0fabf9c14644f984e4423fcd73b597ab/repository
root/.borgmatic/checks/4398fb73fd5553394778b601d81b8d7d0fabf9c14644f984e4423fcd73b597ab/archives/501f5431b99030e7aa2c4eb6fd64fd77baa3b453f7422cc954e51fb327b3ac45

It's a mix of the backup configuration, database dumps and state information. All of them reflect the local file system structure, which I tried to avoid with a working_directory setting.

In borg version 2.0.0b8 and the upcoming 1.4, there's a feature to use a path /strip/prefix/./keep/postfix which will result in a file keep/postfix being archived. It could be used to ask borg to back up /etc/borgmatic.d/./server.yaml and /root/./.borgmatic, which would result in this layout:

server.yaml
.borgmatic/postgresql_databases/localhost/all
.borgmatic/bootstrap/manifest.json
.borgmatic/checks/4398fb73fd5553394778b601d81b8d7d0fabf9c14644f984e4423fcd73b597ab/repository
.borgmatic/checks/4398fb73fd5553394778b601d81b8d7d0fabf9c14644f984e4423fcd73b597ab/archives/501f5431b99030e7aa2c4eb6fd64fd77baa3b453f7422cc954e51fb327b3ac45

And the good thing is that it's backwards-compatible, resulting in the original layout for older borg versions.
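
To make the mapping concrete, here is a minimal Python sketch of how a source path would translate into an archived path with and without the slash-dot feature. It only models the behavior described above and is not Borg's actual code; the function name `archived_path`, the `supports_slashdot` flag, and the choice to split at the first `/./` marker are assumptions made for illustration.

```python
import posixpath

SLASHDOT = "/./"  # marker separating the stripped prefix from the kept part


def archived_path(source_path, supports_slashdot=True):
    """Model the path an archive would record for source_path.

    Illustrative only: the name, the flag, and splitting at the first
    marker are assumptions, not Borg's implementation.
    """
    if supports_slashdot and SLASHDOT in source_path:
        # Newer behavior: keep only what follows the marker.
        _, kept = source_path.split(SLASHDOT, 1)
        return posixpath.normpath(kept).lstrip("/")
    # Older behavior: "/./" is just a redundant path component, so the
    # full path is recorded (without the leading slash).
    return posixpath.normpath(source_path).lstrip("/")


for path in ("/etc/borgmatic.d/./server.yaml", "/root/./.borgmatic"):
    print(archived_path(path), "|", archived_path(path, supports_slashdot=False))
```

The second column shows why the same command line stays backwards-compatible: without the feature, the marker normalizes away and the original layout is produced.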

Ideally, I'd like to move the config to .borgmatic. That would be possible with one of the further ideas in https://github.com/borgbackup/borg/issues/4685.

What do you think?

Other notes / implementation ideas

No response

Owner

Thanks for taking the time to file this! I think taking advantage of Borg's upcoming prefix stripping feature as you describe could work great for the archived files in .borgmatic, especially for things like database dumps that are currently tied to the current user. However I do wonder about how manually extracting such files would work in practice. For instance, if all that's stored in the archive is a path that begins with .borgmatic instead of, say, /root, how will the user know where to extract it?

Similarly, with the case of config files, part of the purpose of the feature is to be able to easily restore them back to their original locations with the borgmatic bootstrap action. But if the stored config file paths are relative instead of absolute, how will that work?

Also, I think one thing missing for me here is why you want this feature. You mention:

> It's a mix of the backup configuration, database dumps and state information. All of them reflect the local file system structure, which I tried to avoid with a working_directory setting.

Why are you trying to avoid reflecting the local filesystem structure? And what's your use case / plan for extracting these files from an archive?

Author

Let me start with the why and some context:

I started using borg(matic) for a small home server. My intentions will probably differ from those of companies responsible for a huge farm of automated, "cattle"-like servers.

In this case, I want to back up the data of an Immich instance (Google Photos replacement). So just one "app", not the whole server; that might also be an important differentiator. Anyway, the data is currently stored on /media/nas/immich, but that's not set in stone. I might be moving the data to some other disk tomorrow if I decide to optimize my server structure, or I might want to restore it to another server with a different base directory. So the absolute path is an implementation detail which I don't want to back up; my focus is on ensuring that the actual data is safe. If I do move the folder, borg diff shouldn't show a huge list of added/removed files.

That's why I use working_directory, which is sufficient because Immich's data is all under a single folder. Now the filesystem data is stored as relative paths, but the "generated" files still use absolute paths. This mix is not so nice, and besides that, having a single .borgmatic/ entry point would make it totally clear where the data came from, rather than seeing etc/ and root/.

You could also see it as a separation of concerns - I'd like borgmatic to put all its stuff into a single, root-level folder.


I can't answer the questions about borgmatic bootstrap as I haven't used that feature yet. In my case, I probably wouldn't rely on automatic restores. I would be happy that I can get my destroyed data back, and if it takes three commands to extract the folders to their respective locations, that's fine as well.

For .borgmatic, I guess you could handle it as a "magic" location and always restore it to the home directory if needed. It would also be in a fixed location (well, if using the slash-dot), so you don't have to search for it. Currently, it could be in /root/.borgmatic, but also in /home/someuser/.borgmatic, right?

That would leave only the config questionable. Copying/linking it to /root/.borgmatic/config.yaml (or maybe an extension of the slash-dot feature) would also bring that to a fixed location with similar benefits. If there's a requirement to restore the config to the original path, could that information be taken from bootstrap/manifest.json?

Or another idea: Back up the config to .borgmatic/etc/borgmatic.d/server.yaml, then the original paths can be found by just stripping the prefix. Again, this is just some food for thought; there might be better implementations.
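
A rough sketch of that prefix-stripping idea in Python, purely as an illustration. The names `CONFIG_PREFIX` and `original_config_path` are hypothetical and nothing like this exists in borgmatic today.

```python
from typing import Optional

# Hypothetical prefix under which config files would be archived.
CONFIG_PREFIX = ".borgmatic/"


def original_config_path(archived: str) -> Optional[str]:
    """Recover the absolute path of a config stored under CONFIG_PREFIX.

    Returns None for archive entries outside the prefix.
    """
    if not archived.startswith(CONFIG_PREFIX):
        return None
    # Stripping the prefix leaves the original path minus its leading slash.
    return "/" + archived[len(CONFIG_PREFIX):]


print(original_config_path(".borgmatic/etc/borgmatic.d/server.yaml"))
```

This sketch doesn't distinguish config files from other entries under .borgmatic (database dumps, check state), so a real implementation would need some way to tell them apart.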


NB: Does borgmatic ensure that only a single process is running? Since the /root/.borgmatic directory seems to be shared across different repositories and configs, it has to be ensured that e.g. a database being dumped for process 1 isn't written to the backup of process 2. I assume subfolders (per repository? per config?) would make this safer, but I have no idea what other checks are already in place.

Owner

Thanks for getting into your rationale here, and what you describe makes sense to me for ~/.borgmatic. One way it could work is that when looking for database dumps to restore, borgmatic could first probe for .borgmatic within the archive before falling back to ~/.borgmatic (based on the current user) for backwards compatibility. And of course storing .borgmatic instead of ~/.borgmatic would have to wait for Borg 1.4. But it seems like a reasonable approach to me.
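
The probe-then-fall-back lookup could look roughly like the following Python sketch. It is not existing borgmatic code; `dump_directory_in_archive` and its parameters are hypothetical names, and the archive contents are passed in as a plain list of paths to keep the example self-contained.

```python
import posixpath

NEW_STYLE_DIRECTORY = ".borgmatic"  # root-level location after prefix stripping


def dump_directory_in_archive(archive_paths, home_directory):
    """Pick the archive directory under which to look for database dumps.

    archive_paths: paths as listed from the archive (no leading slash).
    home_directory: the current user's home, e.g. "/root".
    Both the function and its parameters are hypothetical.
    """
    # Probe for the new, root-level .borgmatic directory first.
    for path in archive_paths:
        if path == NEW_STYLE_DIRECTORY or path.startswith(NEW_STYLE_DIRECTORY + "/"):
            return NEW_STYLE_DIRECTORY
    # Fall back to the old layout, where the home directory is part of the path.
    return posixpath.join(home_directory, ".borgmatic").lstrip("/")


print(dump_directory_in_archive([".borgmatic/postgresql_databases/localhost/all"], "/root"))
print(dump_directory_in_archive(["root/.borgmatic/postgresql_databases/localhost/all"], "/root"))
```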

I'm still not convinced though on the suggestions for storing config file paths. It seems unnecessary to encode the original path of a configuration file within bootstrap/manifest.json or a .borgmatic sub-path when there's already a perfectly good mechanism for encoding the original path of an archived file: the full file path as it's stored within the Borg archive! Using the full file path for that also has the relatively minor benefit that each config file is stored only once even if a user backs up their config files by adding them to source_directories.

If you really don't want the auto-storing of config files necessary for borgmatic bootstrap though, you can always set store_config_files: false in your configuration.

> NB: Does borgmatic ensure that only a single process is running? Since the /root/.borgmatic directory seems to be shared across different repositories and configs, it has to be ensured that e.g. a database being dumped for process 1 isn't written to the backup of process 2. I assume subfolders (per repository? per config?) would make this safer, but I have no idea what other checks are already in place.

No, borgmatic itself doesn't currently ensure that it isn't run more than once, but typically that responsibility would fall to whatever is running borgmatic (e.g. systemd, which does do that IIRC). And you're correct that simultaneous borgmatic runs with database dumps would encounter problems.
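
For anyone launching borgmatic from their own wrapper script rather than a service manager, a minimal sketch of a single-instance guard using an advisory file lock might look like this. It is Unix-only, not part of borgmatic, and the name `single_instance` and the lock file path are assumptions for illustration.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive, non-blocking advisory lock while the block runs.

    Raises BlockingIOError if another holder already has the lock.
    Hypothetical helper; Unix-only because it relies on fcntl.flock.
    """
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the file releases the lock if it was acquired.
        handle.close()


# Hypothetical lock file location for the example.
lock_file = os.path.join(tempfile.gettempdir(), "borgmatic-example.lock")
with single_instance(lock_file):
    pass  # a wrapper would run borgmatic here, e.g. via subprocess.run
```

A second invocation that tries to take the same lock while the first is still running fails immediately instead of starting a competing database dump.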

Owner

Related ticket, maybe good to tackle at the same time: #562

Reference: borgmatic-collective/borgmatic#838