Backing up a large disk #898
What I'd like to do and why
I have a large glusterfs disk (6 TB) with a lot of small files (56M files). Backing it up takes three days. These files are mostly stable; I even know which directories are modified between backup runs.
Any ideas on how to speed up the backup?
I'm already dividing this data set across 40 yaml files and backing them up separately. One solution I came up with is to back up subdirs 1-5 on Monday, 6-10 on Tuesday, and so on. This would change the "daily" backup (which actually takes 3 days) into a weekly backup, which is fine.
This would require cron-like schedule support in the yaml / in borgmatic. Is there such a feature? At least I didn't find one.
Or wildcard support for the repository name in `borgmatic create --repository REPOSITORY`? I could name the repositories Mon_something, Tue_something, etc. and then run
borgmatic create --repository $(date +%a)_*
I'm open to all ideas for easily backing up this huge set of files.
Other notes / implementation ideas
No response
Wow, that's a lot of files! Do you know which step of the backup takes so long? Running borgmatic with `--verbosity 2` should give you more information as each step is run. If the main slow step is specifically the `borg create` command, then this is likely a Borg performance question rather than something in borgmatic, and you might consider filing a Borg issue or searching for an existing one. Also see these comments for some ideas, like tweaking the compression or checking disk IO. If on the other hand the slow step is `borg check`, there are a number of things you can do in borgmatic to speed that up.

And if the slow step is (in full or in part) something borgmatic itself is doing, then I'd be happy to take a look at it. borgmatic itself doesn't put timestamps in its logs, but if you log to something like systemd, that will include timestamps on each log entry so you can see how long things take.
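As a lightweight alternative to systemd logging, a small shell wrapper can prefix each output line with a timestamp itself. This is only a sketch; the `with_timestamps` helper name is made up and not part of borgmatic:

```shell
#!/bin/sh
# Prefix every line of a command's output with a timestamp, so slow
# steps become visible in the log. `with_timestamps` is a hypothetical
# helper, not a borgmatic feature.
with_timestamps() {
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%F %T')" "$line"
    done
}

# Example invocation (flags as discussed above):
#   with_timestamps borgmatic --verbosity 2
```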
borgmatic has some basic "scheduling" for checks, but not for making backups themselves. For that, I recommend using a real scheduler like cron or systemd. The main downside is that you then wouldn't be able to just run borgmatic once and rely on it to consume forty configuration files in sequence. Instead, you'd have to individually schedule borgmatic invocations (e.g. with `--config`) to run individual configuration files or run them in batches.

But before going down that road, you may want to see if the performance issues can be resolved.
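The cron approach could look something like the following sketch. The directory layout and paths here are assumptions for illustration, not anything established in this thread:

```shell
#!/bin/sh
# Pick a per-weekday borgmatic config directory, e.g. /etc/borgmatic.d/mon/,
# /etc/borgmatic.d/tue/, ... (hypothetical layout and paths).
day=$(date +%a | tr '[:upper:]' '[:lower:]')   # "mon", "tue", ...
config_dir="/etc/borgmatic.d/$day"
echo "borgmatic --config $config_dir"   # echoed here; a cron job would run it
```

A single crontab entry pointing at a script like this would then cover all seven weekday directories without any scheduling logic inside borgmatic itself.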
Interesting idea. I think you could achieve that today, though, by grouping your configuration files. E.g., put all of your Monday files into a `monday/` directory and then point borgmatic at it on Monday: `borgmatic --config monday/`, and so on.

It's scanning the files on the glusterfs disk, so it's not really borgmatic- or borg-related slowness. I'm already testing different glusterfs tuning options and I've already got it faster. By faster I mean "3 days", not "5 days".
When comparing glusterfs to XFS:
opendir() / readdir() / stat() are slow
read() is fast
Thanks. I'll have to test `--noflags` next, although it probably only matters when adding more files to the backup, not when scanning for changed files.

I've already disabled checks as they took too long. Here's a snippet from my yaml:
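Disabling consistency checks in a borgmatic configuration typically looks like this. Treat it as a sketch of the common syntax (which varies by borgmatic version), not necessarily the poster's exact snippet:

```yaml
# Skip borgmatic's consistency checks entirely (borgmatic 1.8+ style syntax).
checks:
    - name: disabled
```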
Side note: Originally I had all the files in a single repository, but mounting it took so long that I had to split it up into multiple repositories.
I'm already using those on my other systems. For this I would need something similar (absolute `Monday`, not relative `frequency: 1 week`) for creating backups too.

Yes, I got the same idea after opening this issue. There is one downside to this: for `borgmatic mount --repository something --mount-point /foo` I need to find the correct `--config` directory.

I can do all this with systemd timers/cron and subdirs, but it would be handy to have this in borgmatic too. Two options:

1. `--repository` would allow wildcards, and I could use `something_Mon` or `Mon_something` for my Monday needs.
2. `tags` or `matchers`. Then in yaml I could use `tags: Monday, foo, bar` (multiple tags), and the command `borgmatic create --tag Monday` would match only yamls with the `Monday` tag.

I would prefer option 2.
I did an XFS vs glusterfs comparison with a small (400 GB, 4M files) file set.
Backup from glusterfs took 7.5 hours.
Backup from XFS took 4 hours.
This is a VM on Hetzner. Backups are stored on Hetzner's storage box.
I apologize for the delay in getting back to this. That's interesting about the relative filesystem performance. XFS sounds like a big win there. You might also consult the Borg project for filesystem performance tuning tips; that's well outside my area of expertise. And see these Borg FAQ entries on the topic.
In terms of borgmatic features to support limiting the set of configuration files "run" upon a `create` action, there is a comparable feature already for repositories: each repository can have a `label`, and you can specify that label when using the `--repository` flag. But it sounds like you'd rather have a tag/label at the whole-configuration-file level.

Could your repositories and configuration files be 1 to 1, such that there's one repository per configuration file? If so, you could maybe use the existing repository label feature for your use case.
borgmatic has that too, but only for `check` right now; see check days. But note that unlike a real scheduler, this is only "best effort" scheduling. In theory it could be generalized to `create` as well.

Yes, I'm using a unique label for each repository. My missing feature for it is wildcard support:
With such wildcard support I could do my own scheduling very easily. Currently I'm doing it not-so-easily, as discussed above.
Yes, I have one subdirectory per repository and one configuration file per repository. The problem is matching multiple repositories easily with a wildcard, not splitting them into multiple configuration directories. Having them all in one configuration directory makes other borg(matic) tasks easier.
Anyway, I have a workaround for this now, and this issue can be closed. If you decide to add wildcard support, it would make my life easier.
Okay, the `--repository` flag now has glob support in main. It'll be part of the next release. Thanks for the suggestion!

Released in borgmatic 1.8.14!
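To illustrate the glob-matching semantics against the weekday naming scheme discussed above (the repository labels here are hypothetical, and shell `case` is used only to mimic the pattern matching, not borgmatic itself):

```shell
#!/bin/sh
# Hypothetical repository labels following the weekday naming scheme.
pattern="Mon_*"
for label in Mon_docs Mon_mail Tue_docs; do
    case "$label" in
        $pattern) echo "match: $label" ;;
    esac
done

# With borgmatic's flag itself, quote the pattern so the shell doesn't
# expand it before borgmatic sees it, e.g.:
#   borgmatic create --repository "$(date +%a)_*"
```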