Please add option to dump databases without streaming
Closed · opened 6 months ago by Daniel · 14 comments
What I'm trying to do and why
Streaming of database backups was added to resolve #258. However, having to use --read-special is a major disadvantage. Today, borg got stuck in what was essentially an infinite loop consuming 100% of one CPU core and a large amount of bandwidth (both the client system and the backup server have 10Gbps networking).
I back up /var/lib/docker/volumes (storage data for Docker containers), and one of the new Docker containers I created had a device file in its volume. This was on a VPS, which generally has shared resources and strict CPU usage restrictions; for example, you can't use 100% of one core for a long period of time.
Because of this, I'd like the ability to disable streaming so I can avoid using --read-special. My databases are less than 1GB total, so streaming doesn't provide a major advantage.
To work around this issue, I'm currently using a before_backup hook to manually dump the MySQL databases:
borgmatic version: 1.6.3
borgmatic installation method: Debian package
Borg version: 1.1.16
Python version: 3.9.2
Thank you for taking the time to file this, and for persisting through user registration! This is a tough issue, and it has been a recurring one. It's arguably a terrible user experience for borgmatic to cause Borg to unceremoniously hang on certain files whenever database hooks are enabled. We could very well work around the issue as you describe with a new option to disable database streaming, but my concern is that it doesn't solve the underlying user experience issue unless a user knows to enable the option. And even then, it's kind of annoying to have to enable a (non-default) option to avoid a hang on certain systems. We could make non-streaming the default, but then we give up performance (by default).
Here's some brainstorming on different compromise solutions:
Let me know if you have any thoughts or other suggestions. I know this is more than you asked for when filing the ticket, so feel free to say "I don't know.. you figure it out!" if you like.
This is fine, but backups that suddenly stop working due to changes on the file system aren't ideal.
I think I'd be happy with this solution. It seems like a reasonable compromise. I wonder how much extra IO this would add to the backup process though. I was going to say that I wonder if Borgbackup could add this option, however I'm sure its developers would just say to stop using the
This is also not ideal for the reason that the first solution isn't ideal - a change on the file system has a big impact on an unrelated part of the backup (like you mentioned). SQL takes up more space than the native storage mechanism used by the DB, so for larger DBs (say over ~1TB in size), storing the dump on disk might not even be possible.
This is doable. I wouldn't be happy with this as I'd need to remember about two different backup repos, but maybe it wouldn't be too bad.
This is not ideal because borg would be using 100% of one core until the timeout is hit.
Another issue is this: What if I want to back up the databases, and back up the metadata for the device/pipe descriptor itself? Borg's default mode (without --read-special) backs up the metadata required by mkfifo, etc, which is useful in some cases, as it'll re-create the node on restore. Out of all the options in your list, I think only the "Disable the ability to store database dumps and plain files in the same Borg archive" one would solve this use case, and I don't feel like that's a viable solution.
Some other potential options:
read_special option in Borgmatic's config, to also enable database streaming only when it's enabled. This is not really ideal as it's adding another implicit behaviour to the config.
stream config option for database streaming, and throw an error if it's used without read_special: true. This removes the confusing implicit behaviour of read_special today, and requires the user to explicitly change both options (and thus be more aware of what's happening). I guess this is the same as what I originally suggested, but with validation.
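To make the "with validation" idea concrete, here is a rough sketch of what such a check might look like. The stream option does not exist in borgmatic; it and the function name are hypothetical, and the config is assumed to be a plain dict already loaded from YAML.

```python
def validate_stream_option(config: dict) -> None:
    """Reject configs that ask for streaming without read_special.

    "stream" is the hypothetical option proposed in this thread;
    "read_special" is borgmatic's existing option. Both default to
    False here purely for illustration.
    """
    stream = config.get("stream", False)
    read_special = config.get("read_special", False)

    if stream and not read_special:
        raise ValueError(
            "stream: true requires read_special: true, "
            "because streamed dumps are read from named pipes"
        )
```

With this shape, a user who turns on streaming has to set both options explicitly, so nothing is implied by one option on behalf of the other.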
Agreed. That's basically how things work today!
Okay, I might play around with this approach and see how bad it is. (I tried borg create --dry-run --filter bcf in an attempt to get it to list just special files it found, but that apparently doesn't work.)
That does appear to be the case. You can achieve this today, though, by separating out the databases and files to back up into two separate borgmatic configuration files, which would result in a separate backup archive for each.
Yeah, that's certainly more explicit, so that's nice. But it's still got the problem of a user having to know to either enable the stream option (for performance) or disable it (for not hanging on special files). I could see making it a required option with no default when using database hooks at least, so the user has to make that explicit choice. But still not ideal. (It's looking increasingly like we're going to have to choose from several non-ideal options.)
There is, but it'll only do a single file / database dump at a time: Stream via Borg's stdin. The whole named pipe streaming thing with borgmatic is a way to work around that stdin limitation, so we can stuff multiple database dumps into a single Borg archive. Which suggests another option: Switch to Borg's stdin feature for streaming, drop the use of --read-special, and only allow a single database per Borg archive. The configuration file would probably need to change to support that (since you couldn't just assume the same configured archive prefix for each database), and I'm not sure which of the archives the plain files would go in. I also have no idea how dumping "all" databases would work with this approach...
I just realized this is totally not possible without borgmatic reimplementing full parsing and logic support for Borg patterns files, which can get pretty darn complex...
Okay, no, here's an approach for borgmatic that might work:
borg create --dry-run --list along with all source directories, patterns, etc. passed to Borg.
borg create command to make a backup of files and database dumps. No hangs!
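As a rough illustration of the middle of that approach, the sketch below takes the text printed by a dry-run listing and picks out the paths that should be excluded from the real run. It assumes each listing line is a one-character status, a space, then the path; that format, the function names, and the injected is_special predicate are all assumptions for illustration, not borgmatic's actual code.

```python
from typing import Callable, List


def paths_from_listing(listing: str) -> List[str]:
    """Extract paths from listing text, assuming "<status> <path>" lines."""
    paths = []
    for line in listing.splitlines():
        status, separator, path = line.partition(" ")
        # Skip anything that doesn't look like a one-character status plus path.
        if separator and len(status) == 1 and path:
            paths.append(path)
    return paths


def special_paths_to_exclude(
    listing: str, is_special: Callable[[str], bool]
) -> List[str]:
    """Return the listed paths that the predicate reports as special files."""
    return [path for path in paths_from_listing(listing) if is_special(path)]
```

The predicate is passed in so the parsing can be exercised without touching the file system; the returned paths would then be handed to the real backup run as excludes.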
(Sorry for the comment spam!)
EDIT: For my own reference, here's apparently how to test a path for specialness in Python:
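A minimal sketch of one way to do that with the standard library's os and stat modules. The function name is made up for illustration, and treating a missing target (for example a broken symlink) as "not special" is an assumption about the desired behaviour.

```python
import os
import stat


def is_special_file(path: str) -> bool:
    """Return True if path is a character device, block device, or FIFO.

    os.stat() follows symlinks, so a symlink pointing at a device or
    named pipe is reported as special too.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # The path (or the target of a symlink) does not exist.
        return False

    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode) or stat.S_ISFIFO(mode)
```

These are the file types that Borg opens and reads like regular files when --read-special is given, which is why they matter for the hang described above.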
Well, it's not pretty, but I have it basically working and auto-excluding special files outside of
Still needs test coverage and more manual testing though.
Okay, these changes are pushed to master now and will be part of the next release. Thanks for filing this!
It occurs to me this approach is problematic, because any users who explicitly enable read_special because they actually want Borg to, well, read special files will get their wishes thwarted... because borgmatic will then go and exclude those special files from the backup.
I'm not sure about a solution here other than backing out the change, which is a non-solution. I'll reopen the ticket.
Okay, the "fix" wasn't too bad. Here's the new logic: If database hooks are enabled, the special file exclusion kicks in. But if the user explicitly sets read_special to true (whether or not database hooks are enabled), then the special file exclusion is skipped. The idea is that the user setting read_special to true indicates they know what they're doing: specifically, requesting inclusion of special files.
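Spelled out as a small decision function, that logic might look like the sketch below. The function and parameter names are hypothetical; read_special is None when the user hasn't set it in the config.

```python
from typing import Optional


def should_exclude_special_files(
    database_hooks_enabled: bool, read_special: Optional[bool]
) -> bool:
    """Decide whether special files get auto-excluded from the backup."""
    # An explicit read_special: true always wins, with or without hooks.
    if read_special is True:
        return False

    # Otherwise the exclusion only kicks in when database hooks are in use.
    return database_hooks_enabled
```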
Just released in borgmatic 1.7.3!
It seems borgmatic fails now on every broken symlink.
Confirmed with a local repro! As per usual, thank you for reporting this.
Fixed in master as part of #596. This fix will be part of the next release (hopefully soon).
Fix released in borgmatic 1.7.4!