database snapshots #232
Loading…
x
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
What I'm trying to do and why
A corollary to #231 is database-level snapshots. It would be great to be able to do binary backups of database instead of the current "SQL dump" approach. The downside of "SQL dumps" is they can take a very long time to load for large backups, and take a lot of extra space. By using filesystem snapshots or just locking the database for the backup's duration, you get the binary files directly and restoring from backups is just like restoring the database from a crash...
Other notes / implementation ideas
Many database support issuing a command to flush everything to disk and prepare for catastrophe. You have
FLUSH TABLES WITH READ LOCK
in MySQL/MariaDB, which needs to be followed by aUNLOCK TABLES
when finished with the backup. There's no equivalent that I could find on the other side, but there isCHECKPOINT
in PostgreSQL which flushes all WAL files and allow for a shorter recovery. PostgreSQL would require filesystem snapshots for the backups to be consistent, however.I know that both of those could be implemented with simple shell scripts, but I'm wondering if you'd be interested in merging this directly in the Python code, which might be easier (although I haven't looked).
Worst case, this issue can track the shell script implementation too. :)
Thanks again!
I'd be interested in merging this directly in the Python code, although be aware that borgmatic is in many ways itself a glorified shell script.. written in Python! By that, I mean that its Borg integration is "just" shelling out to Borg. Its existing PostgreSQL integration is "just" shelling out to
pg_dump
. So I could certainly see adding more shelling out to implement what you're talking about here.. Unless you're thinking of a more direct library-level integration.In terms of this ask, can you run through the series of events you foresee happening once borgmatic supports this and is properly configured? Is it something like this?
Let me know if I'm missing something.
There are probably database libs that could help us do things here, but I'm not sure it would be worth the extra dependencies.
I think that's pretty much it. As the
Snapshot
class, I wouldsuggest drafting an abstract interfce for this, because, for example,
MySQL can work without filesystem level snapshots, because
FLUSH WITH READ LOCK
provides a consistent, unchanging view of the database aslong as the lock is held. It does means the database is basically
readonly the entire time, so we usually use snapshots there as well
in production environments, but it's not absolutely mandatory.
You will need to hold that database lock, so it means borgmatic
will need to keep a mysql client connection alive for the duration of
the backup. That might be annoying to do with the commandline client,
as you'd need to open some sort of "dialog" with a mysql pipe, which
could easily mess up. Using a client library might make sense in this
case, if we really don't want to depend on filesystem-level snapshots
for this database type.
PostgreSQL, on the contrary, assumes the backup tool has atomic
snapshot of the filesystem, and there's no way to "lock" the database
in the same way, which is probably a good design decision anyways, now
that I think of it. So you need a frozen filesystem for this, or
you need to shutdown the database, run the backups, and start it
again, which is probably unacceptable for most production scenarios.
So yeah:
sort of state copy, keeping the lock in some cases (e.g. MySQL)
database in some cases (e.g. PostgreSQL without FS snapshots)
Also note that PostgreSQL has this concept of "clusters" which is a
bit alien to the MySQL world. You can basically have multiple
PostgreSQL "servers" on the same machine, each with their own
database, user, process, locking and so on namespace. Those can (and
should) be independently backed up, and that can be done in parallel.
So maybe the abstract class would look something like this:
(Kind of embarrassing, that "on the fly class design"... There's
probably a better way to do this, but it's late and I just want to
brain dump that stuff. ;)
I hope that helps!
Thank you for the detailed answer. The series of steps you laid out makes sense to me, although I don't know that supporting offline backups (taking the database down) would be useful to that many users! Online backups via snapshots, certainly.
On the topic of PostgreSQL binary backups, have you seen
pg_basebackup
? (Also, a less reference-docs-y overview.) That appears to use internal Postgres checkpointing ("backup mode") to get a consistent snapshot of the data directory. I can only imagine that that's preferable to using external filesystem checkpointing to accomplish the same thing, especially since it can be executed remotely. But I don't know what folks typically do in the wild. And what the differences are in I/O impact for each method.Also, I wonder if there's a comparable approach in MariaDB land, or whether most folks use the filesystem snapshot approach.
True, it might be worth ignoring that use case completely.
We actually use this as part of the custom backup system here at tor, so
it's for real, and could be used here. I'm just not very familiar with
it so I didn't think of bringing it up.
Main problem with that, IMHO, is that it creates local files, so
duplicates storage space requirements. Maybe there's a way that it could
be fed directly into borg as an object instead?
mariabackup looks like something that was called mysql_hotcopy, if my
memory is correct, and yes, it serves a similar purpose.
Yup, there is! But that'd be an optimization. (Maybe a necessary one for certain use cases..)
Hi.
I now this is quite old request, but it is still valid if you want backup bigger databases ( MySQL/MariaDB/PostgreSQL ) with reasonable recovery times. I'm using xtrabackup/maribackup as well as pg_basebackup for databases of different sizes without wasting precious space on ssd virtual machines disks, just piping output through ssh tunnel to backup location, and of course saving compressed backup there.
This is how I'm doing it
Of course, you can use incremental backup (use xtralsn) for xtrabackup/maribackup and it is extreamly useful when doing recovery. It takes about 1-2 hours max for Mysql/MariDB db size around 100GB.
I would like to use borgmatic in the same fasion, pipe xtrabackup/mariabackup/pg_basebackups directly to the borg repositories. Is it possible to implement this with borgmatic? I have googled around for clean borg solution but I could't find anything. Anyway it makes sense to me to have generic borgmatic hook to implement similar backup approaches. Thanks anyone for answer in advance.
There's not a way to do this today with borgmatic, as it's pretty geared around
pg_dump
for PostgreSQL andmysqldump
for MySQL/MariaDB. But there's no reason why it couldn't be extended to supportxtrabackup
/mariabackup
/pg_basebackups
—including streaming their output to Borg the same way borgmatic does with dumps today. That's assuming the output can be streamed to a single local named pipe path (that borgmatic configures Borg to consume).There would have to be some adjustments to the configuration, however. For instance, the existing configuration is database-oriented, while this would have to be more cluster-oriented. Maybe that means in addition to
postgresql_databases
, for instance, there should be a separatepostgresql_clusters
hook with different options? Similar for MySQL/MariaDB.