database snapshots #232

Open
opened 2019-10-23 23:44:46 +00:00 by anarcat · 7 comments

#### What I'm trying to do and why

A corollary to #231 is *database*-level snapshots. It would be great to be able to do *binary* backups of databases instead of the current "SQL dump" approach. The downside of "SQL dumps" is that they can take a very long time to load for large backups, and they take a lot of extra space. By using filesystem snapshots or just locking the database for the backup's duration, you get the binary files directly, and restoring from backups is just like restoring the database from a crash...

#### Other notes / implementation ideas

Many databases support issuing a command to flush everything to disk and prepare for catastrophe. You have [`FLUSH TABLES WITH READ LOCK` in MySQL/MariaDB](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock), which needs to be followed by an `UNLOCK TABLES` when finished with the backup. There's no exact equivalent that I could find on the PostgreSQL side, but there *is* [`CHECKPOINT` in PostgreSQL](https://www.postgresql.org/docs/9.1/backup-file.html), which flushes all WAL files and allows for a shorter recovery. PostgreSQL would require filesystem snapshots for the backups to be consistent, however.
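
For the PostgreSQL side, a minimal sketch of what issuing that flush could look like when shelling out (assuming a local superuser connection, since `CHECKPOINT` requires superuser privileges; connection options are omitted). The MySQL/MariaDB lock is trickier, because it must be *held* for the backup's duration; more on that below.

```python
import subprocess

# A minimal sketch, assuming a local superuser connection: ask PostgreSQL
# to flush its WAL to disk ahead of a filesystem snapshot.
subprocess.run(['psql', '--no-psqlrc', '--command', 'CHECKPOINT;'], check=True)
```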

I know that both of those could be implemented with simple shell scripts, but I'm wondering if you'd be interested in merging this directly in the Python code, which might be easier (although I haven't looked).

Worst case, this issue can track the shell script implementation too. :)

Thanks again!

Owner

I'd be interested in merging this directly in the Python code, although be aware that borgmatic is in many ways itself a glorified shell script.. written in Python! By that, I mean that its Borg integration is "just" shelling out to Borg. Its existing PostgreSQL integration is "just" shelling out to `pg_dump`. So I could certainly see adding more shelling out to implement what you're talking about here.. Unless you're thinking of a more direct library-level integration.

In terms of this ask, can you run through the series of events you foresee happening once borgmatic supports this and is properly configured? Is it something like this?

  1. Run flush and/or checkpoint on the database.
  2. Create filesystem snapshot of internal database storage.
  3. Run backups with Borg.
  4. Remove filesystem snapshot.

Let me know if I'm missing something.

Author

> So I could certainly see adding more shelling out to implement what you're talking about here.. Unless you're thinking of a more direct library-level integration.

There are probably database libs that could help us do things here, but I'm not sure it would be worth the extra dependencies.

> In terms of this ask, can you run through the series of events you foresee happening once borgmatic supports this and is properly configured? Is it something like this?
>
> 1. Run flush and/or checkpoint on the database.
> 2. Create filesystem snapshot of internal database storage.
> 3. Run backups with Borg.
> 4. Remove filesystem snapshot.
>
> Let me know if I'm missing something.

I think that's pretty much it. As with the `Snapshot` class, I would suggest drafting an abstract interface for this because, for example, MySQL can work without filesystem-level snapshots: `FLUSH TABLES WITH READ LOCK` provides a consistent, unchanging view of the database as long as the lock is held. It does mean the database is basically read-only the entire time, so we *usually* use snapshots there as well in production environments, but it's not absolutely mandatory.

You *will* need to *hold* that database lock, which means borgmatic will need to keep a MySQL client connection alive for the duration of the backup. That might be annoying to do with the command-line client, as you'd need to open some sort of "dialog" with a mysql pipe, which could easily get messed up. Using a client library might make sense in this case, if we really don't want to depend on filesystem-level snapshots for this database type.
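
For illustration, a rough sketch of that "dialog" approach (not actual borgmatic code; the `run_borg_backup()` helper is hypothetical, and credentials/host options are omitted):

```python
import subprocess

# Keep one mysql client session open so FLUSH TABLES WITH READ LOCK
# stays in effect; the lock dies with the session.
session = subprocess.Popen(
    ['mysql', '--batch'],
    stdin=subprocess.PIPE,
    stdout=subprocess.DEVNULL,
    text=True,
)
session.stdin.write('FLUSH TABLES WITH READ LOCK;\n')
session.stdin.flush()
# Note: a real implementation would need to confirm the lock was actually
# acquired (e.g. by echoing a SELECT and reading stdout) before proceeding.

run_borg_backup()  # hypothetical: snapshot the filesystem and/or run Borg

session.stdin.write('UNLOCK TABLES;\n')
session.stdin.close()  # ending the session would also release the lock
session.wait()
```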

PostgreSQL, on the contrary, assumes the backup tool has an atomic snapshot of the filesystem, and there's no way to "lock" the database in the same way, which is probably a good design decision anyway, now that I think of it. So you *need* a frozen filesystem for this, *or* you need to *shut down* the database, run the backups, and start it again, which is probably unacceptable for most production scenarios.

So yeah, roughly (see the sketch after this list):

  1. flush/checkpoint the database, just to make sure the disk has some
    sort of state copy, keeping the lock in some cases (e.g. MySQL)
  2. create a filesystem snapshot, if available
  3. release the lock if a snapshot is available; if not, shut down the
    database in some cases (e.g. PostgreSQL without FS snapshots)
  4. run backups with Borg
  5. release the lock, or start the database again, if a snapshot is unavailable
  6. remove the filesystem snapshot, if available
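
A rough sketch of that sequence in Python, just to make the branching concrete (every helper here is hypothetical; none of this exists in borgmatic today):

```python
def run_binary_backup(database):
    # Step 1: flush, e.g. CHECKPOINT, or FLUSH TABLES WITH READ LOCK
    # (which also takes the lock for MySQL/MariaDB).
    database.flush()
    # Step 2: snapshot if we can; assume None means "unsupported here".
    snapshot = try_filesystem_snapshot(database.data_directory)
    if snapshot:
        if database.supports_locking:  # e.g. MySQL/MariaDB
            database.unlock()  # the snapshot is consistent; writes can resume
    elif database.supports_locking:
        pass  # no snapshot: keep holding the lock for the whole backup
    else:  # e.g. PostgreSQL without filesystem snapshots
        database.shutdown()
    try:
        # Step 4: back up either the snapshot or the live data directory.
        run_borg_create(snapshot.path if snapshot else database.data_directory)
    finally:
        # Steps 5-6: undo whatever consistency measure we took.
        if snapshot:
            snapshot.remove()
        elif database.supports_locking:
            database.unlock()
        else:
            database.start()
```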

Also note that PostgreSQL has this concept of "clusters", which is a bit alien to the MySQL world. You can basically have multiple PostgreSQL "servers" on the same machine, each with its own namespace of databases, users, processes, locking, and so on. Those can (and should) be independently backed up, and that can be done in parallel.
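
Sketching the parallel idea (reusing the hypothetical `run_binary_backup()` from the previous sketch; `discover_clusters()` is also assumed, e.g. something that parses Debian's `pg_lsclusters` output):

```python
from concurrent.futures import ThreadPoolExecutor

# Back up each discovered PostgreSQL cluster independently, in parallel.
# Wrapping in list() forces completion and surfaces any exceptions.
with ThreadPoolExecutor() as executor:
    list(executor.map(run_binary_backup, discover_clusters()))
```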

So maybe the abstract class would look something like this:

```
def _notimplemented(self):
    raise NotImplementedError()


class DatabaseBackup:
    # flush the database data to ensure the latest state is on disk
    flush = _notimplemented
    # lock the database so no more changes occur; would just be a no-op
    # for PostgreSQL
    lock = _notimplemented
    # release the lock, if implemented; silent otherwise
    unlock = _notimplemented
    # snapshot the filesystem, if available - maybe mix in Snapshot here?
    snapshot = _notimplemented
```

(Kind of embarrassing, that "on-the-fly class design"... There's probably a better way to do this, but it's late and I just want to brain dump that stuff. ;)

I hope that helps!

Owner

Thank you for the detailed answer. The series of steps you laid out makes sense to me, although I don't know that supporting *offline* backups (taking the database down) would be useful to that many users! Online backups via snapshots, certainly.

On the topic of PostgreSQL binary backups, have you seen [`pg_basebackup`](https://www.postgresql.org/docs/12/app-pgbasebackup.html)? (Also, a less reference-docs-y [overview](https://www.opsdash.com/blog/postgresql-backup-restore.html#backup-using-pg_basebackup).) That appears to use internal Postgres checkpointing ("[backup mode](https://serverfault.com/questions/543674/what-is-backup-mode-in-postgresql)") to get a consistent snapshot of the data directory. I can only imagine that that's preferable to using external filesystem checkpointing to accomplish the same thing, especially since it can be executed remotely. But I don't know what folks typically do in the wild, or what the differences are in I/O impact for each method.

Also, I wonder if there's a [comparable approach](https://mariadb.com/kb/en/library/full-backup-and-restore-with-mariabackup/) in MariaDB land, or whether most folks use the [filesystem snapshot approach](https://mariadb.com/kb/en/library/backup-and-restore-overview/#filesystem-snapshots).

Author

> Thank you for the detailed answer. The series of steps you laid out makes sense to me, although I don't know that supporting *offline* backups (taking the database down) would be useful to that many users! Online backups via snapshots, certainly.

True, it might be worth ignoring that use case completely.

> On the topic of PostgreSQL binary backups, have you seen [`pg_basebackup`](https://www.postgresql.org/docs/12/app-pgbasebackup.html)? (Also, a less reference-docs-y [overview](https://www.opsdash.com/blog/postgresql-backup-restore.html#backup-using-pg_basebackup).) That appears to use internal Postgres checkpointing ("[backup mode](https://serverfault.com/questions/543674/what-is-backup-mode-in-postgresql)") to get a consistent snapshot of the data directory. I can only imagine that that's preferable to using external filesystem checkpointing to accomplish the same thing, especially since it can be executed remotely. But I don't know what folks typically do in the wild. And what the differences are in I/O impact for each method.

We actually use this as part of the custom backup system here at Tor, so it's for real, and *could* be used here. I'm just not very familiar with it, so I didn't think of bringing it up.

The main problem with that, IMHO, is that it creates local files, so it duplicates storage space requirements. Maybe there's a way that it could be fed directly into borg as an object instead?

> Also, I wonder if there's a [comparable approach](https://mariadb.com/kb/en/library/full-backup-and-restore-with-mariabackup/) in MariaDB land, or whether most folks use the [filesystem snapshot approach](https://mariadb.com/kb/en/library/backup-and-restore-overview/#filesystem-snapshots).

mariabackup looks like something that used to be called mysqlhotcopy, if my memory is correct, and yes, it serves a similar purpose.

Owner

> The main problem with that, IMHO, is that it creates local files, so it duplicates storage space requirements. Maybe there's a way that it could be fed directly into borg as an object instead?

Yup, [there is](https://projects.torsion.org/witten/borgmatic/issues/42)! But that'd be an optimization. (Maybe a necessary one for certain use cases..)

witten reopened this issue 2019-10-28 17:38:50 +00:00

Hi. I know this is quite an old request, but it is still valid if you want to back up bigger databases (MySQL/MariaDB/PostgreSQL) with reasonable recovery times. I'm using xtrabackup/mariabackup as well as pg_basebackup for databases of different sizes without wasting precious space on SSD virtual machine disks, just piping output through an SSH tunnel to the backup location, and of course saving the compressed backup there.

This is how I'm doing it:

```
# MySQL, xtrabackup:
/usr/bin/innobackupex --compress --compress-threads=8 --user=dbbackupuser --password=pass --slave-info --stream=xbstream | ssh backup@backupserver "dd of=/data/backup/my-02/full_$TIMESTAMP.stream.qp"

# MariaDB, mariabackup:
mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 | pigz -p 4 | ssh backup@backupserver "dd of=/data/backup/db-03/db-03.full_$TIMESTAMP.stream.gz"

# Postgres, pg_basebackup:
/usr/bin/pg_basebackup -X f -P -v --format=tar -z -U dbbackupuser -D - | ssh backup@backupserver "dd of=/data/backup/pg-02/pg_backup_full-$TIMESTAMP.tar.gz"
```

Of course, you can use incremental backups (via the LSN) for xtrabackup/mariabackup, and that is extremely useful when doing recovery. It takes about 1-2 hours max for a MySQL/MariaDB database around 100GB in size.

I would like to use borgmatic in the same fashion: pipe xtrabackup/mariabackup/pg_basebackup output directly to the borg repositories. Is it possible to implement this with borgmatic? I have googled around for a clean borg solution but I couldn't find anything. Anyway, it makes sense to me to have a generic borgmatic hook to implement similar backup approaches. Thanks to anyone for an answer in advance.

Owner

There's not a way to do this today with borgmatic, as it's pretty geared around `pg_dump` for PostgreSQL and `mysqldump` for MySQL/MariaDB. But there's no reason why it couldn't be extended to support `xtrabackup`/`mariabackup`/`pg_basebackup`, including streaming their output to Borg the same way borgmatic does with dumps today. That's assuming the output can be streamed to a single local named pipe path (that borgmatic configures Borg to consume).

There would have to be some adjustments to the configuration, however. For instance, the existing configuration is database-oriented, while this would have to be more cluster-oriented. Maybe that means in addition to `postgresql_databases`, for instance, there should be a separate `postgresql_clusters` hook with different options? Similar for MySQL/MariaDB.
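
For what it's worth, a minimal sketch of the named pipe idea (not borgmatic's actual implementation; the repository path and archive name are placeholders):

```python
import os
import subprocess
import tempfile

# Stream pg_basebackup's tar output into Borg through a FIFO that Borg
# consumes as a special file.
pipe_dir = tempfile.mkdtemp()
pipe_path = os.path.join(pipe_dir, 'base_backup.tar')
os.mkfifo(pipe_path, mode=0o600)

# Run the producer via a shell so the blocking FIFO open() happens in the
# child process, not here; it unblocks once Borg opens the pipe to read.
producer = subprocess.Popen(
    'pg_basebackup --format=tar --wal-method=fetch -D - > ' + pipe_path,
    shell=True,
)
subprocess.run(
    ['borg', 'create', '--read-special', '/path/to/repo::pg-{now}', pipe_path],
    check=True,
)
producer.wait()
os.remove(pipe_path)
os.rmdir(pipe_dir)
```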

witten added the new feature area label 2023-06-28 18:53:45 +00:00