best practice on dumping databases? #551

Closed
opened 2022-06-24 08:10:13 +00:00 by dpalic · 6 comments

on dumping databases for me it is not clear who is responsible to exclude the binaries of the databases?

e.g. under debian/ubuntu

postgresql

/var/lib/postgresql/<version>/main

maria/mysql

/var/lib/mysql/

if adding directly or indirectly /var into backup folders

The both above named database data folders will be also dumped even if the config.yaml is configured to dump the databases too.

My expectation would be that borgmatic will skip the binary dumps.

But I am not clear about the best practices with borgmatic?

Could you suggest?
If dumping mysqldump shall I exclude /var/lib/mysql/ from backup?

If dumping pg_dumpall shall I exclude /var/lib/postgresql/<version>/main from backup?

if in both cases yes, why does borgmatic not do it for me?

A dream for a admin would be:

If backuping data folders of databases, borgmatic should stop the databases and restart them directly before and after copy of the files
or if the dump is configured the files shall be skipped

on dumping databases for me it is not clear who is responsible to exclude the binaries of the databases? e.g. under debian/ubuntu postgresql `/var/lib/postgresql/<version>/main` maria/mysql `/var/lib/mysql/` if adding directly or indirectly `/var` into backup folders The both above named database data folders will be also dumped even if the config.yaml is configured to dump the databases too. My expectation would be that borgmatic will skip the binary dumps. But I am not clear about the best practices with borgmatic? Could you suggest? If dumping mysqldump shall I exclude `/var/lib/mysql/` from backup? If dumping pg_dumpall shall I exclude `/var/lib/postgresql/<version>/main` from backup? if in both cases yes, why does borgmatic not do it for me? **A dream for a admin would be:** If backuping data folders of databases, borgmatic should stop the databases and restart them directly before and after copy of the files or if the dump is configured the files shall be skipped
Owner

While I understand the need, I think selecting backup directories on the user's behalf goes a little beyond the scope of borgmatic.. I wouldn't want borgmatic to do something automatically and get it wrong, making folks lose their files. Have you considered completely excluding /var in its entirety? Unless you have something in there like a mail server spool, there might not actually be anything within /var you need backed up.. especially since you're already dumping databases with borgmatic. Might be worth poking around in there to see what you actually need. An OS reinstall after a catastrophic drive loss would recreate many of the structures in /var (if not their contents).

While I understand the need, I think selecting backup directories on the user's behalf goes a little beyond the scope of borgmatic.. I wouldn't want borgmatic to do something automatically and get it wrong, making folks lose their files. Have you considered completely excluding `/var` in its entirety? Unless you have something in there like a mail server spool, there might not actually be anything within `/var` you need backed up.. especially since you're already dumping databases with borgmatic. Might be worth poking around in there to see what you actually need. An OS reinstall after a catastrophic drive loss would recreate many of the structures in `/var` (if not their contents).
witten added the
question / support
label 2022-06-24 16:52:44 +00:00
Author

to exclude /var is IMHO even worse, if you read the docs of the distributions, you have to include it into the backups
so for me not an option and also not scope of my issue

My issue is, that I have databases with hundreds of GB or TB of size. So dumping is already taking ages.

Also on restore the issue may arrise since the restore does the double and senseless work.

You may say: yes, but you can exclude manually, this is also right. But if you consider having 10s or 100s of servers, you have to do for each server individually.

I wouldn't want borgmatic to do something automatically and get it wrong, making folks lose their files. Have you considered completely excluding /var in its entirety?

I fully confirm to this, but automatically is fine, if you

  • get informed about it
  • have options to configure it

if this is missing I will underline and confirm your sentence right the way

Getting back to my issue:

So right now I have to manually check all installations with borgmatic to exclude the data folders of the databases.

add exclusion automatically

If I would build it myself in bash I would do (works with ubuntu 20.04):

cd /etc
MYSQL_DATADIR=$(grep -r "datadir\s*=" | grep -E "\.conf|\.cnf" | awk -F "="  '{print $2}'| awk -P '{print $1}')
if [ ! -z "$MYSQL_DATADIR" ]; then
   echo "you are doing a backup of your mysql/maria database,"
   echo "do you want to add $MYSQL_DATADIR to exclude folders?"
   echo "The exclusion ensures that your database is only dumped once and this saves space in the backup storage"
   # ask user to confirm the exclusing
else
   echo "could not find your mysql data dir to exclude"
   echo "you should exclude your data folders from backups"
fi

Alternative to stop and start the databases.

if I would have an option to add a hook BEFORE and AFTER dumping a database I would add here to stop and to start the database accordingly. So I could use the binaries for backups, which would make the process also okay to me.

BEFORE_DB_HOOK: 
service mariadb stop

AFTER_MYSQL_DUMP:
service mariadb start

this solution would also be okay for my case, since I would have in this case a consistent database backup (even if binary only)

More to consider if using database based dump tools

assuming 2 cases

case1: backup of specific databases

in this case we could discuss if it makes sense to exclude.
IMHO I would exclude all data folders from backup, since the user skips the other databases already

case2: backup of all databases

in this case it is fully clear that the user wants to ensure all databases are backuped. So here we can blindly exclude the data folders of the database as shown in the bash script above.

Hope I could explain it better.

This requirement comes from bigger installation, on which I am trying borgmatic right now. If you have a handful of servers, you will not face the need for the requirements, since the workload for setup is pretty low.

to exclude /var is IMHO even worse, if you read the docs of the distributions, you have to include it into the backups so for me not an option and also not scope of my issue My issue is, that I have databases with hundreds of GB or TB of size. So dumping is already taking ages. Also on restore the issue may arrise since the restore does the double and senseless work. You may say: yes, but you can exclude manually, this is also right. But if you consider having 10s or 100s of servers, you have to do for each server individually. > I wouldn't want borgmatic to do something automatically and get it wrong, making folks lose their files. Have you considered completely excluding /var in its entirety? I fully confirm to this, but automatically is fine, if you * get informed about it * have options to configure it if this is missing I will underline and confirm your sentence right the way **Getting back to my issue:** So right now I have to manually check all installations with borgmatic to exclude the data folders of the databases. ### add exclusion automatically If I would build it myself in bash I would do (works with ubuntu 20.04): ``` cd /etc MYSQL_DATADIR=$(grep -r "datadir\s*=" | grep -E "\.conf|\.cnf" | awk -F "=" '{print $2}'| awk -P '{print $1}') if [ ! -z "$MYSQL_DATADIR" ]; then echo "you are doing a backup of your mysql/maria database," echo "do you want to add $MYSQL_DATADIR to exclude folders?" echo "The exclusion ensures that your database is only dumped once and this saves space in the backup storage" # ask user to confirm the exclusing else echo "could not find your mysql data dir to exclude" echo "you should exclude your data folders from backups" fi ``` ### Alternative to stop and start the databases. if I would have an option to add a hook BEFORE and AFTER dumping a database I would add here to stop and to start the database accordingly. So I could use the binaries for backups, which would make the process also okay to me. ``` BEFORE_DB_HOOK: service mariadb stop AFTER_MYSQL_DUMP: service mariadb start ``` this solution would also be okay for my case, since I would have in this case a consistent database backup (even if binary only) ### More to consider if using database based dump tools assuming 2 cases **case1: backup of specific databases** in this case we could discuss if it makes sense to exclude. IMHO I would exclude all data folders from backup, since the user skips the other databases already **case2: backup of all databases** in this case it is fully clear that the user wants to ensure all databases are backuped. So here we can blindly exclude the data folders of the database as shown in the bash script above. Hope I could explain it better. This requirement comes from bigger installation, on which I am trying borgmatic right now. If you have a handful of servers, you will not face the need for the requirements, since the workload for setup is pretty low.
Owner

Thanks for the detailed explanation. Would you be able to convert your script's grep pattern into Borg's pattern syntax and specify it as an exclude in borgmatic's configuration? That might be slightly easier than using a bash script.

As for the before/after database hooks: There are existing before/after backup hooks that could serve the same purpose. Since the database needs to be down for the duration of the backup, you could take the database offline in before_backup and bring it back online in after_backup.

Thanks for the detailed explanation. Would you be able to convert your script's grep pattern into [Borg's pattern syntax](https://borgbackup.readthedocs.io/en/stable/usage/help.html#borg-help-patterns) and specify it as an exclude in borgmatic's configuration? That might be slightly easier than using a bash script. As for the before/after database hooks: There are existing before/after backup hooks that could serve the same purpose. Since the database needs to be down for the duration of the backup, you could take the database offline in `before_backup` and bring it back online in `after_backup`.
Author

I am not sure if got your query properly?

Thanks for the detailed explanation. Would you be able to convert your script's grep pattern into Borg's pattern syntax and specify it as an exclude in borgmatic's configuration? That might be slightly easier than using a bash script.

Do you ask me here to add a borg extra parameter to my config
or do you ask me for a
contribution to implement the requirement in borgmatic?
can you clarify this point?

As for the before/after database hooks: There are existing before/after backup hooks that could serve the same purpose. Since the database needs to be down for the duration of the backup, you could take the database offline in before_backup and bring it back online in after_backup.

if I got your recommendation here, you ask me to use the binary backup path as recommendation? What are you experiences with migrating binary backups to new servers on restore? Especially postgresql and mariadb?

I am not sure if got your query properly? > Thanks for the detailed explanation. Would you be able to convert your script's grep pattern into Borg's pattern syntax and specify it as an exclude in borgmatic's configuration? That might be slightly easier than using a bash script. Do you ask me here to add a borg extra parameter to my config or do you ask me for a contribution to implement the requirement in borgmatic? can you clarify this point? > As for the before/after database hooks: There are existing before/after backup hooks that could serve the same purpose. Since the database needs to be down for the duration of the backup, you could take the database offline in before_backup and bring it back online in after_backup. if I got your recommendation here, you ask me to use the binary backup path as recommendation? What are you experiences with migrating binary backups to new servers on restore? Especially postgresql and mariadb?
Owner

Do you ask me here to add a borg extra parameter to my config or do you ask me for a contribution to implement the requirement in borgmatic? can you clarify this point?

The former. I'm wondering if a "simple" exclude pattern within your borgmatic configuration would solve your use case. And then you wouldn't need a separate bash script to produce exclusions (or require the equivalent within borgmatic).

if I got your recommendation here, you ask me to use the binary backup path as recommendation? What are you experiences with migrating binary backups to new servers on restore? Especially postgresql and mariadb?

Nope, not going so far as making a recommendation. I'm just giving you an option since you brought up hooks. Personally, I would probably lean towards the standard dump and restore approach rather than binary backups, just because I'm not a big fan of database downtime and it seems like a more "official" way to do backups. But you know your systems and requirements better than I do!

> Do you ask me here to add a borg extra parameter to my config or do you ask me for a contribution to implement the requirement in borgmatic? can you clarify this point? The former. I'm wondering if a "simple" exclude pattern within your borgmatic configuration would solve your use case. And then you wouldn't need a separate bash script to produce exclusions (or require the equivalent within borgmatic). > if I got your recommendation here, you ask me to use the binary backup path as recommendation? What are you experiences with migrating binary backups to new servers on restore? Especially postgresql and mariadb? Nope, not going so far as making a recommendation. I'm just giving you an option since you brought up hooks. Personally, I would probably lean towards the standard dump and restore approach rather than binary backups, just because I'm not a big fan of database downtime and it seems like a more "official" way to do backups. But you know your systems and requirements better than I do!
Owner

Closing this for now, but feel free to follow up. We can always reopen it. Thanks!

Closing this for now, but feel free to follow up. We can always reopen it. Thanks!
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#551
No description provided.