pg_dumpall hangs (again) #468

Closed
opened 2021-11-22 16:11:01 +00:00 by neurolabs · 11 comments

What I'm trying to do and why

Back up a filesystem, including PostgreSQL dumps.

Steps to reproduce (if a bug)

Run borgmatic with the attached config.

Actual behavior (if a bug)

/root/.borgmatic/postgresql_databases/localhost/all stays empty

# strace -p 4023936
strace: Process 4023936 attached
openat(AT_FDCWD, "/root/.borgmatic/postgresql_databases/localhost/all", O_WRONLY|O_CREAT|O_TRUNC, 0666

Expected behavior (if a bug)

A dump in /root/.borgmatic/postgresql_databases/localhost/all which gets backed up.

Other notes / implementation ideas

Environment

borgmatic version: 1.5.20

Use sudo borgmatic --version or sudo pip show borgmatic | grep ^Version

borgmatic installation method: pip

Borg version: 1.1.16

Use sudo borg --version

Python version: 3.9.2

Use python3 --version

Database version (if applicable): 11.13

Use psql --version or mysql --version on client and server.

operating system and version: Debian Bullseye

Owner

Thanks for filing this! The /root/.borgmatic/postgresql_databases/localhost/all path is only used to create a named pipe for streaming the dump directly from Postgres to Borg without consuming additional disk space. So you shouldn't expect to see the contents of the database dump show up there. Are you sure that borgmatic / pg_dumpall are hanging, or is it perhaps just taking a while for the dump to stream to Borg? What database(s) size are we talking about here? If it is hanging, have you tried nuking /root/.borgmatic before running borgmatic?

Note: I'm not seeing the config attached here!
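
To illustrate the named pipe behavior described above: on POSIX systems, opening a FIFO for writing blocks until some process opens it for reading, which matches an strace that stops at openat. The sketch below is only an illustration, not borgmatic's code. The temporary path stands in for the real dump path, and `writer_would_block` is a hypothetical helper that uses O_NONBLOCK to turn the indefinite wait into an immediate ENXIO error.

```python
import errno
import os
import tempfile


def writer_would_block(fifo_path):
    """Return True if opening fifo_path for writing would block (no reader attached)."""
    try:
        # With O_NONBLOCK, a write-only open of a reader-less FIFO fails with ENXIO
        # instead of waiting forever.
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as error:
        if error.errno == errno.ENXIO:
            return True
        raise
    os.close(fd)
    return False


with tempfile.TemporaryDirectory() as directory:
    # Hypothetical stand-in for /root/.borgmatic/postgresql_databases/localhost/all
    fifo_path = os.path.join(directory, "all")
    os.mkfifo(fifo_path)

    # No reader yet: this is the situation a dump process would hang in.
    no_reader = writer_would_block(fifo_path)

    # Attach a reader (the role Borg plays when it reaches the pipe) and check again.
    reader_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    with_reader = writer_would_block(fifo_path)
    os.close(reader_fd)

print(no_reader, with_reader)
```

If the first value is True for the real dump path while borgmatic is running, the reader has not yet reached the pipe, so the question becomes what the reader is stuck on.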

Author

Here's the config file uploaded again

Author

Uploading does not work, config inline (sensitive content replaced with dummy data):

location:
    source_directories:
        - .
    repositories:
        - ssh://host1/./backups/borg
        - ssh://host2/./borg
    exclude_patterns:
        - '*.pyc'
        - home/**/.cache
        - root/**/.cache
        - var/cache/*
    exclude_caches: true
storage:
    encryption_passphrase: 1234
    compression: lz4
    lock_wait: 60
    archive_name_format: '{hostname}-{now:%Y-%m-%dT%H:%M:%S}'
retention:
    keep_within: 24H
    keep_daily: 30
    keep_weekly: 4
    keep_monthly: 22
    keep_yearly: 8
    prefix: '{hostname}-'
consistency:
    checks:
        - repository
        - archives
hooks:
    postgresql_databases:
        - name: all
          username: postgres
    healthchecks: https://hc-ping.com/$uuid
Author

Regarding your questions, a pg_dumpall command executed on the shell takes a couple of seconds, so we're not talking about a lot of data / a long runtime of the pg_dumpall command. Indeed, the backup runs into the runtime limit of 12h configured in systemd when I let it continue. And I see that the backup hangs at/after the openat syscall.
I have nuked /root/.borgmatic in between runs.

Author

Maybe there are problems in the consumer of the named pipe?

This backup config used to run, but stopped working shortly after a Debian upgrade from Buster to Bullseye. Might be correlation, might be causation.

Author

For reference, there was an issue a while ago that had the same symptoms: #316

Owner

Couple more questions on this:

  • Can I get a look at your borgmatic logs with --verbosity 2 on? Feel free to redact.
  • In particular, I'm interested in the full borg create command that shows up there, because I'm wondering if a source path is erroneously getting passed to Borg twice (which, as per #316, can cause the kind of hang you're seeing).
  • What's actually present in your . source directory? Does it contain the /root/.borgmatic path?
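
A rough way to check the last two questions by hand is to take the source paths from the borg create command and test whether any of them equals or contains another. This is a sketch under assumptions, not borgmatic's own logic: `overlapping_sources` is a hypothetical helper, `/mnt/snapshot` is a made-up mount point, and symlinks are not resolved.

```python
import os


def overlapping_sources(source_paths):
    """Return pairs of source paths where one equals or contains the other."""
    # abspath normalizes trailing slashes and ".." but does not resolve symlinks.
    normalized = [os.path.abspath(path) for path in source_paths]
    overlaps = []
    for index, first in enumerate(normalized):
        for second in normalized[index + 1:]:
            shorter, longer = sorted((first, second), key=len)
            # If the common ancestor is the shorter path, it contains the longer one.
            if os.path.commonpath([shorter, longer]) == shorter:
                overlaps.append((first, second))
    return overlaps


# Hypothetical snapshot mount next to the borgmatic dump directory: no overlap.
separate = overlapping_sources(["/mnt/snapshot", "/root/.borgmatic"])

# Backing up the live root alongside the dump directory: the dump path is covered twice.
nested = overlapping_sources(["/", "/root/.borgmatic"])

print(separate)
print(nested)
```

An empty result for the real command would suggest the hang has a different cause than a path being passed twice.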
Author

I will come back to the first two questions, but the answer to the third is that it's an LVM snapshot of /. Therefore, AFAICS, it does not contain the /root/.borgmatic path that is actively used for the db hook.

Author

Did something change about database hook limitation #4 (read_special)?
The backups used to work before the system upgrade, but now they don't, apparently because the snapshots contain a lot of special files.

I think the easiest solution could be to separate the database backup and the snapshot backup into two sets/configs.

Thanks for your guidance.

Author

Separating the database backup from the snapshot backup works with 1.5.20. I sadly don't know which version of borgmatic was used before the upgrade (which worked backing up snapshot & database in one config).

If it helps, here's the create command from the verbose logs:

borg create --exclude-from /tmp/tmpxsft8_na --exclude-caches --compression lz4 --one-file-system --read-special --lock-wait 60 --debug --show-rc ssh://host1/./backups/borg::{hostname}-{now:%Y-%m-%dT%H:%M:%S} /root/.borgmatic

Feel free to close this or investigate further.

Owner

The database streaming behavior was introduced in borgmatic 1.5.3 (including that read_special limitation) and there have been various fixes/tweaks since then. So my guess is that you were using a pre-1.5.3 version of borgmatic before your upgrade. Alternatively, maybe you already had a 1.5.3+ version of borgmatic, but your system upgrade introduced some new special files that weren't present previously.

One "fix" you could make is to exclude those special files in borgmatic's configuration.
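
To find candidates for such excludes, one option is to walk the source directory and list every FIFO, socket, and device node. The sketch below assumes a plain directory walk is good enough. `find_special_files` is a hypothetical helper, it does not replicate Borg's --one-file-system handling, and the temporary directory only stands in for the snapshot.

```python
import os
import stat
import tempfile


def find_special_files(root):
    """Yield (path, kind) for FIFOs, sockets, and device nodes under root."""
    kinds = (
        (stat.S_ISFIFO, "fifo"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
    )
    for directory, _subdirectories, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            try:
                mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry; skip it
            for predicate, kind in kinds:
                if predicate(mode):
                    yield path, kind
                    break


with tempfile.TemporaryDirectory() as root:
    # Stand-in for the snapshot: one regular file and one FIFO.
    regular_path = os.path.join(root, "regular.txt")
    with open(regular_path, "w") as handle:
        handle.write("data")
    fifo_path = os.path.join(root, "stale.fifo")
    os.mkfifo(fifo_path)

    found = list(find_special_files(root))

for path, kind in found:
    print(kind, path)
```

The reported paths could then be added under exclude_patterns in the config shown earlier in this thread.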

Closing this for now, but please feel free to continue the discussion here.

witten added the question / support label 2021-12-06 19:38:50 +00:00
Reference: borgmatic-collective/borgmatic#468