Borgmatic takes a long time apparently doing practically nothing #583

Closed
opened 2022-09-09 05:43:03 +00:00 by apupappi · 7 comments

What I'm trying to do and why

I am running "borgmatic -v 2 create" to take a backup.

Steps to reproduce (if a bug)

I ran "borgmatic -v 2 create" with this config.yaml except not redacted.

location:

    source_directories:
        - /
        - /home
        - /var
        - /opt

    repositories:
        - ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo
        - ssh://xxxxxxxxxx@ma.sdf.org/./borg

    exclude_from:
        - /etc/borgmatic/excludes

    exclude_caches: true

    borgmatic_source_directory: /backup/root/dump

storage:

    encryption_passphrase: XXXXXXXXXXXXXXXXXXX

    borg_cache_directory: /backup/root/cache

    borg_keys_directory: /etc/borg/keys

retention:

    keep_within: 3H

    keep_daily: 7

    keep_weekly: 4

    keep_monthly: 6

    keep_yearly: 20

hooks:

    postgresql_databases:
        - name: all
          hostname: 127.0.0.1
          port: 5432
          username: _postgresql
          password: XXXXXXX

Actual behavior (if a bug)

I have tried this several times. The present process is still running, and
it has been about four days since anything has printed. Other times I have
done this, I have killed the process and seen that archives were created.
Below is output with --verbosity 2, redacted.

Ensuring legacy configuration is upgraded
/root/.config/borgmatic/config.yaml: No commands to run for pre-everything hook
borg --version --debug --show-rc
/root/.config/borgmatic/config.yaml: No commands to run for pre-actions hook
/root/.config/borgmatic/config.yaml: No commands to run for pre-backup hook
ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Creating archive
ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Calling postgresql_databases hook function remove_database_dumps
sh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Removing PostgreSQL database dumps
ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Calling postgresql_databases hook function dump_databases
ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Dumping PostgreSQL databases
ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Dumping PostgreSQL database all to /backup/root/dump/postgresql_databases/127.0.0.1/all
pg_dumpall --no-password --clean --if-exists --host 127.0.0.1 --port 5432 --username _postgresql > /backup/root/dump/postgresql_databases/127.0.0.1/all
borg create --exclude-from /etc/borgmatic/excludes --exclude-caches --one-file-system --read-special --debug --show-rc ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f} / /home /opt /var
using builtin fallback logging configuration
33 self tests completed in 0.39 seconds
SSH command line: ['ssh', 'xxxxxxxx@xxxxxxxx.repo.borgbase.com', 'borg', 'serve', '--debug']
Remote: using builtin fallback logging configuration
Remote: borg selftest disabled via BORG_SELFTEST env variable
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'/./repo'
Remote: Resolved repository path to '/srv/repos/xxxxxxxx/repo'
Remote: Verified integrity of /srv/repos/xxxxxxxx/repo/index.1159
TAM-verified manifest
security: read previous location 'ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo'
security: read manifest timestamp '2022-09-05T12:18:49.275604'
security: determined newest manifest timestamp as 2022-09-05T12:18:49.275604
security: repository checks ok, allowing access
Creating archive at "ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo::xxxxxxx-2022-09-05T15:26:40.643276"
Verified integrity of /backup/root/cache/6a1cb756ba0ae7c7d428d85419e94096ffc204a851948e1e53157fd456edb33f/chunks
Reading files cache ...
Verified integrity of /backup/root/cache/6a1cb756ba0ae7c7d428d85419e94096ffc204a851948e1e53157fd456edb33f/files
security: read previous location 'ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo'
security: read manifest timestamp '2022-09-05T12:18:49.275604'
security: determined newest manifest timestamp as 2022-09-05T12:18:49.275604
security: repository checks ok, allowing access
Processing files ...
Remote: Cleaned up 1 uncommitted segment files (== everything after segment 1159).
Remote: Verified integrity of /srv/repos/xxxxxxxx/repo/hints.1159
Remote: Storage quota: 542.34 GB out of 8.00 TB used.

It seems to me that borgmatic creates an archive and then continues
running but does practically nothing.

Expected behavior (if a bug)

I expect this to run faster.

Other notes / implementation ideas

The trouble with backups taking this long is that I can run them only infrequently.

Environment

borgmatic version: 1.7.1

Use sudo borgmatic --version or sudo pip show borgmatic | grep ^Version

borgmatic installation method: pip

Borg version: 1.2.2

Use sudo borg --version

Python version: 3.9.13

Use python3 --version

Database version (if applicable): 14.5 on both server and client

Use psql --version or mysql --version on client and server.

operating system and version: OpenBSD 7.1

#### What I'm trying to do and why I am running "borgmatic -v 2 create" to take a backup. #### Steps to reproduce (if a bug) I ran "borgmatic -v 2 create" with this config.yaml except not redacted. ``` location: source_directories: - / - /home - /var - /opt repositories: - ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo - ssh://xxxxxxxxxx@ma.sdf.org/./borg exclude_from: - /etc/borgmatic/excludes exclude_caches: true borgmatic_source_directory: /backup/root/dump storage: encryption_passphrase: XXXXXXXXXXXXXXXXXXX borg_cache_directory: /backup/root/cache borg_keys_directory: /etc/borg/keys retention: keep_within: 3H keep_daily: 7 keep_weekly: 4 keep_monthly: 6 keep_yearly: 20 hooks: postgresql_databases: - name: all hostname: 127.0.0.1 port: 5432 username: _postgresql password: XXXXXXX ``` #### Actual behavior (if a bug) I have tried this several times. The present process is still running, and it has been about four days since anything has printed. Other times I have done this, I have killed the process and seen that archives were created. Below is output with `--verbosity 2`, redacted. ``` Ensuring legacy configuration is upgraded /root/.config/borgmatic/config.yaml: No commands to run for pre-everything hook borg --version --debug --show-rc /root/.config/borgmatic/config.yaml: No commands to run for pre-actions hook /root/.config/borgmatic/config.yaml: No commands to run for pre-backup hook ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Creating archive ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Calling postgresql_databases hook function remove_database_dumps sh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Removing PostgreSQL database dumps ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Calling postgresql_databases hook function dump_databases ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Dumping PostgreSQL databases ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo: Dumping PostgreSQL database all to /backup/root/dump/postgresql_databases/127.0.0.1/all pg_dumpall --no-password --clean --if-exists --host 127.0.0.1 --port 5432 --username _postgresql > /backup/root/dump/postgresql_databases/127.0.0.1/all borg create --exclude-from /etc/borgmatic/excludes --exclude-caches --one-file-system --read-special --debug --show-rc ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f} / /home /opt /var using builtin fallback logging configuration 33 self tests completed in 0.39 seconds SSH command line: ['ssh', 'xxxxxxxx@xxxxxxxx.repo.borgbase.com', 'borg', 'serve', '--debug'] Remote: using builtin fallback logging configuration Remote: borg selftest disabled via BORG_SELFTEST env variable Remote: using builtin fallback logging configuration Remote: Initialized logging system for JSON-based protocol Remote: Resolving repository path b'/./repo' Remote: Resolved repository path to '/srv/repos/xxxxxxxx/repo' Remote: Verified integrity of /srv/repos/xxxxxxxx/repo/index.1159 TAM-verified manifest security: read previous location 'ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo' security: read manifest timestamp '2022-09-05T12:18:49.275604' security: determined newest manifest timestamp as 2022-09-05T12:18:49.275604 security: repository checks ok, allowing access Creating archive at "ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo::xxxxxxx-2022-09-05T15:26:40.643276" Verified integrity of /backup/root/cache/6a1cb756ba0ae7c7d428d85419e94096ffc204a851948e1e53157fd456edb33f/chunks Reading files cache ... Verified integrity of /backup/root/cache/6a1cb756ba0ae7c7d428d85419e94096ffc204a851948e1e53157fd456edb33f/files security: read previous location 'ssh://xxxxxxxx@xxxxxxxx.repo.borgbase.com/./repo' security: read manifest timestamp '2022-09-05T12:18:49.275604' security: determined newest manifest timestamp as 2022-09-05T12:18:49.275604 security: repository checks ok, allowing access Processing files ... Remote: Cleaned up 1 uncommitted segment files (== everything after segment 1159). Remote: Verified integrity of /srv/repos/xxxxxxxx/repo/hints.1159 Remote: Storage quota: 542.34 GB out of 8.00 TB used. ``` It seems to me that borgmatic creates an archive and then continues running but does practically nothing. #### Expected behavior (if a bug) I expect this to run faster. #### Other notes / implementation ideas The trouble with backups taking this long is that I can run them only infrequently. #### Environment **borgmatic version:** 1.7.1 Use `sudo borgmatic --version` or `sudo pip show borgmatic | grep ^Version` **borgmatic installation method:** pip **Borg version:** 1.2.2 Use `sudo borg --version` **Python version:** 3.9.13 Use `python3 --version` **Database version (if applicable):** 14.5 on both server and client Use `psql --version` or `mysql --version` on client and server. **operating system and version:** OpenBSD 7.1
Owner

My guess is that this is a hang when attempting to dump the database, as Borg tends to hang if borgmatic hands it a named pipe that's inactive. (borgmatic uses named pipes to stream database dumps to Borg.)

So my first suggestion is to try commenting out the PostgreSQL database hook and see if that "fixes" the apparent hang problem. If so, then we know the problem is somewhere in that database dump/stream process.

I don't know what's in your excludes, but I see that / is in your source directories. Unless certain directories are explicitly excluded, that's almost certainly going to pass named pipes and special files to Borg that it can't read from, causing it to hang. (This only happens when database hooks are enabled or you use read_special explicitly.)

If that appears to be the case, my recommendation is to take a look at the borgmatic databases documentation, specifically the limitations, and see if you can find what paths in / are causing problems. Then, exclude them! Alternatively, just exclude common problem directories like /dev and /run and go from there.

Let me know what you find!

My guess is that this is a hang when attempting to dump the database, as Borg tends to hang if borgmatic hands it a named pipe that's inactive. (borgmatic uses named pipes to stream database dumps to Borg.) So my first suggestion is to try commenting out the PostgreSQL database hook and see if that "fixes" the apparent hang problem. If so, then we know the problem is somewhere in that database dump/stream process. I don't know what's in your excludes, but I see that `/` is in your source directories. Unless certain directories are explicitly excluded, that's almost certainly going to pass named pipes and special files to Borg that it can't read from, causing it to hang. (This only happens when database hooks are enabled or you use `read_special` explicitly.) If that appears to be the case, my recommendation is to take a look at the [borgmatic databases documentation, specifically the limitations](https://torsion.org/borgmatic/docs/how-to/backup-your-databases/#limitations), and see if you can find what paths in `/` are causing problems. Then, exclude them! Alternatively, just exclude common problem directories like `/dev` and `/run` and go from there. Let me know what you find!
witten added the
question / support
label 2022-09-09 06:44:59 +00:00
Author

I ran find / -type c,b,p,s, and the results were all among the excludes.

I ran without the postgresql hook, and the process finished.

I'm now running without the postgresql hook but with read_special true, to confirm that read_special is the cause.

I ran `find / -type c,b,p,s`, and the results were all among the excludes. I ran without the postgresql hook, and the process finished. I'm now running without the postgresql hook but with read_special true, to confirm that read_special is the cause.
Owner

In terms of debugging, that sounds like a good course of action. One other thing you could maybe try is adding the --files flag to the borgmatic invocation, so that Borg lists out files as it's backing up. I don't recall whether Borg lists the files before or after it completes backing it up, but the output might at least give you an idea of approximately where in the filesystem something is going wrong, if not the individual problem file.

Edit: One more idea.. You could try running Borg with strace, which will spew a ton of output, but should tell you what exactly it was doing right before it hung—and possibly what file it was reading from. The easiest way to do that would probably be to make an executable script that looks something like the following (untested):

#!/bin/bash

strace borg $*

And then set the borgmatic local_path option to the path of that script so it gets called by borgmatic instead of Borg directly.

Edit 2: Have you tried just nuking the entire contents of /backup/root/dump in case there are some stale named pipes hanging around in there? There shouldn't be, but there might be anyway.

In terms of debugging, that sounds like a good course of action. One other thing you could maybe try is adding the `--files` flag to the borgmatic invocation, so that Borg lists out files as it's backing up. I don't recall whether Borg lists the files before or after it completes backing it up, but the output might at least give you an idea of approximately where in the filesystem something is going wrong, if not the individual problem file. Edit: One more idea.. You could try running Borg with [`strace`](https://strace.io/), which will spew a ton of output, but should tell you what exactly it was doing right before it hung—and possibly what file it was reading from. The easiest way to do that would probably be to make an executable script that looks something like the following (untested): ```bash #!/bin/bash strace borg $* ``` And then set the borgmatic `local_path` option to the path of that script so it gets called by borgmatic instead of Borg directly. Edit 2: Have you tried just nuking the entire contents of `/backup/root/dump` in case there are some stale named pipes hanging around in there? There shouldn't be, but there [might be anyway](https://projects.torsion.org/borgmatic-collective/borgmatic/issues/360).
Author

I ran find / -type c,b,p,s, and the results were all among the excludes.

I ran without the postgresql hook, and the process finished.

I'm now running without the postgresql hook but with read_special true, to confirm that read_special is the cause.

And indeed it was still running.

I did not find --files, but I used --progress and found the output has stayed like this for a while.

8.96 GB O 5.52 GB C 126 B D 236763 N sbin/unwind

I don't know what exactly the file name indicates on that line, but I figured it is either the file that has just been handled or is about to be handled. The unwind and the adjacent files in alphabetical order are regular files.

# ls -Fhl /sbin/ | tail -n 5
-r-xr-xr-x  1 root  bin        130K Apr 11 23:46 tunefs*
-r-xr-xr-x  1 root  bin        186K Apr 11 23:46 umount*
-r-xr-xr-x  1 root  bin        3.5M Apr 11 23:46 unwind*
-r-xr-xr-x  1 root  bin        209K Apr 11 23:46 vnconfig*
-r-xr-xr-x  1 root  bin        254K Apr 11 23:46 wsconsctl*

So I added my own path-printing line to borg, and I found the troublesome file to be /backup/root/dump/postgresql_databases/127.0.0.1/all, which is of course a FIFO that I neglected to delete after killing borgmatic.

On the file order, in case it matters:
This file's real name comes right after /sbin in alphabetical order, but I removed some private information from this path and from the borgmatic source directory name before sending the config.yaml earlier.

So then I deleted /backup/root/dump/postgresql_databases/127.0.0.1/all and ran borg again

This demonstrates that the find parameter in the documentation don't match a FIFO, so I tried find / -type b -or -type c -or -type p -or -type s and found some FIFOs. I have found this command to work on NetBSD, OpenBSD, and GNU.

It seems to me that find / -type b,c,p,s in OpenBSD and NetBSD is equivalent to find / -type b. (And perhaps the lack of error should be considered a bug.)

> I ran `find / -type c,b,p,s`, and the results were all among the excludes. > > I ran without the postgresql hook, and the process finished. > > I'm now running without the postgresql hook but with read_special true, to confirm that read_special is the cause. And indeed it was still running. I did not find `--files`, but I used `--progress` and found the output has stayed like this for a while. ``` 8.96 GB O 5.52 GB C 126 B D 236763 N sbin/unwind ``` I don't know what exactly the file name indicates on that line, but I figured it is either the file that has just been handled or is about to be handled. The `unwind` and the adjacent files in alphabetical order are regular files. ``` # ls -Fhl /sbin/ | tail -n 5 -r-xr-xr-x 1 root bin 130K Apr 11 23:46 tunefs* -r-xr-xr-x 1 root bin 186K Apr 11 23:46 umount* -r-xr-xr-x 1 root bin 3.5M Apr 11 23:46 unwind* -r-xr-xr-x 1 root bin 209K Apr 11 23:46 vnconfig* -r-xr-xr-x 1 root bin 254K Apr 11 23:46 wsconsctl* ``` So I added my own path-printing line to borg, and I found the troublesome file to be `/backup/root/dump/postgresql_databases/127.0.0.1/all`, which is of course a FIFO that I neglected to delete after killing borgmatic. On the file order, in case it matters: This file's real name comes right after `/sbin` in alphabetical order, but I removed some private information from this path and from the borgmatic source directory name before sending the `config.yaml` earlier. So then I deleted `/backup/root/dump/postgresql_databases/127.0.0.1/all` and ran borg again This demonstrates that the `find` parameter in the documentation don't match a FIFO, so I tried `find / -type b -or -type c -or -type p -or -type s` and found some FIFOs. I have found this command to work on NetBSD, OpenBSD, and GNU. It seems to me that `find / -type b,c,p,s` in OpenBSD and NetBSD is equivalent to `find / -type b`. (And perhaps the lack of error should be considered a bug.)
Owner

Nice work debugging this! I'll update the documentation with your modified find command. And it sounds like the underlying issue (borgmatic sometimes leaving named pipes laying around when killed and not cleaning them up on startup) is the same as issue #360, which I should really get around to. So I'll close this ticket in favor of that. Thanks!

Nice work debugging this! I'll update the documentation with your modified `find` command. And it sounds like the underlying issue (borgmatic sometimes leaving named pipes laying around when killed and not cleaning them up on startup) is the same as issue #360, which I should *really* get around to. So I'll close this ticket in favor of that. Thanks!
Author

it sounds like the underlying issue (borgmatic sometimes leaving named pipes laying around when killed and not cleaning them up on startup) is the same as issue #360

It is too early to say what the underlying issue is, since it will take me reverting all the settings and running a backup to see. And of course the concept of one underlying issue is limiting.

But I am confident they are not the same. In my case I think one underlying issue was the GNU-specific find command in the documentation, and another could be the lack of error message from the borgmatic command. I doubt the problematic FIFO was originally the PostgreSQL dump FIFO, since I expect borgmatic to handle it. When I changed the find command, the FIFOs that I found were ii channels, so, when I exclude those, I expect the backup to run differently.

> it sounds like the underlying issue (borgmatic sometimes leaving named pipes laying around when killed and not cleaning them up on startup) is the same as issue #360 It is too early to say what the underlying issue is, since it will take me reverting all the settings and running a backup to see. And of course the concept of one underlying issue is limiting. But I am confident they are not the same. In my case I think one underlying issue was the GNU-specific find command in the documentation, and another could be the lack of error message from the borgmatic command. I doubt the problematic FIFO was originally the PostgreSQL dump FIFO, since I expect borgmatic to handle it. When I changed the find command, the FIFOs that I found were [ii](https://tools.suckless.org/ii/) channels, so, when I exclude those, I expect the backup to run differently.
Owner

Okay, understood! Feel free to post any of your follow-up findings here, and I'd be happy to reopen the ticket if there's still an issue distinct from #360 (and not solved by the correct find command you've helpfully provided).

Okay, understood! Feel free to post any of your follow-up findings here, and I'd be happy to reopen the ticket if there's still an issue distinct from #360 (and not solved by the correct `find` command you've helpfully provided).
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#583
No description provided.