pg_dumpall hangs #316

Closed
opened 2020-05-19 13:11:41 +00:00 by bjo · 12 comments

What I'm trying to do and why

Steps to reproduce (if a bug)

run borgmatic with the attached config

Actual behavior (if a bug)

/root/.borgmatic/postgresql_databases/localhost/all stays empty:

ls -al
total 0
drwx------ 2 root root 17 May 19 14:59 .
drwxr-xr-x 3 root root 23 May 19 11:07 ..
prw------- 1 root root  0 May 19 14:59 all

borgmatic hangs waiting for the file:

strace: Process 773093 attached
openat(AT_FDCWD, "/root/.borgmatic/postgresql_databases/localhost/all", O_RDONLY|O_NOATIME|O_CLOEXEC

Running pg_dumpall --no-password --clean --if-exists --username postgres > /root/.borgmatic/postgresql_databases/localhost/all works fine.

Expected behavior (if a bug)

A dump in /root/.borgmatic/postgresql_databases/localhost/all which gets backed up.

Environment

borgmatic version: 1.5.4

borgmatic installation method: Arch Linux package

Borg version: 1.1.11

Python version: 3.8.2

operating system and version: Arch Linux

PostgreSQL version: 12.3

#### What I'm trying to do and why #### Steps to reproduce (if a bug) run borgmatic with the attached config #### Actual behavior (if a bug) /root/.borgmatic/postgresql_databases/localhost/all stays empty: ``` ls -al total 0 drwx------ 2 root root 17 May 19 14:59 . drwxr-xr-x 3 root root 23 May 19 11:07 .. prw------- 1 root root 0 May 19 14:59 all ``` borgmatic hangs waiting for the file: ``` strace: Process 773093 attached openat(AT_FDCWD, "/root/.borgmatic/postgresql_databases/localhost/all", O_RDONLY|O_NOATIME|O_CLOEXEC ``` Running `pg_dumpall --no-password --clean --if-exists --username postgres > /root/.borgmatic/postgresql_databases/localhost/all` works fine. #### Expected behavior (if a bug) A dump in /root/.borgmatic/postgresql_databases/localhost/all which gets backed up. #### Environment **borgmatic version:** 1.5.4 **borgmatic installation method:** Arch Linux package **Borg version:** 1.1.11 **Python version:** 3.8.2 **operating system and version:** Arch Linux **PostgreSQL version:** 12.3
Owner

Thanks for bringing this to my attention, and including the strace! Looks like the config file isn't attached.. Would you mind trying attaching it again, or failing that, linking to it on a paste service or another site?

Some background: Because borgmatic now uses named pipes to transfer data from pg_dump/pg_dumpall directly to borg, I would expect the "dump" file size to remain zero regardless of whether things are working there.

A couple of ideas to help debug this:

  1. Try deleting /root/.borgmatic/postgresql_databases/localhost/all manually and then running borgmatic again. There's code in borgmatic that should in theory do that already, but I suppose it's possible something is going wrong, and there's a stale named pipe from a previous run that's hanging.
  2. On that theme, ctrl-C borgmatic, and check your process list to see if a stale pg_dumpall or borg process is running.
  3. In your borgmatic logs, is there a borg create command that ran? Does the timing of the strace openat() appear to correlate with borg create, or does it come earlier? I assume you're strace-ing the borgmatic process rather than the Borg process? It might be helpful to see the full output of your borgmatic logs (redacted if needed).
Thanks for bringing this to my attention, and including the strace! Looks like the config file isn't attached.. Would you mind trying attaching it again, or failing that, linking to it on a paste service or another site? Some background: Because borgmatic now uses named pipes to transfer data from `pg_dump`/`pg_dumpall` directly to `borg`, I would expect the "dump" file size to remain zero regardless of whether things are working there. A couple of ideas to help debug this: 1. Try deleting `/root/.borgmatic/postgresql_databases/localhost/all` manually and then running borgmatic again. There's code in borgmatic that should in theory do that already, but I suppose it's possible something is going wrong, and there's a stale named pipe from a previous run that's hanging. 2. On that theme, ctrl-C borgmatic, and check your process list to see if a stale `pg_dumpall` or `borg` process is running. 3. In your borgmatic logs, is there a `borg create` command that ran? Does the timing of the strace `openat()` appear to correlate with `borg create`, or does it come earlier? I assume you're strace-ing the borgmatic process rather than the Borg process? It might be helpful to see the full output of your borgmatic logs (redacted if needed).
witten added the
bug
label 2020-05-19 17:14:27 +00:00
Author

Ha, I thought the yaml got uploaded but it was rejected. Pasted at https://dpaste.org/QOYw now.
The strace is from the borg create process, the borgmatic process is at

wait4(1069992, 0x7ffcc1c14984, WNOHANG, NULL) = 0
wait4(1069992, 0x7ffcc1c147c4, WNOHANG, NULL) = 0
select(9, [6 8], [], [], NULL)          = 1 (in [6])
read(6, "", 4096)                       = 0
wait4(1069992, 0x7ffcc1c14984, WNOHANG, NULL) = 0
wait4(1069992, 0x7ffcc1c147c4, WNOHANG, NULL) = 0

Cleaning /root/.borgmatic/postgresql_databases/localhost/ did not help, so it's reproducable, the log is pasted here https://dpaste.org/PEcz

Just an idea:
/root/.borgmatic is included twice, first as / and then explicitely as /root/.borgmatic for the PostgreSQL backup via pipe. Does this lead to this issue? There's no dangling pg_dumpall process, so the pg_dumpall dump is piped one time, but afterwards the pipe still exists, so borg waits still for new input?

Ha, I thought the yaml got uploaded but it was rejected. Pasted at https://dpaste.org/QOYw now. The strace is from the `borg create` process, the `borgmatic` process is at ``` wait4(1069992, 0x7ffcc1c14984, WNOHANG, NULL) = 0 wait4(1069992, 0x7ffcc1c147c4, WNOHANG, NULL) = 0 select(9, [6 8], [], [], NULL) = 1 (in [6]) read(6, "", 4096) = 0 wait4(1069992, 0x7ffcc1c14984, WNOHANG, NULL) = 0 wait4(1069992, 0x7ffcc1c147c4, WNOHANG, NULL) = 0 ``` Cleaning `/root/.borgmatic/postgresql_databases/localhost/` did not help, so it's reproducable, the log is pasted here https://dpaste.org/PEcz Just an idea: `/root/.borgmatic` is included twice, first as `/` and then explicitely as `/root/.borgmatic` for the PostgreSQL backup via pipe. Does this lead to this issue? There's no dangling pg_dumpall process, so the pg_dumpall dump is piped one time, but afterwards the pipe still exists, so borg waits still for new input?
Owner

Just an idea: /root/.borgmatic is included twice, first as / and then explicitely as /root/.borgmatic for the PostgreSQL backup via pipe. Does this lead to this issue? There's no dangling pg_dumpall process, so the pg_dumpall dump is piped one time, but afterwards the pipe still exists, so borg waits still for new input?

Good find! I think you are absolutely right, and I'm thankfully able to reproduce the hang here by specifying a source directory like /root that implicitly re-includes /root/.borgmatic.

The super odd thing is that Borg does appear to be doing some file-level deduplication. E.g. in a test without named pipes, if I specify both "/foo" and "/foo/bar" as source directories, then a file at "/foo/bar/baz.txt" will only get backed up once as judged by the --files listing. But Borg must be trying to read that file twice somehow, resulting in the hang you're seeing.

> Just an idea: /root/.borgmatic is included twice, first as / and then explicitely as /root/.borgmatic for the PostgreSQL backup via pipe. Does this lead to this issue? There's no dangling pg_dumpall process, so the pg_dumpall dump is piped one time, but afterwards the pipe still exists, so borg waits still for new input? Good find! I think you are absolutely right, and I'm thankfully able to reproduce the hang here by specifying a source directory like `/root` that implicitly re-includes `/root/.borgmatic`. The super odd thing is that Borg does appear to be doing some file-level deduplication. E.g. in a test without named pipes, if I specify both "/foo" and "/foo/bar" as source directories, then a file at "/foo/bar/baz.txt" will only get backed up once as judged by the `--files` listing. But Borg must be trying to *read* that file twice somehow, resulting in the hang you're seeing.
Owner

Well, I have a solution. I'm not sure it's a good solution, but it has the distinct benefit of appearing to work with some basic prototyping..

Basically, before feeding source directories to Borg, borgmatic can deduplicate them. In other words, if / and /root and /root/.borgmatic are all in the source directories list, borgmatic would realize that only / should be fed to Borg as a source path.

My rationalization is that borgmatic is already doing preprocessing of directories it's feeding to Borg (expanding tildes, etc.), so it's not too much to expect it also to deduplicate child directories.

Well, I have a solution. I'm not sure it's a good solution, but it has the distinct benefit of appearing to work with some basic prototyping.. Basically, before feeding source directories to Borg, borgmatic can deduplicate them. In other words, if `/` and `/root` and `/root/.borgmatic` are all in the source directories list, borgmatic would realize that only `/` should be fed to Borg as a source path. My rationalization is that borgmatic is already doing preprocessing of directories it's feeding to Borg (expanding tildes, etc.), so it's not too much to expect it also to deduplicate child directories.
Owner

Okay, I made that change in master, and it appears to fix the issue. Thanks again for bringing it up and doing all the debugging on your end to make this easy to diagnose!

Closing this ticket, and I'll add a note here when the change makes its way into the next release.

Okay, I made that change in master, and it appears to fix the issue. Thanks again for bringing it up and doing all the debugging on your end to make this easy to diagnose! Closing this ticket, and I'll add a note here when the change makes its way into the next release.
Owner

Released in borgmatic 1.5.5!

Released in borgmatic 1.5.5!

I'm not sure this is fixed.

I'm backing up a mysql server, and just /etc (to remove the /root issue)

I check top, I see mysqldump and borg running hard. Once they are done,
it prints out
A /root/.borgmatic/mysql_databases/localhost/mydb
and then borgmatic spins the CPU hard forever.

Backtrace:

^CTraceback (most recent call last):
  File "/root/.local/bin/borgmatic", line 10, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 718, in main
    summary_logs = parse_logs + list(collect_configuration_run_summary_logs(configs, arguments))
  File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 616, in collect_configuration_run_summary_logs
    results = list(run_configuration(config_filename, config, arguments))
  File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 115, in run_configuration
    repository_path=repository_path,
  File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 286, in run_actions
    stream_processes=stream_processes,
  File "/root/.local/lib/python3.7/site-packages/borgmatic/borg/create.py", line 258, in create_archive
    borg_local_path=local_path,
  File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 256, in execute_command_with_processes
    borg_local_path=borg_local_path,
  File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 91, in log_outputs
    if exit_code_indicates_error(process, exit_code, borg_local_path):
  File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 28, in exit_code_indicates_error
    return bool(exit_code != 0)
KeyboardInterrupt

It appears to be spinning in execute.py,
ie exit_code = process.poll() if output_buffers else process.wait()

I'm assuming its not detecting that the dump has completed?

I'm not sure this is fixed. I'm backing up a mysql server, and just /etc (to remove the /root issue) I check top, I see mysqldump and borg running hard. Once they are done, it prints out A /root/.borgmatic/mysql_databases/localhost/mydb and then borgmatic spins the CPU hard forever. Backtrace: ``` ^CTraceback (most recent call last): File "/root/.local/bin/borgmatic", line 10, in <module> sys.exit(main()) File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 718, in main summary_logs = parse_logs + list(collect_configuration_run_summary_logs(configs, arguments)) File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 616, in collect_configuration_run_summary_logs results = list(run_configuration(config_filename, config, arguments)) File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 115, in run_configuration repository_path=repository_path, File "/root/.local/lib/python3.7/site-packages/borgmatic/commands/borgmatic.py", line 286, in run_actions stream_processes=stream_processes, File "/root/.local/lib/python3.7/site-packages/borgmatic/borg/create.py", line 258, in create_archive borg_local_path=local_path, File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 256, in execute_command_with_processes borg_local_path=borg_local_path, File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 91, in log_outputs if exit_code_indicates_error(process, exit_code, borg_local_path): File "/root/.local/lib/python3.7/site-packages/borgmatic/execute.py", line 28, in exit_code_indicates_error return bool(exit_code != 0) KeyboardInterrupt ``` It appears to be spinning in execute.py, ie exit_code = process.poll() if output_buffers else process.wait() I'm assuming its not detecting that the dump has completed?
Owner

Ugh! Can I see the borg create invocation from the verbose output to make sure that the source directiories are getting de-duplicated correctly?

Also, can you try rm -fr /root/.borgmatic, and then running borgmatic again to see if that makes the hang go away? If it does, then there's probably a stale named pipe sitting around. (Which would indicate borgmatic needs to do further cleanup before running borg create.)

Ugh! Can I see the `borg create` invocation from the verbose output to make sure that the source directiories are getting de-duplicated correctly? Also, can you try `rm -fr /root/.borgmatic`, and then running borgmatic again to see if that makes the hang go away? If it does, then there's probably a stale named pipe sitting around. (Which would indicate borgmatic needs to do further cleanup before running `borg create`.)
witten reopened this issue 2020-05-29 17:41:38 +00:00

Note that I'm using the latest git version (as of yesterday) for this debugging.


While I'm here:
Note that if you use --file instead of --files, there is no error message warning about eg unknown option.


Also note that I added --single-transaction to mysql options, I think that would be good to have in the generated options file, it allows InnoDB databases to be backed up without locking (which also requires extra permissions for the backup user)


And in the config.yaml file, I was confused in the mysql area because the comments abov "# - name: users" was at the same tab level as the following comments. The "# Database name (requir..." comments should be a few spaces to the left, to make clear that the following options (like hostname:) are suboptions to the -name bit.


From the verbose log:

mysqldump --add-drop-database --user rsn_backup --single-transaction --databases regional > /root/.borgmatic/mysql_databases/localhost/regional
borg create --exclude-from /tmp/tmpvszkwykh --compression auto,zstd,10 --one-file-system --read-special --list --filter AME- --debug --show-rc SERVER:repo::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f} /etc /root/.borgmatic

Note that I disabled all the other paths, except for /etc, so dedup of the paths should not enter into the equation at all.


This is how it ends:

Remote: Cleaned up 0 uncommitted segment files (== everything after segment 44).
Remote: Verified integrity of /srv/repos/REPONAME/repo/hints.44
Remote: Storage quota: 906.76 MB out of 3.81 TB used.
A /root/.borgmatic/mysql_databases/localhost/regional
AND THEN SPINS THE CPU FOREVER (borg and mysql/mysqldump finish)

Removing /root/.borgmatic worked! (I only had to rm -r, no -f)
It run until completion.

Before the delete and run:

# find /root/.borgmatic/                                                                                                                                               
/root/.borgmatic/                                                                                                                                                                                          
/root/.borgmatic/mysql_databases                                                                                                                                                                            
/root/.borgmatic/mysql_databases/localhost
/root/.borgmatic/mysql_databases/localhost/regional
/root/.borgmatic/mysql_databases/localhost/all

After the run:

# find /root/.borgmatic/                                                                                                                                                
/root/.borgmatic/                         
/root/.borgmatic/mysql_databases

So the stale pipes idea was correct!


I tried to restore the database, but hit a couple of problems:

  1. The backup mysql user only has permissions to read, not to create databases. There should be a way to specify the user to execute the restore.
  2. I wanted to test the restore by restoring to another database (not the original name). That would be useful for checking values from the past, for example.
  3. Alternatively, I wanted to extract the dump file only. In the past I've had to modify the dump file before restoration (new features in MySQL not supported by the dump, or the DEFINER user doesn't exist).
  4. I used --verbosity to find the command.

That's all the comments I have about that for now, sorry to hijack the thread (it was faster that way), feel free to break it up into a new thread if you want.

Great work on borgmatic!

Note that I'm using the latest git version (as of yesterday) for this debugging. ----- While I'm here: Note that if you use --file instead of --files, there is no error message warning about eg unknown option. ----- Also note that I added --single-transaction to mysql options, I think that would be good to have in the generated options file, it allows InnoDB databases to be backed up without locking (which also requires extra permissions for the backup user) ----- And in the config.yaml file, I was confused in the mysql area because the comments abov "# - name: users" was at the same tab level as the following comments. The "# Database name (requir..." comments should be a few spaces to the left, to make clear that the following options (like hostname:) are suboptions to the -name bit. ----- From the verbose log: ``` mysqldump --add-drop-database --user rsn_backup --single-transaction --databases regional > /root/.borgmatic/mysql_databases/localhost/regional borg create --exclude-from /tmp/tmpvszkwykh --compression auto,zstd,10 --one-file-system --read-special --list --filter AME- --debug --show-rc SERVER:repo::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f} /etc /root/.borgmatic ``` Note that I disabled all the other paths, except for /etc, so dedup of the paths should not enter into the equation at all. ----- This is how it ends: ``` Remote: Cleaned up 0 uncommitted segment files (== everything after segment 44). Remote: Verified integrity of /srv/repos/REPONAME/repo/hints.44 Remote: Storage quota: 906.76 MB out of 3.81 TB used. A /root/.borgmatic/mysql_databases/localhost/regional AND THEN SPINS THE CPU FOREVER (borg and mysql/mysqldump finish) ``` ----- Removing /root/.borgmatic worked! (I only had to rm -r, no -f) It run until completion. Before the delete and run: ``` # find /root/.borgmatic/ /root/.borgmatic/ /root/.borgmatic/mysql_databases /root/.borgmatic/mysql_databases/localhost /root/.borgmatic/mysql_databases/localhost/regional /root/.borgmatic/mysql_databases/localhost/all ``` After the run: ``` # find /root/.borgmatic/ /root/.borgmatic/ /root/.borgmatic/mysql_databases ``` So the stale pipes idea was correct! ----- I tried to restore the database, but hit a couple of problems: 1. The backup mysql user only has permissions to read, not to create databases. There should be a way to specify the user to execute the restore. 2. I wanted to test the restore by restoring to another database (not the original name). That would be useful for checking values from the past, for example. 3. Alternatively, I wanted to extract the dump file only. In the past I've had to modify the dump file before restoration (new features in MySQL not supported by the dump, or the DEFINER user doesn't exist). 4. I used --verbosity to find the command. ----- That's all the comments I have about that for now, sorry to hijack the thread (it was faster that way), feel free to break it up into a new thread if you want. Great work on borgmatic!
Owner

While I'm here: Note that if you use --file instead of --files, there is no error message warning about eg unknown option.

I think that's because --file is accepted as a shorthand for --files! Not sure why. Maybe it's a Python thing with plural options.

Also note that I added --single-transaction to mysql options, I think that would be good to have in the generated options file, it allows InnoDB databases to be backed up without locking (which also requires extra permissions for the backup user)

What do you mean in the generated options file? Do you mean as a formal borgmatic configuration file option for the MySQL hook?

And in the config.yaml file, I was confused in the mysql area because the comments abov “# - name: users” was at the same tab level as the following comments. The “# Database name (requir...” comments should be a few spaces to the left, to make clear that the following options (like hostname:) are suboptions to the -name bit.

Good call. Fixed in master!

So the stale pipes idea was correct!

Great! I'll leave this ticket open to cover borgmatic more aggressively deleting stale pipes from previous runs before running any database dumps.

I tried to restore the database, but hit a couple of problems:

The backup mysql user only has permissions to read, not to create databases. There should be a way to specify the user to execute the restore.

I think should be covered by the (not yet implemented) #309. In any case, I'll expand that ticket to cover this use case. Feel free to follow along there.

I wanted to test the restore by restoring to another database (not the original name). That would be useful for checking values from the past, for example.

I filed this as #322.

Alternatively, I wanted to extract the dump file only. In the past I've had to modify the dump file before restoration (new features in MySQL not supported by the dump, or the DEFINER user doesn't exist).
I used --verbosity to find the command.

That works too!

That's all the comments I have about that for now, sorry to hijack the thread (it was faster that way), feel free to break it up into a new thread if you want.

No worries. Better to get it all down than not at all.. I appreciate all the feedback.

> While I'm here: Note that if you use --file instead of --files, there is no error message warning about eg unknown option. I think that's because `--file` is accepted as a shorthand for `--files`! Not sure why. Maybe it's a Python thing with plural options. > Also note that I added --single-transaction to mysql options, I think that would be good to have in the generated options file, it allows InnoDB databases to be backed up without locking (which also requires extra permissions for the backup user) What do you mean in the generated options file? Do you mean as a formal borgmatic configuration file option for the MySQL hook? > And in the config.yaml file, I was confused in the mysql area because the comments abov “# - name: users” was at the same tab level as the following comments. The “# Database name (requir...” comments should be a few spaces to the left, to make clear that the following options (like hostname:) are suboptions to the -name bit. Good call. Fixed in master! > So the stale pipes idea was correct! Great! I'll leave this ticket open to cover borgmatic more aggressively deleting stale pipes from previous runs before running any database dumps. > I tried to restore the database, but hit a couple of problems: > > The backup mysql user only has permissions to read, not to create databases. There should be a way to specify the user to execute the restore. I think should be covered by the (not yet implemented) #309. In any case, I'll expand that ticket to cover this use case. Feel free to follow along there. > I wanted to test the restore by restoring to another database (not the original name). That would be useful for checking values from the past, for example. I filed this as #322. > Alternatively, I wanted to extract the dump file only. In the past I've had to modify the dump file before restoration (new features in MySQL not supported by the dump, or the DEFINER user doesn't exist). > I used --verbosity to find the command. That works too! > That's all the comments I have about that for now, sorry to hijack the thread (it was faster that way), feel free to break it up into a new thread if you want. No worries. Better to get it all down than not at all.. I appreciate all the feedback.
Owner

Okay, I just pushed a change in which borgmatic removes the entire database dump path, including all contained dump named pipes, regardless of whether they're currently configured databases. That should prevent Borg from hanging on a stale named pipe that will never receive any data.

Okay, I just pushed a change in which borgmatic removes the entire database dump path, including all contained dump named pipes, regardless of whether they're currently configured databases. That should prevent Borg from hanging on a stale named pipe that will never receive any data.
Owner

This is released in borgmatic 1.5.6. Please let me know if you find any other issues. And I could use some more feedback on #321 when you get a chance!

This is released in borgmatic 1.5.6. Please let me know if you find any other issues. And I could use some more feedback on #321 when you get a chance!
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#316
No description provided.