pg_dumpall hangs #316
Reference: borgmatic-collective/borgmatic#316
What I'm trying to do and why
Steps to reproduce (if a bug)
run borgmatic with the attached config
Actual behavior (if a bug)
/root/.borgmatic/postgresql_databases/localhost/all stays empty:
borgmatic hangs waiting for the file:
Running
pg_dumpall --no-password --clean --if-exists --username postgres > /root/.borgmatic/postgresql_databases/localhost/all
works fine.
Expected behavior (if a bug)
A dump in /root/.borgmatic/postgresql_databases/localhost/all which gets backed up.
Environment
borgmatic version: 1.5.4
borgmatic installation method: Arch Linux package
Borg version: 1.1.11
Python version: 3.8.2
operating system and version: Arch Linux
PostgreSQL version: 12.3
Thanks for bringing this to my attention, and including the strace! It looks like the config file isn't attached. Would you mind trying to attach it again, or failing that, linking to it on a paste service or another site?
Some background: Because borgmatic now uses named pipes to transfer data from pg_dump/pg_dumpall directly to borg, I would expect the "dump" file size to remain zero regardless of whether things are working there.
A couple of ideas to help debug this:
[...] /root/.borgmatic/postgresql_databases/localhost/all manually and then running borgmatic again. There's code in borgmatic that should in theory do that already, but I suppose it's possible something is going wrong, and there's a stale named pipe from a previous run that's hanging.
[...] pg_dumpall or borg process is running.
[...] borg create command that ran? Does the timing of the strace openat() appear to correlate with borg create, or does it come earlier? I assume you're strace-ing the borgmatic process rather than the Borg process? It might be helpful to see the full output of your borgmatic logs (redacted if needed).
Ha, I thought the yaml got uploaded but it was rejected. Pasted at https://dpaste.org/QOYw now.
The strace is from the borg create process; the borgmatic process is at:
Cleaning /root/.borgmatic/postgresql_databases/localhost/ did not help, so it's reproducible. The log is pasted here: https://dpaste.org/PEcz
Just an idea:
/root/.borgmatic is included twice, first as / and then explicitly as /root/.borgmatic for the PostgreSQL backup via pipe. Does this lead to this issue? There's no dangling pg_dumpall process, so the pg_dumpall dump is piped one time, but afterwards the pipe still exists, so borg still waits for new input?
Good find! I think you are absolutely right, and I'm thankfully able to reproduce the hang here by specifying a source directory like /root that implicitly re-includes /root/.borgmatic.
The super odd thing is that Borg does appear to be doing some file-level deduplication. E.g. in a test without named pipes, if I specify both "/foo" and "/foo/bar" as source directories, then a file at "/foo/bar/baz.txt" will only get backed up once as judged by the --files listing. But Borg must be trying to read that file twice somehow, resulting in the hang you're seeing.
Well, I have a solution. I'm not sure it's a good solution, but it has the distinct benefit of appearing to work with some basic prototyping.
Basically, before feeding source directories to Borg, borgmatic can deduplicate them. In other words, if / and /root and /root/.borgmatic are all in the source directories list, borgmatic would realize that only / should be fed to Borg as a source path.
My rationalization is that borgmatic is already doing preprocessing of directories it's feeding to Borg (expanding tildes, etc.), so it's not too much to expect it also to deduplicate child directories.
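For anyone following along, the deduplication idea described above can be sketched roughly like this. It is only an illustration, not the code that went into borgmatic: the function name deduplicate_directories is made up here, and the sketch assumes every source directory is an absolute path (os.path.commonpath raises ValueError when absolute and relative paths are mixed).

```python
import os


def deduplicate_directories(directories):
    """Drop any directory that is already contained in another listed directory.

    Hypothetical helper for illustration; assumes absolute paths.
    """
    # Normalize so that "/root/" and "/root" compare equal, and sort so that
    # a parent directory is always visited before its children.
    normalized = sorted({os.path.normpath(directory) for directory in directories})

    kept = []
    for candidate in normalized:
        # The candidate is redundant when a directory we already kept is its ancestor.
        if any(os.path.commonpath([parent, candidate]) == parent for parent in kept):
            continue
        kept.append(candidate)

    return kept


print(deduplicate_directories(["/", "/root", "/root/.borgmatic"]))
print(deduplicate_directories(["/etc", "/root/.borgmatic", "/root"]))
```

Comparing with os.path.commonpath rather than a plain string prefix keeps siblings such as /foo and /foo-bar from being mistaken for parent and child.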
Okay, I made that change in master, and it appears to fix the issue. Thanks again for bringing it up and doing all the debugging on your end to make this easy to diagnose!
Closing this ticket, and I'll add a note here when the change makes its way into the next release.
Released in borgmatic 1.5.5!
I'm not sure this is fixed.
I'm backing up a MySQL server, and just /etc (to remove the /root issue).
When I check top, I see mysqldump and borg running hard. Once they are done,
it prints out
A /root/.borgmatic/mysql_databases/localhost/mydb
and then borgmatic spins the CPU hard forever.
Backtrace:
It appears to be spinning in execute.py,
i.e. at exit_code = process.poll() if output_buffers else process.wait()
I'm assuming it's not detecting that the dump has completed?
Ugh! Can I see the borg create invocation from the verbose output to make sure that the source directories are getting de-duplicated correctly?
Also, can you try rm -fr /root/.borgmatic, and then running borgmatic again to see if that makes the hang go away? If it does, then there's probably a stale named pipe sitting around. (Which would indicate borgmatic needs to do further cleanup before running borg create.)
Note that I'm using the latest git version (as of yesterday) for this debugging.
While I'm here:
Note that if you use --file instead of --files, there is no error message warning about, e.g., an unknown option.
Also note that I added --single-transaction to the mysql options. I think that would be good to have in the generated options file; it allows InnoDB databases to be backed up without locking (which also requires extra permissions for the backup user).
And in the config.yaml file, I was confused in the mysql area because the comments above "# - name: users" were at the same tab level as the following comments. The "# Database name (requir..." comments should be a few spaces to the left, to make clear that the following options (like hostname:) are suboptions of the "- name" entry.
From the verbose log:
Note that I disabled all the other paths, except for /etc, so dedup of the paths should not enter into the equation at all.
This is how it ends:
Removing /root/.borgmatic worked! (I only had to rm -r, no -f)
It ran until completion.
Before the delete and run:
After the run:
So the stale pipes idea was correct!
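To illustrate why a leftover pipe produces this kind of hang, here is a small sketch of standard named pipe behavior on Linux. It is not borgmatic or Borg code; the pipe name "all" and the temporary directory are stand-ins for the real dump path. A reader that opens a named pipe blocks until some process opens the other end for writing, so a pipe with no dump process behind it is never readable.

```python
import os
import stat
import tempfile
import threading

with tempfile.TemporaryDirectory() as directory:
    pipe_path = os.path.join(directory, "all")  # hypothetical dump name
    os.mkfifo(pipe_path)

    # A named pipe always reports a size of zero, whether or not it is in use.
    info = os.stat(pipe_path)
    is_fifo = stat.S_ISFIFO(info.st_mode)
    size = info.st_size

    opened = threading.Event()

    def read_pipe():
        # open() blocks here until some process opens the pipe for writing.
        with open(pipe_path, "rb") as pipe:
            opened.set()
            pipe.read()

    reader = threading.Thread(target=read_pipe, daemon=True)
    reader.start()

    # With no writer attached, the reader never gets past open().
    reader_blocked = not opened.wait(timeout=0.5)

    # Opening and closing the write end releases the reader with end-of-file.
    with open(pipe_path, "wb"):
        pass
    reader.join(timeout=5)
    reader_finished = not reader.is_alive()

print(is_fifo, size, reader_blocked, reader_finished)
```

The zero size is also why the "dump" file looking empty says nothing about whether the dump is working.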
I tried to restore the database, but hit a couple of problems:
That's all the comments I have about that for now, sorry to hijack the thread (it was faster that way), feel free to break it up into a new thread if you want.
Great work on borgmatic!
I think that's because --file is accepted as a shorthand for --files! Not sure why. Maybe it's a Python thing with plural options.
What do you mean in the generated options file? Do you mean as a formal borgmatic configuration file option for the MySQL hook?
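On the --file shorthand mentioned above: a likely explanation, assuming the command line is parsed with Python's argparse module, is that argparse accepts any unambiguous prefix of a long option by default. So this would not be specific to plural options. The parser below is a standalone sketch, not borgmatic's actual argument setup.

```python
import argparse

# Default behavior: unambiguous prefixes of long options are accepted.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--files", action="store_true")
abbreviated = parser.parse_args(["--file"])

# With allow_abbrev=False (Python 3.5+), the same prefix is rejected.
strict_parser = argparse.ArgumentParser(prog="example", allow_abbrev=False)
strict_parser.add_argument("--files", action="store_true")
try:
    strict_parser.parse_args(["--file"])
    rejected = False
except SystemExit:
    # argparse prints a usage error to stderr and exits on unknown options.
    rejected = True

print(abbreviated.files, rejected)
```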
Good call. Fixed in master!
Great! I'll leave this ticket open to cover borgmatic more aggressively deleting stale pipes from previous runs before running any database dumps.
I think this should be covered by the (not yet implemented) #309. In any case, I'll expand that ticket to cover this use case. Feel free to follow along there.
I filed this as #322.
That works too!
No worries. Better to get it all down than not at all. I appreciate all the feedback.
Okay, I just pushed a change in which borgmatic removes the entire database dump path, including all contained dump named pipes, regardless of whether they're currently configured databases. That should prevent Borg from hanging on a stale named pipe that will never receive any data.
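As a rough sketch of that cleanup approach (not the actual borgmatic change), the snippet below lists any named pipes under a dump directory and then removes the whole directory tree. The helper names find_named_pipes and remove_dump_path are invented for this example, and it runs against a throwaway temporary directory rather than the real ~/.borgmatic.

```python
import os
import shutil
import stat
import tempfile


def find_named_pipes(dump_path):
    """Return every named pipe (FIFO) found below dump_path."""
    pipes = []
    for root, _directories, filenames in os.walk(dump_path):
        for filename in filenames:
            path = os.path.join(root, filename)
            if stat.S_ISFIFO(os.lstat(path).st_mode):
                pipes.append(path)
    return sorted(pipes)


def remove_dump_path(dump_path):
    """Delete the whole dump directory, whatever it contains."""
    shutil.rmtree(dump_path, ignore_errors=True)


# Demonstration with a stand-in directory layout.
with tempfile.TemporaryDirectory() as base:
    dump_path = os.path.join(base, "mysql_databases")
    os.makedirs(os.path.join(dump_path, "localhost"))
    os.mkfifo(os.path.join(dump_path, "localhost", "mydb"))

    stale_pipes = find_named_pipes(dump_path)
    remove_dump_path(dump_path)
    still_exists = os.path.exists(dump_path)

print(len(stale_pipes), still_exists)
```

Removing the directory wholesale, rather than only the pipes for currently configured databases, also catches pipes left behind by databases that have since been dropped from the configuration.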
This is released in borgmatic 1.5.6. Please let me know if you find any other issues. And I could use some more feedback on #321 when you get a chance!