borgmatic hangs on postgres restore #430
Reference: borgmatic-collective/borgmatic#430
What I'm trying to do and why
Hello,
I'm trying to restore PostgreSQL database dumps made with borgmatic:
This is an extract of the configuration file:
I have removed `~/.borgmatic` and `~/.cache/borgmatic` (and run `borg init -e repokey myrepo.borg`), just in case.
Actual behavior (if a bug)
Borgmatic just hangs.
Output with `-v 2`:
Traceback after CTRL+C:
Output with `strace`:
This repeats continuously.
More `strace`:
Output of `ps aux | grep borg`:
Other notes / implementation ideas
It works by restoring manually with `pg_restore`, after running the extract command of borgmatic:
Environment
borgmatic version: 1.5.15
borgmatic installation method: pip3
Borg version: 1.1.15
Python version: 3.7.3
Database version (if applicable): psql (PostgreSQL) 13.3 (Debian 13.3-1.pgdg100+1)
operating system and version: Debian GNU/Linux 10
Thank you for providing the detailed diagnostic information. Looking at the logs and your configuration, I'm not seeing any obvious problems.
I see that you already tried a manual `pg_restore`, but one other thing you can try is a direct pipe that simulates what borgmatic is doing while cutting borgmatic out of the picture:
If that works, then it's pretty likely a problem with the way borgmatic is executing processes. If it doesn't work, then it's a problem with Borg or `pg_restore` (or the flags being provided to them). Personally, my money is on a borgmatic issue. But probably good to check.
Ok, tried your command:
but I guess it's normal since we are using a pipe. Anyway it gets to the end and everything seems to be working.
This time I tried with borg 1.1.9 since I am on another machine.
I've tried to reproduce this with the exact same version of Debian Postgres, similar borgmatic options, and Borg 1.1.16. No reproduction here. The main difference I have on this machine is Python 3.9.5 instead of 3.7.3.
Are you by chance using a test database rather than anything with proprietary data? If so, would it be feasible to include a database dump or even the Borg repository on this ticket? That might allow me to reproduce the problem if it's an issue triggered by the database rather than the environment.
Thanks for your patience.
It's a test database based on this Django app that uses postgres. Both are running in Docker:
Just tried now: the same issue happens with django-futils even on a newly initialized database.
Just a few notes here:
`postgis/postgis:13-3.1`
I installed `postgres-client-13`, which provides `pg_dump`, etc., used by borgmatic, from the Postgres apt repository (https://www.postgresql.org/download/linux/debian/).
Maybe you can try installing Debian stable on a VM and try everything from there.
While I was able to set up a VM with Debian 10, Postgis running in Docker, and the exact versions of borgmatic and borgbackup, I was not able to get django-futils initialized. Would it be possible to provide me with either a database dump of django-futils or a VM image where this problem occurs? Alternatively, a working Docker image of django-futils might help me run it so that I can reproduce the problem. Thanks.
Thank you for your time. Unfortunately I cannot upload "big" stuff because of slow internet (it would take a day to upload just the Docker image). Anyway, I attached a database dump, `postgres_dev.tar.gz`, obtained from `borgmatic extract --archive ...`
There is also the Borg repository if you need it (password is `password`).
Finally got a repro! Thanks for all your help here. Now, to figure out why this hang is occurring...
Great 👍
Spent some more time digging into this. As far as I can tell, `pg_restore` is exiting "early" for some reason, before `borg extract` expects it to. That's why you're seeing `Broken Pipe` when running them directly with a shell pipe. When borgmatic is added to the picture, it appears that it's not passing on that unexpected `pg_restore` exit to Borg, hence the hang. I could change the borgmatic code to forcibly close the pipe when `pg_restore` exits, but that'd just result in a `Broken Pipe` from Borg when doing a `borgmatic extract`.
So I think in order to solve this, it may be necessary to determine the cause of `pg_restore` exiting before Borg expects it to.
I can't find anything on the Borg and Postgres issue trackers. Maybe trying different Postgres versions would pinpoint the problem.
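For anyone who wants to see the early-exit behavior in isolation, here is a minimal Python sketch that reproduces a broken pipe without Borg or PostgreSQL. Everything in it is an assumption made for illustration: `PRODUCER` is a stand-in for `borg extract --stdout`, `CONSUMER` is a stand-in for `pg_restore`, and `run_pipeline` is a hypothetical helper name. It only shows the general pipe mechanics on a POSIX system, not what Borg or `pg_restore` do internally.

```python
import subprocess
import sys

# Stand-in for the producer: writes far more data than the consumer will read.
PRODUCER = """
import os, sys
try:
    for _ in range(10000):
        os.write(1, b"x" * 65536)
except BrokenPipeError:
    os.write(2, b"producer: broken pipe\\n")
    sys.exit(1)
"""

# Stand-in for the consumer: reads a little, then exits without draining stdin.
CONSUMER = """
import os
os.read(0, 1024)
"""


def run_pipeline():
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
    )
    # Drop the parent's copy of the read end, so that once the consumer exits
    # nobody holds the pipe open and the producer's next write fails.
    producer.stdout.close()
    consumer.wait()
    error_output = producer.stderr.read().decode()
    producer.stderr.close()
    producer.wait()
    return producer.returncode, consumer.returncode, error_output


if __name__ == "__main__":
    print(run_pipeline())
```

The consumer exits cleanly while the producer fails, which matches the shape of the error seen with the direct shell pipe: the side that stops reading is fine, and the side still writing gets the broken pipe.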
Okay, turns out this is indeed an unfortunate interaction between Borg and `pg_restore` that borgmatic isn't currently doing anything to make better. I popped into `#postgresql` on Libera Chat IRC and got some help from `RhodiumToad` and `ilmari`. Apparently there are two types of restore data in a Postgres tar dump: a bunch of binary data files, and a `restore.sql` file at the end that `pg_restore` ignores but is there in case you want to restore manually or via some other mechanism.
The problem arises in that `pg_restore` unceremoniously closes the pipe as soon as it receives the binary restore data it needs, before the ignored `restore.sql` is fully streamed. This makes sense from `pg_restore`'s perspective, but it means that any process streaming to `pg_restore` sees a broken pipe before the full dump streams across the pipe. The result is a hang in borgmatic (until there's a fix), or a mere error when piping directly from `borg extract`.
Now that I think I know what's going on, I'll see if I can come up with some sort of work-around in borgmatic.
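One possible shape for such a work-around, sketched in Python under stated assumptions: if the parent process keeps the read end of the pipe open, the producer never sees a broken pipe and simply blocks once the pipe buffer fills, which is one plausible way to end up with a hang. Reading and discarding the leftover data after the consumer exits lets the producer finish normally. This is not necessarily how the borgmatic fix works; `PRODUCER`, `CONSUMER`, `TOTAL_BYTES`, and `run_and_drain` are hypothetical names, and the child scripts are stand-ins for `borg extract` and `pg_restore`.

```python
import subprocess
import sys

TOTAL_BYTES = 200 * 65536  # amount the stand-in producer writes

# Stand-in for the producer: writes a fixed amount of data to stdout.
PRODUCER = """
import os
for _ in range(200):
    os.write(1, b"x" * 65536)
"""

# Stand-in for the consumer: takes what it needs, then exits early.
CONSUMER = """
import os
os.read(0, 1024)
"""


def run_and_drain():
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
    )
    consumer.wait()

    # The parent still holds the read end, so the producer is now blocked on a
    # full pipe rather than failing. Read and discard whatever is left.
    discarded = 0
    while True:
        chunk = producer.stdout.read(65536)
        if not chunk:
            break
        discarded += len(chunk)

    producer.stdout.close()
    return producer.wait(), consumer.returncode, discarded


if __name__ == "__main__":
    print(run_and_drain())
```

With the drain loop in place both processes exit with status 0. Without it, and with the parent still holding the pipe open, the final `producer.wait()` would block indefinitely.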
Side note: You probably want to avoid the `tar` dump format if you can, as it's apparently pretty inefficient and uses extra temporary space on disk during the dumping. And it has this unfortunate interaction, as well. So the `custom` format is better, or `directory` if you need parallelism.
Okay, a fix is now in place and released in borgmatic 1.5.16. Let me know how that works for you!
Thanks!
It works now
Whew! Thanks again for your patience here.