Database backups fail and cause stalling in borg #509
Reference: borgmatic-collective/borgmatic#509
What I'm trying to do and why
Doing backups of a remote Postgres database (though other types also seem affected).
Steps to reproduce (if a bug)
Add database hook to working config.yaml:
Run backup:
Actual behavior (if a bug)
The log hangs at this point and there doesn't seem to be any more progress.
Looking at the created database dump, the file is empty:
Running the dump manually works:
Running the dump manually without a password produces an error, as expected, and also an empty dump file:
Expected behavior (if a bug)
Borg shouldn't hang and stall the whole backup (that's not the root issue here and probably not an issue within borgmatic).
Database dump should work properly or at least catch the error.
Other notes / implementation ideas
Environment
borgmatic version: 1.5.23 (I tried 1.5.22 and 1.5.21 and got the same result)
borgmatic installation method: Docker (https://hub.docker.com/r/b3vis/borgmatic)
Borg version: 1.2.0
Python version: Python 3.8.10
Database version (if applicable): psql (PostgreSQL) 12.9 (Debian 12.9-1.pgdg110+1)
operating system and version: Docker on Ubuntu 20.04.4 LTS
Thanks for reporting this. A little context: borgmatic streams database backups to Borg via a named pipe. Specifically, that postgres file is a named pipe (notice the "p" at the start of the line), so it makes sense that it's empty. Consuming from a named pipe can sometimes cause hangs with Borg if the producing end of the pipe (in this case, Postgres) isn't filling it with data.
I'm not sure what's going on here though. One thing you can try is to delete the entirety of the postgresql_databases directory before running borgmatic, in case there's an old named pipe left over from a prior run. Generally, though, these should get deleted automatically.
If that doesn't help, you could try adding the --files flag to your borgmatic invocation, with the idea that it might show which file is causing the hang. What paths are in your source directories? Do any of them include named pipes or other special files?
Thanks for the explanation. I admit I hadn't come across named pipes before, and I totally missed the p, so I didn't notice that this isn't a regular file.
I tried deleting the leftover pipes after an aborted/hanging run, but the same issue occurred again.
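For readers who, like the reporter, have not met named pipes before, the behavior described above can be reproduced in isolation. This is a minimal sketch and not borgmatic's actual code: the scratch directory and the file name postgres are assumptions chosen to mirror the report.

```shell
#!/bin/sh
# Create a scratch directory holding one named pipe (FIFO).
# The name "postgres" is hypothetical, chosen to mirror the report.
demo_dir=$(mktemp -d)
fifo="$demo_dir/postgres"
mkfifo "$fifo"

# In a long listing the mode string starts with "p" for a named pipe,
# and the size is typically shown as 0 because no data is stored on disk.
ls -l "$fifo"

# test -p succeeds only for named pipes.
if [ -p "$fifo" ]; then
    echo "named pipe"
fi

# A reader only receives data once a writer is attached.
# Start a writer in the background, then read what it sends.
( echo "dump data" > "$fifo" ) &
received=$(cat "$fifo")
wait
echo "received: $received"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

Without the background writer, the cat call would block until some process opened the pipe for writing, which is the same kind of stall a backup tool can hit when it reads a pipe nobody is feeding.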
Thanks a lot for this hint! Using --files I can see that it hangs while processing a file in one of the other source directories. So this is completely unrelated to the database backups. I was misled by not noticing the usage of named pipes.
I recently added the volume of a Postfix MTA container (mailcow-dockerized) and this seems to be causing the lockup:
All other containers using this volume are stopped before the backup, so there shouldn't be any parallel access to the file. But still master.pid seems to be problematic for some reason although it's just a regular file. A pid-file in a backup doesn't really make sense anyway, but the official mailcow backup script includes this volume, too.
It's possible Borg only prints the name of a file after it's done being backed up, in which case the actual file causing the hang might be another file that's not listed. You could try
sudo find /your/source/path -type b,c,p,s
which should give you the paths to any special files that could be causing this hang. I don't imagine that a regular pid file, for instance, causes any problems here.
It actually is related! When you configure a database with borgmatic, that turns on Borg's --read-special flag, which instructs Borg to try to read from special files (like the named pipes used to stream database backups).
Edit: Sorry, I hadn't yet read your previous reply.
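The comma-separated -type list in the find command above is a GNU find extension. As a sketch for systems where that form is unavailable, the same search can be spelled out with -o. The function name list_special_files and the scratch layout are made up for illustration.

```shell
#!/bin/sh
# list_special_files DIR
# Print every block device, character device, named pipe, and socket
# below DIR, one path per line. (Hypothetical helper name.)
list_special_files() {
    find "$1" \( -type b -o -type c -o -type p -o -type s \) -print
}

# Example run against a scratch directory containing one named pipe
# and one regular file. Only the pipe should be reported.
scratch=$(mktemp -d)
mkfifo "$scratch/queue.fifo"
echo 1234 > "$scratch/master.pid"
found=$(list_special_files "$scratch")
echo "$found"
rm -r "$scratch"
```

The regular master.pid file is not listed, which matches the point made above that an ordinary pid file is unlikely to be the cause of the hang.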
Ok, this is still very strange. I can reliably solve the issue when I remove all the database hooks from the config.yaml even though the issue seems to be elsewhere.
@witten Can you come up with an explanation or further debugging hints for this behavior?
And thanks for your prompt help by the way!
Thanks, will try.
Ok, once again thanks for explaining. Really learning a lot today! 😉
It turns out there are a bunch of sockets and pipes contained in the mailcow volumes that cause this. I will close this issue now that it is clear how it happened. I guess I will have to look at how to exclude these files.
Still the "silent" failure is somewhat frustrating. It would be nice to have some way for borgmatic or borg to catch and/or ignore these kinds of issues instead of breaking the whole backup. But I also see this might not be easily implemented.
Yeah, there's an open Borg issue on this. I'll see if I can mention this "silent" failure more prominently in the borgmatic documentation though.
In terms of excludes, you should be able to just add to the exclude patterns in borgmatic's configuration file. Might be obnoxious to exclude those independently if there are other things in that directory that need to be backed up...
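One way to avoid maintaining those excludes by hand might be to generate them. The sketch below writes every special file found under a source directory into a plain text file, one path per line, which could then be referenced from the excludes in borgmatic's configuration (for example through its exclude_from option, as far as I understand it). The function name and all paths are hypothetical, and Borg treats each line as a pattern, so paths containing wildcard characters may need extra care.

```shell
#!/bin/sh
# write_special_excludes SOURCE_DIR EXCLUDE_FILE
# Write the path of every special file below SOURCE_DIR to EXCLUDE_FILE,
# one per line. (Hypothetical helper name.)
write_special_excludes() {
    find "$1" \( -type b -o -type c -o -type p -o -type s \) -print > "$2"
}

# Example run against a scratch layout: one named pipe and one regular file.
scratch=$(mktemp -d)
mkdir "$scratch/source"
mkfifo "$scratch/source/pickup"
echo 1234 > "$scratch/source/master.pid"
write_special_excludes "$scratch/source" "$scratch/excludes.txt"
excluded=$(cat "$scratch/excludes.txt")
echo "$excluded"
rm -r "$scratch"
```

Since sockets and pipes can appear and disappear between runs, such a list would probably need to be regenerated right before each backup.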
Alternatively, you could create separate borgmatic configuration files: one for database backups and one for backing up all your other source files. That would allow you to back up these Mailcow socket files without worrying about the database settings enabling --read-special.
@witten Thanks for the pointers. Especially the latter sounds like a nice and easy workaround.
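If the two configuration files are run explicitly rather than picked up from borgmatic's default configuration directory, a small wrapper along these lines could drive them. It is only a sketch: the function name, the BORGMATIC override variable, and any file names passed in are assumptions, and only borgmatic's --config flag is taken as given.

```shell
#!/bin/sh
# BORGMATIC can be overridden (for example with "echo") to preview the
# commands. It defaults to the borgmatic executable on the PATH.
BORGMATIC=${BORGMATIC:-borgmatic}

# run_split_backups CONFIG...
# Run borgmatic once per configuration file, stopping at the first failure.
# (Hypothetical helper name.)
run_split_backups() {
    for config in "$@"; do
        "$BORGMATIC" --config "$config" || return 1
    done
}
```

It would be called with the database configuration and the plain-files configuration as arguments, so a problem in one run is reported before the next one starts.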