Borg gets SIGKILL'd mid-extraction #267

Closed
opened 2019-12-04 02:27:17 +00:00 by BirkhoffLee · 6 comments

What I'm trying to do and why

Extract a prior backup from a remote repo using borgmatic.

Steps to reproduce (if a bug)

Not sure if it's a bug.

Actual behavior (if a bug)

$ ~/.local/bin/borgmatic extract -c ./config.yaml --archive ARCHIVE  --path data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458 --progress -v 2
Ensuring legacy configuration is upgraded
ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Extracting archive ARCHIVE
borg extract --remote-path borg1 --debug --list --show-rc --progress ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458
using builtin fallback logging configuration
35 self tests completed in 0.07 seconds
SSH command line: ['ssh', 'USERNAME@SERVER', 'borg1', 'serve', '--umask=077', '--debug']
Remote: using builtin fallback logging configuration
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'/~/server/ovh/backups/borg-repo'
Remote: Resolved repository path to '/data3/home/USERNAME/server/ovh/backups/borg-repo'
Remote: Verified integrity of /data3/home/USERNAME/server/ovh/backups/borg-repo/index.481
TAM-verified manifest
security: read previous location 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo'
security: read manifest timestamp '2019-12-03T03:23:40.983689'
security: determined newest manifest timestamp as 2019-12-03T03:23:40.983689
security: repository checks ok, allowing access
ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Error running actions for repository
Command '('borg', 'extract', '--remote-path', 'borg1', '--debug', '--list', '--show-rc', '--progress', 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE', 'data/25db0351-0ff0-4be0-b046-0b9fdd71d637', 'data/df8a3725-6b73-4ea3-86ec-0e7977f9a458')' died with <Signals.SIGKILL: 9>.
./config.yaml: Running command for on-error hook
echo "Error while creating a backup or running a backup hook."
Error while creating a backup or running a backup hook.
./config.yaml: Calling healthchecks hook function ping_monitor
./config.yaml: Pinging Healthchecks fail
./config.yaml: Using Healthchecks ping URL https://hc-ping.com/xxx/fail
./config.yaml: Error running configuration file

summary:
./config.yaml: Error running configuration file
ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Error running actions for repository
Command '('borg', 'extract', '--remote-path', 'borg1', '--debug', '--list', '--show-rc', '--progress', 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE', 'data/25db0351-0ff0-4be0-b046-0b9fdd71d637', 'data/df8a3725-6b73-4ea3-86ec-0e7977f9a458')' died with <Signals.SIGKILL: 9>.

Need some help? https://torsion.org/borgmatic/#issues

Environment

borgmatic version: 1.4.16

borgmatic installation method: pip

Borg version: 1.1.10

Python version: 3.6.8

operating system and version: CentOS 8 / Ubuntu 18.04.3 (yes I tested on 2 different machines in different regions and same)

#### What I'm trying to do and why Extract a prior backup from a remote repo using borgmatic. #### Steps to reproduce (if a bug) Not sure if it's a bug. #### Actual behavior (if a bug) ``` $ ~/.local/bin/borgmatic extract -c ./config.yaml --archive ARCHIVE --path data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458 --progress -v 2 Ensuring legacy configuration is upgraded ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Extracting archive ARCHIVE borg extract --remote-path borg1 --debug --list --show-rc --progress ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458 using builtin fallback logging configuration 35 self tests completed in 0.07 seconds SSH command line: ['ssh', 'USERNAME@SERVER', 'borg1', 'serve', '--umask=077', '--debug'] Remote: using builtin fallback logging configuration Remote: using builtin fallback logging configuration Remote: Initialized logging system for JSON-based protocol Remote: Resolving repository path b'/~/server/ovh/backups/borg-repo' Remote: Resolved repository path to '/data3/home/USERNAME/server/ovh/backups/borg-repo' Remote: Verified integrity of /data3/home/USERNAME/server/ovh/backups/borg-repo/index.481 TAM-verified manifest security: read previous location 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo' security: read manifest timestamp '2019-12-03T03:23:40.983689' security: determined newest manifest timestamp as 2019-12-03T03:23:40.983689 security: repository checks ok, allowing access ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Error running actions for repository Command '('borg', 'extract', '--remote-path', 'borg1', '--debug', '--list', '--show-rc', '--progress', 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE', 'data/25db0351-0ff0-4be0-b046-0b9fdd71d637', 'data/df8a3725-6b73-4ea3-86ec-0e7977f9a458')' died with <Signals.SIGKILL: 9>. ./config.yaml: Running command for on-error hook echo "Error while creating a backup or running a backup hook." Error while creating a backup or running a backup hook. ./config.yaml: Calling healthchecks hook function ping_monitor ./config.yaml: Pinging Healthchecks fail ./config.yaml: Using Healthchecks ping URL https://hc-ping.com/xxx/fail ./config.yaml: Error running configuration file summary: ./config.yaml: Error running configuration file ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo: Error running actions for repository Command '('borg', 'extract', '--remote-path', 'borg1', '--debug', '--list', '--show-rc', '--progress', 'ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE', 'data/25db0351-0ff0-4be0-b046-0b9fdd71d637', 'data/df8a3725-6b73-4ea3-86ec-0e7977f9a458')' died with <Signals.SIGKILL: 9>. Need some help? https://torsion.org/borgmatic/#issues ``` #### Environment **borgmatic version:** 1.4.16 **borgmatic installation method:** pip **Borg version:** 1.1.10 **Python version:** 3.6.8 **operating system and version:** CentOS 8 / Ubuntu 18.04.3 (yes I tested on 2 different machines in different regions and same)
Owner

Yikes! From these logs, I'm not sure why Borg would get SIGKILL'd during extract. The first thing that pops to mind though is that perhaps you're running out of memory on the client* during extract, and the kernel OOM killer is issuing a SIGKILL to Borg. Can you try looking in your system logs to see if there's any indication of the OOM killer taking Borg out? And how's memory usage look during extract?

I can't think of anything in borgmatic specifically that would cause that. Do other borgmatic/Borg actions like create appear to work okay, and is extract the only one experiencing a problem?

* I'm assuming this is on the client, rather than the server, based on the log output above. But maybe it's worth checking out the memory / OOM situation on the server too just in case I'm wrong about that.

Yikes! From these logs, I'm not sure why Borg would get SIGKILL'd during `extract`. The first thing that pops to mind though is that perhaps you're running out of memory on the client* during `extract`, and the kernel OOM killer is issuing a SIGKILL to Borg. Can you try looking in your system logs to see if there's any indication of the OOM killer taking Borg out? And how's memory usage look during `extract`? I can't think of anything in borgmatic specifically that would cause that. Do other borgmatic/Borg actions like `create` appear to work okay, and is `extract` the only one experiencing a problem? \* I'm assuming this is on the client, rather than the server, based on the log output above. But maybe it's worth checking out the memory / OOM situation on the server too just in case I'm wrong about that.
Author

@witten Yes you are right! Just checked the logs and kernel is killing borg. It seems like only extract is encountering OOM. I'm running another extract and I will reply to you for the memory usage.

What do you think that will be a solution for this issue?

@witten Yes you are right! Just checked the logs and kernel is killing borg. It seems like only `extract` is encountering OOM. I'm running another extract and I will reply to you for the memory usage. What do you think that will be a solution for this issue?
Author

Oh by the way, the server's got 30 GB of RAM.

Oh by the way, the server's got 30 GB of RAM.
Owner

Is the kernel killing Borg on the client or the server? If on the client, how much free memory is available there at the time borg extract runs? If on the server, how much free memory is available there? You can try free -h or similar.

One other thing you can do is to try to run the Borg command without borgmatic to see if that exhibits similar behavior. Based on the above, it looks like that'd be:

borg extract --remote-path borg1 --debug --list --show-rc --progress ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458

If this turns out to be a likely Borg-specific issue, then you may have to file a ticket with the Borg project for help. But first I'd recommend seeing how much memory you have available in case you can free some up to make this work.

Is the kernel killing Borg on the client or the server? If on the client, how much free memory is available there at the time `borg extract` runs? If on the server, how much free memory is available there? You can try `free -h` or similar. One other thing you can do is to try to run the Borg command without borgmatic to see if that exhibits similar behavior. Based on the above, it looks like that'd be: ``` borg extract --remote-path borg1 --debug --list --show-rc --progress ssh://USERNAME@SERVER/~/server/ovh/backups/borg-repo::ARCHIVE data/25db0351-0ff0-4be0-b046-0b9fdd71d637 data/df8a3725-6b73-4ea3-86ec-0e7977f9a458 ``` If this turns out to be a likely Borg-specific issue, then you may have to file a ticket with the Borg project for help. But first I'd recommend seeing how much memory you have available in case you can free some up to make this work.
Author

Thanks for the help!

I've just read a bit of Borg's code here and seems --progress is the culprit (the calculation is very slow and takes a lot of memory). Even this is not related to Borgmatic, you offered many help, thank you a lot! Have a nice day :)

Thanks for the help! I've just read a bit of Borg's code [here](https://github.com/borgbackup/borg/blob/master/src/borg/archiver.py#L779) and seems `--progress` is the culprit (the calculation is very slow and takes a lot of memory). Even this is not related to Borgmatic, you offered many help, thank you a lot! Have a nice day :)
Owner

I'm glad to hear that you've found the likely cause and have a solution for it!

I'm glad to hear that you've found the likely cause and have a solution for it!
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#267
No description provided.