Multiple calls to healthchecks.io fail with "Max retries exceeded" #377

Closed
opened 2020-12-07 12:52:53 +00:00 by GeorgeGedox · 2 comments

What I'm trying to do and why

I've got 3 config files on a host, backing up to 3 repositories and pinging 3 healthchecks.io endpoints. A cron job runs hourly and goes through all of the configs.

Most of the time the hook runs correctly, but after a while it starts failing for one of the configs, or even for all of them, with a "Max retries exceeded" error. My theory is that the integration lacks delayed retries: it should try at least 5 times to ping the endpoints, with an increasing delay, so that it avoids concurrent connections.

Steps to reproduce (if a bug)

Ping multiple healthchecks.io endpoints in a short amount of time.

Actual behavior (if a bug)

[2020-12-07 13:07:28,602] INFO: Archive consistency check complete, no problems found.
[2020-12-07 13:07:28,603] INFO: /etc/borgmatic.d/nextcloud.yaml: Pinging Healthchecks finish
[2020-12-07 13:07:28,825] CRITICAL:
[2020-12-07 13:07:28,825] CRITICAL: summary:
[2020-12-07 13:00:16,977] CRITICAL: /etc/borgmatic.d/containers.yaml: Error running configuration file
[2020-12-07 13:00:11,815] CRITICAL: /etc/borgmatic.d/containers.yaml: Error running pre hook
[2020-12-07 13:00:11,815] CRITICAL: HTTPSConnectionPool(host='hc-ping.com', port=443): Max retries exceeded with url: /<endpoint>/start (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7b27b1c7f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Expected behavior (if a bug)

The hook should retry a few times if pinging an endpoint fails, to avoid false positives due to concurrent connections.

Other notes / implementation ideas

The integration should try at least 5 times to ping the endpoints, adding 1 second of delay after each failed ping.
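
For illustration, the retry behavior proposed above could look roughly like the sketch below. This is not borgmatic's code: `ping_with_retries`, its parameters, and the injectable `sleep` are hypothetical names, and the sketch assumes the ping callable raises `OSError` (the base class that the `requests` exceptions derive from) when a connection fails.

```python
import time
from typing import Callable


def ping_with_retries(
    ping: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call ping() until it succeeds and return the number of attempts used.

    After the n-th failed attempt, wait base_delay * n seconds, so the
    delay grows by one base_delay per failure (1s, 2s, 3s, ... by default).
    If every attempt fails, the last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            ping()
            return attempt
        except OSError:
            # Hypothetical policy: treat any OSError as a transient failure.
            if attempt == attempts:
                raise
            sleep(base_delay * attempt)

    # Only reached when attempts < 1, since the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting, for example by passing `delays.append` and inspecting the recorded delays.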

Environment

borgmatic version: 1.5.12

borgmatic installation method: pip

Borg version: 1.11.1

Python version: 3.8.5

operating system and version: Ubuntu 20.04

GeorgeGedox changed title from Multiple calls to healtchecks fail with "Max retries exceeded" to Multiple calls to healthchecks.io fail with "Max retries exceeded" 2020-12-07 12:53:18 +00:00
Owner

I apologize for the lengthy delay here. I just implemented a related ticket (#439) that changes Healthchecks connection failures from errors to mere warnings. That won't do any retries, but it will make occasional Healthchecks connection errors less of a big deal: they'll no longer block any backups from happening. So do you think that's sufficient for your particular use case? Thanks!
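
As a rough illustration of the "warning instead of error" behavior described here (a sketch only, not the actual change made for #439): `ping_or_warn` and its arguments are hypothetical names, and the sketch again assumes connection failures surface as `OSError`.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def ping_or_warn(ping: Callable[[], None], label: str) -> bool:
    """Run ping(); on a connection failure, log a warning and carry on.

    Returns True if the ping succeeded and False if it failed. The
    failure is never re-raised, so the caller (for example a backup
    run) is not interrupted by a monitoring outage.
    """
    try:
        ping()
    except OSError as error:
        logger.warning("%s: Healthchecks ping failed: %s", label, error)
        return False
    return True
```

The trade-off is that a failed ping no longer stops the run, but it also is not retried, so the monitoring side may still report a missed check.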
witten added the "waiting for response" label 2022-05-24 23:18:35 +00:00
Owner

Closing this for now, but feel free to follow up. We can always reopen it. Thanks!
witten removed the "waiting for response" label 2022-10-05 06:12:13 +00:00
Reference: borgmatic-collective/borgmatic#377