Multiple calls to healthchecks.io fail with "Max retries exceeded" #377
Reference: borgmatic-collective/borgmatic#377
What I'm trying to do and why
I've got 3 config files on a host, backing up to 3 repositories and pinging 3 healthchecks.io endpoints. A cron job runs hourly and goes through all the configs.
Most of the time the hook runs correctly, but after a while it starts failing for one of the configs, or even for all of them, with a "Max retries exceeded" error. My theory is that this is due to a lack of delayed retries in the integration: it should try at least 5 times to ping the endpoints, with an increasing delay, so that it avoids concurrent connections.
Steps to reproduce (if a bug)
Ping multiple healthchecks.io endpoints in a short amount of time
Actual behavior (if a bug)
Expected behavior (if a bug)
The hook should retry a few times to ping an endpoint if it fails, to avoid false positives due to concurrent connections.
Other notes / implementation ideas
Integration should try at least 5 times to ping the endpoints with a delay of +1 second for each failed ping.
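To make the idea concrete, here is a rough sketch of that retry scheme in Python. It is not borgmatic's actual hook code: `ping_with_retries` and `ping_url` are hypothetical names, and the sketch uses the standard library's `urllib` purely to stay self-contained, whatever HTTP library the real integration uses. The wait grows by one second after each failed attempt (1s, 2s, 3s, 4s for 5 attempts).

```python
import time
import urllib.request


def ping_with_retries(ping, attempts=5, delay_step=1.0, sleep=time.sleep):
    """Call ping() up to `attempts` times, waiting longer after each failure.

    Hypothetical helper, not part of borgmatic. Returns True as soon as one
    call succeeds, or False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            ping()
            return True
        except OSError:
            # urllib.error.URLError and socket errors are OSError subclasses.
            # Wait 1s after the first failure, 2s after the second, and so on,
            # but do not sleep after the final attempt.
            if attempt < attempts:
                sleep(delay_step * attempt)
    return False


def ping_url(url, timeout=10):
    """Hypothetical single ping: a plain GET against the given ping URL."""
    urllib.request.urlopen(url, timeout=timeout).close()
```

It could be called as `ping_with_retries(lambda: ping_url(my_ping_url))`, where `my_ping_url` stands in for one of the configured healthchecks.io ping URLs. Passing `sleep` as a parameter is only there so the delays can be checked in a test without actually waiting.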
Environment
borgmatic version: 1.5.12
borgmatic installation method: pip
Borg version: 1.11.1
Python version: 3.8.5
operating system and version: Ubuntu 20.04
I apologize for the lengthy delay here. I just implemented a related ticket (#439) that changes Healthchecks connection failures from errors to mere warnings. That won't do any retries, but it will make occasional Healthchecks connection errors less of a big deal: they'll no longer block any backups from happening. So do you think that's sufficient for your particular use case? Thanks!
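For illustration only, the errors-to-warnings behavior described here could look roughly like the following. This is a sketch under assumptions, not the code from #439: `ping_or_warn` is a hypothetical name, and `ping` is any callable that raises an `OSError` subclass on connection failure.

```python
import logging

logger = logging.getLogger(__name__)


def ping_or_warn(ping, description="Healthchecks"):
    """Run ping() and downgrade a connection failure to a logged warning.

    Hypothetical helper, not borgmatic's implementation. Returns True on
    success and False on failure, so the caller can carry on with the backup.
    """
    try:
        ping()
    except OSError as error:
        # Log and continue instead of letting the exception abort the run.
        logger.warning("%s ping failed: %s", description, error)
        return False
    return True
```

The point of the design is that a failed ping never propagates as an exception, so it cannot stop the backup itself.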
Closing this for now, but feel free to follow up. We can always reopen it. Thanks!