[Question] Monitoring borgmatic check #255
What I'm trying to do and why
I am running borgmatic --check every month to verify my backups. The backups are large (> 1 TB), so I run borgmatic --check and borgmatic --create at different intervals. Both commands are run from cron.
Question
How can I see the results from borgmatic --check, or be notified if the check fails? It does not seem to be supported by the hooks.
Environment
borgmatic version: 1.3.26
Borg version: borg 1.1.10
Python version: Python 3.6.8
The existing on_error command hook will trigger when borgmatic check fails! Additionally, if you're using one of the monitoring services hooks (Healthchecks, Cronitor, Cronhub), then they will be notified if a borgmatic check invocation fails. Or did you have another type of notification in mind?
Also, perhaps the docs need clarification on this point?
Thanks!
I guess what happened was that borgmatic check was running when another process attempted a borgmatic create. The latter failed and fired off the on_error hook.
While debugging, I couldn't see in the logs that the borgmatic check had started or finished, so my conclusion was that it had somehow failed.
After re-reading the documentation, this section is important:
before_everything hooks collected from all borgmatic configuration files run once before all configuration files (prior to all actions), but only if there is a create action.
Which means that there is no way for me to see whether the borgmatic check started or not.
It seems that the health checks follow the same logic: no notification is sent if create is not included.
Do you have any recommendation for monitoring borgmatic check?
As far as the multiple borgmatic processes running, you may be interested in this ticket: witten/borgmatic#250
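For setups where check and create are started by separate cron entries, one way to keep them from overlapping is a small wrapper that takes a lock before invoking borgmatic. The following is only a minimal sketch and not part of borgmatic: the function names and the lock path are made up for illustration, it relies on fcntl and so is Unix-only, and it does not replace the locking Borg does on the repository itself.

```python
import fcntl
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive, non-blocking lock on `path` for the duration of the block.

    Raises BlockingIOError if the lock is already held elsewhere.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield handle
    finally:
        handle.close()  # closing the file releases the lock


def run_exclusively(command, lock_path="/tmp/borgmatic-wrapper.lock"):
    """Run `command` only if no other wrapped job holds the lock.

    `lock_path` is a hypothetical location; any path shared by both jobs works.
    Returns the command's exit code, or None if the job was skipped.
    """
    try:
        with exclusive_lock(lock_path):
            return subprocess.call(command)
    except BlockingIOError:
        print("another job holds {}; skipping".format(lock_path), file=sys.stderr)
        return None


# Example (not executed here): both cron jobs would go through the wrapper, e.g.
#   run_exclusively(["borgmatic", "--check"])
#   run_exclusively(["borgmatic", "--create"])
```

With this approach a job that finds the lock taken is skipped rather than queued, so a long monthly check would cause that night's create to be missed; whether that trade-off is acceptable depends on the schedule.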
In terms of monitoring of borgmatic check (and specifically starting/ending), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside though of that approach is that it might be confusing whether a create or check is happening in any given hook. Thoughts on that?
It's also possible that logging on borgmatic check start/end could be increased a bit.
Ideally, how are you wanting to monitor borgmatic check specifically?
Thanks, I subscribed.
One solution could be to separate the hooks for create and check. That way you could use an external health check service to monitor that create runs every night and check runs every month. Thoughts?
The new hook added could be named after_check.
That would help debugging for sure, but I believe that most users should also be monitoring that the check runs as scheduled.
I would POST (using curl) to our monitoring service, NodePing, so that both check and create are monitored.
That could certainly work. However, note that currently the after_* hooks only fire on success, and on_error fires on failure/error. So we'd probably also have to separate out on_check_error or similar, assuming that you want separate failure notifications for create and check.
That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn't have to craft curl commands. It sounds like for your use case, you'd need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.
Related issue: #249.
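Until per-action hooks exist, the start/success/failure reporting discussed here can be approximated outside borgmatic with a wrapper around the command. This is a rough sketch under stated assumptions: run_monitored, post_event and the endpoint URLs are hypothetical names for illustration, and the empty POST stands in for whatever request the monitoring service actually expects.

```python
import subprocess
import sys
import urllib.request


def post_event(url):
    """Send an empty POST to `url` (a placeholder for the service's real API)."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_monitored(command, endpoints, notify=post_event, run=subprocess.call):
    """Run `command`, reporting "start" first, then "success" or "failure".

    `endpoints` maps those three event names to URLs; missing entries are
    skipped. Returns the command's exit code.
    """
    def report(event):
        url = endpoints.get(event)
        if not url:
            return
        try:
            notify(url)
        except Exception as error:
            # A monitoring outage should not stop or fail the job itself.
            print("could not report {}: {}".format(event, error), file=sys.stderr)

    report("start")
    exit_code = run(command)
    report("success" if exit_code == 0 else "failure")
    return exit_code


# Example (not executed here), with made-up URLs and one endpoint set per action:
#   run_monitored(["borgmatic", "--check"], {
#       "start": "https://monitoring.example/check/start",
#       "success": "https://monitoring.example/check/ok",
#       "failure": "https://monitoring.example/check/fail",
#   })
```

Passing a different endpoints mapping for create and for check gives each action its own success and failure signal, which is the separation being asked for in this thread.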
Sure, with NodePing I would have to create 3 separate checks: Create, Check and Prune. But I think it's easier for borgmatic to maintain after_* hooks for the above rather than trying to tightly integrate with all the different services. I believe crafting a curl command is easier for the community than maintaining X number of tight integrations with different providers.
I believe my scenario would be solved by adding additional after_* hooks and providing better logs on checks.
Why not both? 😃 Both being: Three separate checks + a NodePing integration. I'm fine starting this ticket at the former however.
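The point of having three separate checks is that each action has its own schedule, so a missed run is detected per action rather than only a failed one. The logic an external service applies can be sketched roughly as below; this is an illustration only, with hypothetical names, and the intervals are assumptions (the thread mentions nightly create and monthly check, while the prune interval and the grace period are invented here).

```python
from datetime import datetime, timedelta

# Assumed schedule: how often each action is expected to succeed.
EXPECTED_INTERVALS = {
    "create": timedelta(days=1),
    "prune": timedelta(days=1),   # assumption, not stated in the thread
    "check": timedelta(days=31),
}


def overdue_actions(last_success, now, intervals=EXPECTED_INTERVALS,
                    grace=timedelta(hours=6)):
    """Return the sorted names of actions whose last success is too old.

    `last_success` maps action name -> datetime of the last successful run.
    An action with no recorded success counts as overdue.
    """
    overdue = []
    for action, interval in intervals.items():
        last = last_success.get(action)
        if last is None or now - last > interval + grace:
            overdue.append(action)
    return sorted(overdue)


# Example: create ran last night, check ran 30 days ago, prune is two days late.
now = datetime(2020, 1, 31, 12, 0)
last = {
    "create": datetime(2020, 1, 31, 2, 0),
    "prune": datetime(2020, 1, 29, 2, 0),
    "check": datetime(2020, 1, 1, 3, 0),
}
print(overdue_actions(last, now))  # ['prune']
```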
Okay, the following per-action hooks are implemented in master: "before_prune", "after_prune", "before_check", and "after_check". I'll post here when it's released. Thanks again for the idea!
Released in borgmatic 1.5.0!
Awesome. Looking forward to trying it out.