[Question] Monitoring borgmatic check #255
What I'm trying to do and why
I run borgmatic --check every month to verify my backups. The backups are large (> 1 TB), so I run borgmatic --check and borgmatic --create at different intervals. Both commands run from cron.
Question
How can I see the results from borgmatic --check, or be notified if the check fails? It does not seem to be supported by the hooks.
Environment
borgmatic version: 1.3.26
Borg version: borg 1.1.10
Python version: Python 3.6.8
The existing on_error command hook will trigger when borgmatic check fails! Additionally, if you're using one of the monitoring service hooks (Healthchecks, Cronitor, Cronhub), then they will be notified if a borgmatic check invocation fails. Or did you have another type of notification in mind?
Also, perhaps the docs need clarification on this point?
Thanks!
I guess what happened was that borgmatic check was running when another process attempted a borgmatic create. The latter failed and fired off the on_error hook.
While debugging, I couldn't see in the logs that borgmatic check had started or finished, so my conclusion was that it had somehow failed.
After re-reading the documentation, this section is important:
"before_everything hooks collected from all borgmatic configuration files run once before all configuration files (prior to all actions), but only if there is a create action."
Which means that there is no way for me to see whether borgmatic check started or not.
It seems that the health checks follow the same logic: no notification is sent if create is not included.
Do you have any recommendation for monitoring borgmatic check?
As far as the multiple borgmatic processes running, you may be interested in this ticket: witten/borgmatic#250
In terms of monitoring borgmatic check (and specifically its start and end), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside of that approach, though, is that it might be confusing whether a create or a check is happening in any given hook. Thoughts on that?
It's also possible that logging on borgmatic check start/end could be increased a bit.
Ideally, how are you wanting to monitor borgmatic check specifically?
Thanks, I subscribed.
One solution could be to separate the hooks for create and check. That way you could monitor with external health check services that create runs every night and check runs every month. Thoughts?
The new hook could be named after_check.
That would help debugging for sure, but I believe that most users should also be monitoring that check runs as scheduled.
I would POST (using curl) to our monitoring service, NodePing, so that both check and create are monitored.
That could certainly work. However, note that currently the after_* hooks only fire on success, and on_error fires on failure/error. So we'd probably also have to separate out on_check_error or similar, assuming that you want separate failure notifications for create and check.
That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn't have to craft curl commands. It sounds like for your use case, you'd need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.
Related issue: #249.
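For illustration, the "different endpoints for create and check, with success and failure reported separately" idea can be approximated today with a small wrapper around the cron command. This is only a sketch under assumptions: run_and_notify and notify are hypothetical helper names, not part of borgmatic, and the URLs are placeholders rather than a real NodePing API.

```shell
#!/bin/sh
# Hypothetical helpers, not part of borgmatic.

# POST to a monitoring URL (placeholder endpoints are passed in by the caller).
notify() {
    # -f: treat HTTP errors as failures, -sS: quiet but still print errors,
    # -m 10: give up after 10 seconds, --retry 3: retry transient failures.
    curl -fsS -m 10 --retry 3 -X POST "$1" > /dev/null
}

# Usage: run_and_notify SUCCESS_URL FAILURE_URL COMMAND [ARGS...]
# Runs the command, pings one URL on success and the other on failure,
# and returns the command's own exit status.
run_and_notify() {
    success_url="$1"
    failure_url="$2"
    shift 2
    if "$@"; then
        notify "$success_url" || echo "success notification failed" >&2
        return 0
    else
        status=$?
        notify "$failure_url" || echo "failure notification failed" >&2
        return "$status"
    fi
}
```

A monthly cron job could then call something like run_and_notify https://monitoring.example/check-ok https://monitoring.example/check-failed borgmatic check, with a second pair of placeholder URLs for the nightly borgmatic create.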
Sure, with NodePing I would have to create 3 separate checks: Create, Check and Prune. But I think it's easier for borgmatic to maintain after_* hooks for the above rather than trying to tightly integrate with all the different services. I believe crafting a curl command is easier for the community compared to maintaining X number of tight integrations with different providers.
I believe my scenario would be solved by adding additional after_* hooks and providing better logs on checks.
Why not both? 😃 Both being: Three separate checks + a NodePing integration. I'm fine starting this ticket at the former however.
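On the "better logs" point: on a version without per-action hooks, one way to at least see that a check started and finished is to wrap the cron command so it writes its own start and end lines. A minimal sketch, assuming a hypothetical run_logged helper and an arbitrary LOG_FILE location (neither is defined by borgmatic):

```shell
#!/bin/sh
# Hypothetical helper, not part of borgmatic.
# LOG_FILE is an assumed path; override it to suit your system.
LOG_FILE="${LOG_FILE:-/tmp/borgmatic-actions.log}"

# Usage: run_logged LABEL COMMAND [ARGS...]
# Appends a timestamped "started" line, runs the command, appends a
# "finished" line with the exit status, and returns that status.
run_logged() {
    label="$1"
    shift
    printf '%s %s started\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$label" >> "$LOG_FILE"
    if "$@"; then
        status=0
    else
        status=$?
    fi
    printf '%s %s finished with exit status %s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$label" "$status" >> "$LOG_FILE"
    return "$status"
}
```

From cron this would be invoked as, for example, run_logged check borgmatic check. A "started" line with no matching "finished" line then indicates a run that was interrupted or is still in progress.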
Okay, the following per-action hooks are implemented in master: "before_prune", "after_prune", "before_check", and "after_check". I'll post here when it's released. Thanks again for the idea!
Released in borgmatic 1.5.0!
Awesome. Looking forward to trying it out.