[Question] Monitoring borgmatic check #255

Closed
opened 2019-11-27 17:18:29 +00:00 by mhaamann · 11 comments

What I'm trying to do and why

I am running borgmatic --check every month to check on my backups. The backups are large (> 1 TB), so I run borgmatic --check and borgmatic --create at different intervals. Both commands are run using cron.

Question

How can I see the results from borgmatic --check, or be notified if the check fails? It does not seem to be supported by the hooks.

Environment

borgmatic version: 1.3.26

Borg version: borg 1.1.10

Python version: Python 3.6.8

witten added the question / support label 2019-11-27 17:50:20 +00:00
Owner

The existing on_error command hook will trigger when borgmatic check fails! Additionally, if you're using one of the monitoring services hooks (Healthchecks, Cronitor, Cronhub), then they will be notified if a borgmatic check invocation fails. Or did you have another type of notification in mind?

Also, perhaps the docs need clarification on this point?
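
For reference, here is a minimal sketch of what an on_error hook could look like in the hooks section of a borgmatic configuration file. The notification URL is a hypothetical placeholder rather than a real endpoint, and the curl command is just one possible way to send an alert.

```yaml
hooks:
    # Commands listed here run when borgmatic hits an error,
    # which (per the comment above) includes a failed check.
    on_error:
        - echo "borgmatic failed, see the logs for details."
        # Hypothetical endpoint; replace with your own monitoring URL.
        - curl --silent --show-error --request POST https://monitoring.example.com/borgmatic/failed
```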

Author

Thanks!
I guess what happened was that borgmatic check was running when another process attempted a borgmatic create. The latter failed and fired off the on_error hook.

While debugging I couldn't see in the logs that the borgmatic check had started/finished so my conclusion was that it had somehow failed.

After re-reading the documentation this section is important:

before_everything hooks collected from all borgmatic configuration files run once before all configuration files (prior to all actions), but only if there is a create action.

Which means that there is no way for me to see whether the borgmatic check started or not.
It seems that the health check hooks follow the same logic: no notification is sent if create is not included.

Do you have any recommendation for monitoring the borgmatic check?

Owner

As far as the multiple borgmatic processes running, you may be interested in this ticket: witten/borgmatic#250

In terms of monitoring of borgmatic check (and specifically starting/ending), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside though of that approach is that it might be confusing whether a create or check is happening in any given hook. Thoughts on that?

It's also possible that logging on borgmatic check start/end could be increased a bit.

Ideally, how are you wanting to monitor borgmatic check specifically?

Author

> As far as the multiple borgmatic processes running, you may be interested in this ticket: #250

Thanks, I subscribed.

> In terms of monitoring of borgmatic check (and specifically starting/ending), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside though of that approach is that it might be confusing whether a create or check is happening in any given hook. Thoughts on that?

One solution could be to separate the hooks for create and check. That way, you could use external health check services to monitor that create runs every night and check runs every month. Thoughts?

The new hook could be named after_check.

> It’s also possible that logging on borgmatic check start/end could be increased a bit.

That would help debugging for sure, but I believe that most users should also be monitoring that the check runs as scheduled.

> Ideally, how are you wanting to monitor borgmatic check specifically?

I would POST (using curl) to our monitoring service NodePing, so both check and create are monitored.

Owner

> One solution could be to separate the hooks for create and check. In this way you could monitor with external health check services that create runs every night and check runs every month. Thoughts?

That could certainly work. However, note that currently the after_* hooks only fire on success, and on_error fires on failure/error. So we'd probably also have to separate out on_check_error or similar, assuming that you want separate failure notifications for create and check.

> I would POST (using curl) to our monitoring service NodePing, so both check and create are monitored.

That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn't have to craft curl commands. It sounds like for your use case, you'd need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.

Owner

Related issue: #249.

Author

> That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn’t have to craft curl commands. It sounds like for your use case, you’d need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.

Sure. With NodePing I would have to create three separate checks (create, check, and prune), but I think it's easier for borgmatic to maintain after_* hooks for these than to integrate tightly with all the different services. I believe crafting a curl command is easier for the community than maintaining X number of tight integrations with different providers.

I believe my scenario would be solved by adding additional after_* hooks and providing better logs for checks.

Owner

Why not both? 😃 Both being: Three separate checks + a NodePing integration. I'm fine starting this ticket with the former, however.

witten added this to the per-action hooks milestone 2019-12-05 21:16:16 +00:00
Owner

Okay, the following per-action hooks are implemented in master: "before_prune", "after_prune", "before_check", and "after_check". I'll post here when it's released. Thanks again for the idea!
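
A sketch of how these hooks might be used to monitor check and prune runs separately, in the spirit of the curl approach discussed earlier in this thread. The URLs below are hypothetical placeholders, not real endpoints. As noted above, the after_* hooks only fire on success, so failures still go through on_error in this sketch.

```yaml
hooks:
    before_check:
        - echo "Starting borgmatic check."
    after_check:
        # Hypothetical endpoint; fires only when the check succeeds.
        - curl --fail --silent --show-error --request POST https://monitoring.example.com/borgmatic/check-ok
    after_prune:
        # Hypothetical endpoint; fires only when the prune succeeds.
        - curl --fail --silent --show-error --request POST https://monitoring.example.com/borgmatic/prune-ok
    on_error:
        # Hypothetical endpoint; failures are reported through on_error.
        - curl --fail --silent --show-error --request POST https://monitoring.example.com/borgmatic/failed
```

With a setup like this, the monitoring service can alert when an expected "ok" ping does not arrive on schedule, which covers the case where the check never started at all.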

Owner

Released in borgmatic 1.5.0!

Author

Awesome. Looking forward to trying it out.

Reference: borgmatic-collective/borgmatic#255