#255 [Question] Monitoring borgmatic check

Closed
opened 2 months ago by mhaamann · 11 comments

What I'm trying to do and why

I am running borgmatic --check every month to check on my backups. The backups are large (> 1 TB), so I run borgmatic --check and borgmatic --create at different intervals. Both commands are run using cron.

Question

How can I see the results from borgmatic --check, or be notified if the check fails? It does not seem to be supported by the hooks.

Environment

borgmatic version: 1.3.26

Borg version: borg 1.1.10

Python version: Python 3.6.8

witten added the question / support label 2 months ago
witten commented 2 months ago
Owner

The existing on_error command hook will trigger when borgmatic check fails! Additionally, if you're using one of the monitoring service hooks (Healthchecks, Cronitor, Cronhub), then they will be notified if a borgmatic check invocation fails. The Healthchecks hook is documented at https://torsion.org/borgmatic/docs/how-to/monitor-your-backups/#healthchecks-hook. Or did you have another type of notification in mind?

Also, perhaps the docs need clarification on this point?

mhaamann commented 2 months ago
Poster

Thanks! I guess what happened was that borgmatic check was running when another process attempted a borgmatic create. The latter failed and fired off the on_error hook.

While debugging I couldn't see in the logs that the borgmatic check had started/finished so my conclusion was that it had somehow failed.

After re-reading the documentation, this section is important:

> before_everything hooks collected from all borgmatic configuration files run once before all configuration files (prior to all actions), but only if there is a create action.

This means there is no way for me to see whether borgmatic check started or not. It seems that the health checks follow the same logic: no notification is sent if create is not included.

Do you have any recommendation for monitoring borgmatic check?

witten commented 2 months ago
Owner

As for multiple borgmatic processes running at once, you may be interested in this ticket: #250 (https://projects.torsion.org/witten/borgmatic/issues/250)
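
Independently of that ticket, overlapping cron runs can be avoided with an advisory lock taken around each invocation. This is only a minimal sketch, assuming a Unix host (it relies on `fcntl.flock`). `run_exclusively` and the lock path are hypothetical names used for illustration and are not part of borgmatic:

```python
import fcntl
import subprocess


def run_exclusively(command, lock_path):
    """Run command only if no other caller currently holds the lock file.

    Returns the command's exit code, or None if the lock was already held
    and the command was therefore skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of the block.
        return subprocess.run(command).returncode


# Example call from a cron-driven script (hypothetical lock path):
# run_exclusively(["borgmatic", "check"], "/tmp/borgmatic.lock")
```

Both the check and the create jobs would need to go through the same lock file for this to help. A skipped run returns None, so the caller can decide whether that counts as a failure.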

In terms of monitoring of borgmatic check (and specifically starting/ending), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside though of that approach is that it might be confusing whether a create or check is happening in any given hook. Thoughts on that?

It's also possible that logging on borgmatic check start/end could be increased a bit.

Ideally, how are you wanting to monitor borgmatic check specifically?

mhaamann commented 2 months ago
Poster

> As far as the multiple borgmatic processes running, you may be interested in this ticket: #250

Thanks, I subscribed.

> In terms of monitoring of borgmatic check (and specifically starting/ending), one option would be to update the existing logic to fire start/end hooks for borgmatic check in addition to borgmatic create. The downside though of that approach is that it might be confusing whether a create or check is happening in any given hook. Thoughts on that?

One solution could be to separate the hooks for create and check. In this way you could monitor with external health check services that create runs every night and check runs every month. Thoughts?

The new hook could be named after_check.

> It’s also possible that logging on borgmatic check start/end could be increased a bit.

That would help debugging for sure, but I believe that most users should also be monitoring that the check runs as scheduled.

> Ideally, how are you wanting to monitor borgmatic check specifically?

I would POST (using curl) to our monitoring service NodePing, so both check and create are monitored.
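
A minimal sketch of that idea, written as a wrapper script that cron could call instead of invoking borgmatic directly. It assumes borgmatic exits with a non-zero status when the action fails. The endpoint URLs are placeholders, not real NodePing URLs, and `post` and `run_and_report` are hypothetical helper names:

```python
import subprocess
import urllib.request


def post(url, timeout=10):
    """Send an empty POST request to url, roughly what `curl -X POST url` does."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_and_report(command, success_url, failure_url, notify=post):
    """Run command, then notify one of two URLs depending on its exit code."""
    result = subprocess.run(command)
    notify(success_url if result.returncode == 0 else failure_url)
    return result.returncode


# Example call (placeholder URLs; substitute the ones your service provides):
# run_and_report(
#     ["borgmatic", "check"],
#     success_url="https://monitoring.example/check/ok",
#     failure_url="https://monitoring.example/check/fail",
# )
```

The `notify` parameter exists so the reporting step can be swapped out or stubbed without touching the network.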

witten commented 2 months ago
Owner

> One solution could be to separate the hooks for create and check. In this way you could monitor with external health check services that create runs every night and check runs every month. Thoughts?

That could certainly work. However, note that currently the after_* hooks only fire on success, and on_error fires on failure/error. So we'd probably also have to separate out on_check_error or similar, assuming that you want separate failure notifications for create and check.

> I would POST (using curl) to our monitoring service NodePing, so both check and create are monitored.

That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn't have to craft curl commands. It sounds like for your use case, you'd need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.
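
To make that requirement concrete, here is a rough sketch of the routing it implies: one endpoint per action and outcome. This does not describe any existing borgmatic integration. The URLs are placeholders, and `ENDPOINTS` and `endpoint_for` are hypothetical names:

```python
# Placeholder endpoints, keyed by (action, succeeded).
ENDPOINTS = {
    ("create", True): "https://monitoring.example/create/ok",
    ("create", False): "https://monitoring.example/create/fail",
    ("check", True): "https://monitoring.example/check/ok",
    ("check", False): "https://monitoring.example/check/fail",
}


def endpoint_for(action, succeeded, endpoints=ENDPOINTS):
    """Return the URL to notify for this action's outcome, or None if unmonitored."""
    return endpoints.get((action, succeeded))
```

An action with no entry, such as prune here, yields None and would simply not be reported.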

witten commented 2 months ago
Owner

Related issue: #249.

mhaamann commented 2 months ago
Poster

> That makes me wonder: Would a more formal NodePing integration be desirable here? Similar to the existing Healthchecks/Cronitor/Cronhub integrations? That way you wouldn’t have to craft curl commands. It sounds like for your use case, you’d need it to hit different endpoints for each of create and check, and indicate successes/failures separately for each.

Sure, with NodePing I would have to create 3 separate checks: Create, Check and Prune. But I think it's easier for borgmatic to maintain after_* hooks for the above rather than trying to tightly integrate with all the different services. I believe crafting a curl command is easier for the community compared to maintaining X number of tight integrations with different providers.

I believe my scenario would be solved by adding additional hooks for after_* and providing better logs on checks.

witten commented 2 months ago
Owner

Why not both? :smiley: Both being: Three separate checks + a NodePing integration. I'm fine starting this ticket at the former however.

witten added this to the per-action hooks milestone 2 months ago
witten commented 3 weeks ago
Owner

Okay, the following per-action hooks are implemented in master: “before_prune”, “after_prune”, “before_check”, and “after_check”. I'll post here when it's released. Thanks again for the idea!

witten commented 3 weeks ago
Owner

Released in borgmatic 1.5.0!

mhaamann commented 3 weeks ago
Poster

Awesome. Looking forward to trying it out.
