Create a cockpit for borgmatic #126

Closed
opened 2018-12-26 08:22:38 +00:00 by henfri · 17 comments

Hello,

for me, the critical points regarding my backup strategy are that

  1. it needs to be fully automated

  2. I need to be informed if anything goes wrong

  3. I need to be able to check the current status easily

Regarding these points:

  1) is fulfilled by borgmatic.

  2) explicitly does not mean that I want to be informed of every successful run, because one does not notice when a single message is missing.
    This is currently done for me via the crontab output. For this, it is important to get no output at all when everything is fine. It works, but I would prefer this not to depend on crontab and to be built into borgmatic instead.

  3) is still missing. I am spoiled by CrashPlan:
    [Screenshot: CrashPlan status view]
    But I am not asking for a GUI here... It can all be command-line output.
    I looked at the borg documentation of the available JSON output, and I suggest this structure:

During create:

  archive_progress ->
    archive_progress
    compressed_size
    deduplicated_size
    path

  progress_percent
    message

Stats (shown once per repository):

  - percentage of successful backup attempts
  - average speed of the last 10 backups
  - compressed_size, original_size

  last successful backup:
    - speed of last backup
    - compressed_size
    - deduplicated_size
    - original_size

  last unsuccessful backup:
    - reason
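To make the proposed per-repository stats concrete, here is a minimal Python sketch that derives most of them from a history of backup attempts. It assumes a hypothetical `BackupRun` record (neither borg nor borgmatic provides this type), defines speed as original_size divided by duration, and averages speed over the last 10 successful runs, since a failed run has no meaningful speed. The repository-wide compressed_size/original_size totals are left out; they would come from the repository itself rather than from run history.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupRun:
    """Hypothetical record of one backup attempt (not a borg or borgmatic type)."""

    succeeded: bool
    duration_seconds: float
    original_size: int = 0
    compressed_size: int = 0
    deduplicated_size: int = 0
    reason: Optional[str] = None  # failure reason, if the run failed


def speed(run: BackupRun) -> float:
    """Bytes per second, assumed here to be original_size / duration."""
    if run.duration_seconds <= 0:
        return 0.0
    return run.original_size / run.duration_seconds


def repository_stats(runs: list[BackupRun]) -> dict:
    """Summarize runs (ordered oldest first) into the proposed cockpit fields."""
    successes = [r for r in runs if r.succeeded]
    failures = [r for r in runs if not r.succeeded]
    recent = successes[-10:]  # last 10 successful runs

    stats = {
        "percent_successful": 100.0 * len(successes) / len(runs) if runs else 0.0,
        "average_speed_last_10": (
            sum(speed(r) for r in recent) / len(recent) if recent else 0.0
        ),
        "last_successful": None,
        "last_unsuccessful": None,
    }
    if successes:
        last = successes[-1]
        stats["last_successful"] = {
            "speed": speed(last),
            "compressed_size": last.compressed_size,
            "deduplicated_size": last.deduplicated_size,
            "original_size": last.original_size,
        }
    if failures:
        stats["last_unsuccessful"] = {"reason": failures[-1].reason}
    return stats
```

Where the run history would be stored (a log file, a small database, a monitoring service) is deliberately left open here.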

Related issues:
witten/borgmatic#53
witten/borgmatic#86

Regards,
Hendrik

Wouldn't it require a GUI or a web application?

Author

Hello,

The screenshot above is just an example.
It could be plain text output: no user interaction and nothing graphical.

Greetings,
Hendrik

It could feasibly be implemented with some logging and a separate command that just parses that log.
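A minimal sketch of that idea in Python, assuming the log comes from running borg with --log-json (and --progress), which, per borg's documentation, writes one JSON object per line with a "type" field such as archive_progress, progress_percent, or log_message. Those are the same names used in the structure proposed at the top of this issue. The function name summarize_log is hypothetical, and the exact keys present can vary by borg version, so every lookup uses .get().

```python
import json
from typing import Iterable


def summarize_log(lines: Iterable[str]) -> str:
    """Reduce borg --log-json output lines to a short plain-text status."""
    progress = None  # latest archive_progress entry
    percent = None   # latest progress_percent entry
    errors = []      # messages of ERROR/CRITICAL log entries

    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        kind = entry.get("type")
        # A final entry with "finished": true carries no sizes, so keep the previous one.
        if kind == "archive_progress" and not entry.get("finished"):
            progress = entry
        elif kind == "progress_percent" and not entry.get("finished"):
            percent = entry
        elif kind == "log_message" and entry.get("levelname") in ("ERROR", "CRITICAL"):
            errors.append(entry.get("message", ""))

    status = []
    if progress is not None:
        status.append(
            "archive: {} original, {} compressed, {} deduplicated, at {}".format(
                progress.get("original_size"),
                progress.get("compressed_size"),
                progress.get("deduplicated_size"),
                progress.get("path"),
            )
        )
    if percent is not None:
        status.append("progress: {}".format(percent.get("message", "")))
    status.extend("error: {}".format(message) for message in errors)
    return "\n".join(status)
```

For example, borg's stderr could be redirected to a file during a run, and a separate command could pass that file's lines to summarize_log to print the current status.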

Owner

Some related discussion on #174.

Contributor

Hey, moving over from witten/borgmatic#174, I am interested in point 2 of this issue.

I've seen netdata's alerts handler (netdata is primarily a monitoring tool) and it has a shitload of them: https://github.com/netdata/netdata/blob/master/health/notifications/alarm-notify.sh.in. If borgmatic implements one handler, someone will come and ask for another.

What is acceptable for borgmatic (i.e., not feature creep) and still works for us?

My feeling is that we should provide more context to on_error (see witten/borgmatic#174). The basic requirements are to know which archive was being backed up and when the failure happened, and then to make this available to some handler (Pushbullet, Twilio, etc.).
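As an illustration of passing failure context to pluggable handlers without borgmatic itself knowing about Pushbullet, Twilio, and so on, here is a hypothetical Python sketch. FailureContext, notify_all, and the handler signature are illustrative names, not borgmatic APIs; each real handler would wrap one notification service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class FailureContext:
    """Hypothetical context handed to every failure handler."""

    repository: str
    archive: str
    error: str
    failed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# A handler takes the context and delivers it somewhere (email, SMS, push, ...).
Handler = Callable[[FailureContext], None]


def format_message(ctx: FailureContext) -> str:
    """Render the context as a one-line, service-agnostic message."""
    return "Backup of {} to {} failed at {}: {}".format(
        ctx.archive, ctx.repository, ctx.failed_at.isoformat(), ctx.error
    )


def notify_all(ctx: FailureContext, handlers: list[Handler]) -> list[str]:
    """Call every handler, returning descriptions of any that raised."""
    problems = []
    for handler in handlers:
        try:
            handler(ctx)
        except Exception as exc:  # one broken notifier must not silence the rest
            name = getattr(handler, "__name__", repr(handler))
            problems.append("{}: {}".format(name, exc))
    return problems
```

Catching exceptions per handler is a deliberate choice: a broken SMS integration should not also suppress the email.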

However, there is a niggling feeling that this is simply a documentation issue and this could be declared out of scope for the tool ... thoughts!?

Owner

Some thoughts on this:

  • I think this is a great idea, and something that users would greatly benefit from. Today, borgmatic is good at making backups, but honestly pretty bad at making sure backups happen. And that last part is really the last mile of a holistic backup solution.
  • The "need to be informed if anything goes wrong" feature seems like a good place to start, and perhaps should be broken off into another ticket. Then, once that's done, we can focus on making the actual cockpit you look at when things go wrong. I agree that a console cockpit may be a good first step on that front.
  • There are basically two distinct models for backup failure notifications: 1. The backup process itself is responsible for alerting the administrator (email, SMS, whatever) when a backup fails, or 2. Something completely separate from the backup process is responsible for monitoring which backups appear, and alerting the administrator (email, SMS, whatever) if it looks like backups are failing or not happening for any reason.
  • Option number 2 may be safer in theory, because if your backups start silently breaking, you'll still find out. It also has the benefit of more cleanly separating the monitoring/alerting functionality from the backup code. However, it involves more moving parts, and may be more work to build than option number 1.
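Option 2 essentially reduces to a freshness check that runs independently of the backup process. A minimal Python sketch, assuming the monitor is fed JSON shaped like borg list --json output (an "archives" list whose entries carry an ISO 8601 "time" field) and that those timestamps and now use the same (here naive, local) time convention; the function names and the 26-hour threshold are illustrative assumptions.

```python
import json
from datetime import datetime, timedelta
from typing import Optional


def newest_archive_time(list_json: str) -> Optional[datetime]:
    """Return the newest archive timestamp, or None if there are no archives."""
    archives = json.loads(list_json).get("archives", [])
    times = [datetime.fromisoformat(a["time"]) for a in archives if "time" in a]
    return max(times) if times else None


def backup_is_stale(
    list_json: str, now: datetime, max_age: timedelta = timedelta(hours=26)
) -> bool:
    """True if no archive exists or the newest one is older than max_age.

    now must use the same time convention (naive local, by assumption) as the
    archive timestamps, or the subtraction will fail or mislead.
    """
    newest = newest_archive_time(list_json)
    return newest is None or now - newest > max_age
```

A repository with no archives at all is treated as stale, so a monitor that has never seen a backup alerts instead of staying silent.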

Thoughts/reactions?

Owner

One thing that would be helpful is: When do y'all intend to use / look at the cockpit output? On every backup? Only when things go wrong, you get alerted, and you need to dig in? Or some other time?

Author

Hello,

yes, clearly option 2 is safer, but it also means 'starting from scratch'...

On the 'when': the notifications should clearly follow a 'lights out' philosophy. No message means everything is good. If one gets flooded by daily status mails, one will start ignoring them.
This of course requires that the system runs reliably.
So maybe a watchdog (did borgmatic actually do something?) would be good. Otherwise, the lights-out philosophy can go very wrong.
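The 'lights out' behaviour (silence on success, output only on failure) is what the crontab setup in the original post relies on, because cron mails whatever a job prints. A minimal Python sketch of such a wrapper, similar in spirit to the chronic utility from moreutils; run_quietly is a hypothetical name, not part of borgmatic.

```python
from __future__ import annotations

import subprocess
import sys


def run_quietly(command: list[str]) -> int:
    """Run command; print its combined output only if it exits non-zero.

    cron mails anything a job prints, so a successful run sends no mail and a
    failed run sends exactly one message containing the full output.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors are kept in order
        universal_newlines=True,
    )
    if result.returncode != 0:
        sys.stdout.write(result.stdout)
        sys.stdout.write("command exited with status {}\n".format(result.returncode))
    return result.returncode
```

Under cron, calling run_quietly(["borgmatic"]) from a small script would send mail only when borgmatic exits non-zero. This does not cover the watchdog case: if cron never runs the job at all, nothing is printed either.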

Greetings,
Hendrik

Owner

Yup, that philosophy makes sense to me.

Owner

FYI, I reopened and implemented #174 with the idea that it carves off a piece of the ask in this ticket (#126): More immediate alerting when a backup fails.

Still to do: Separate monitoring + cockpit.

Owner

Note that #86 is now implemented. That feature supports one approach to the "separate monitoring" ask, which is why I'm mentioning it here.

Owner

Okay, I implemented #223 (a dead man's switch via integration with Healthchecks, https://healthchecks.io/), and I also wrote up docs on a number of options for borgmatic monitoring and alerting (https://torsion.org/borgmatic/docs/how-to/monitor-your-backups/). Feedback is welcome, tickets on new variants of monitoring/alerting are welcome, but I'm going to consider the "separate monitoring" ask in this ticket to be done for now.
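For readers new to the dead man's switch model: a Healthchecks check has a ping URL, and Healthchecks alerts when an expected success ping does not arrive in time; its ping API also accepts /start and /fail suffixes. borgmatic's integration handles this itself, so the Python sketch below only illustrates the protocol. run_with_pings and ping_url are hypothetical helpers, any URL used with them here is a placeholder, and the send parameter exists so the flow can be exercised without network access.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence


def ping_url(base_url: str, event: str = "") -> str:
    """Build a Healthchecks ping URL; event is "" (success), "start", or "fail"."""
    base = base_url.rstrip("/")
    return "{}/{}".format(base, event) if event else base


def http_ping(url: str) -> None:
    """Send one ping; the timeout keeps a monitoring outage from hanging the job."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_with_pings(
    base_url: str,
    command: Sequence[str],
    send: Callable[[str], None] = http_ping,
) -> int:
    """Ping /start, run command, then ping success or /fail based on its exit status."""

    def safe_send(url: str) -> None:
        try:
            send(url)
        except OSError:
            pass  # a Healthchecks outage should not stop the backup itself

    safe_send(ping_url(base_url, "start"))
    returncode = subprocess.run(list(command)).returncode
    safe_send(ping_url(base_url, "" if returncode == 0 else "fail"))
    return returncode
```

The important property is the last one: if the machine is off or cron never fires, no success ping arrives, and Healthchecks alerts anyway.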

Still to do: Cockpit.

Author

Great!
I will try. Thanks!

Author

Hello,

I really like the healthchecks.io integration! Thank you!

Greetings,
Hendrik

just wanted to say that Witten looks awesome to work with / to hire

Owner

Hah, thanks for the kind words. Let me know if you (or your employer) have any projects that need doing!

Owner

Given the lack of activity on this ticket (my fault) and the various ways to monitor borgmatic now (https://torsion.org/borgmatic/docs/how-to/monitor-your-backups/), I'm closing this ticket. Not all of the features in the original issue are covered by those. But, for instance, the Healthchecks or BorgBase UIs go pretty far toward providing a borgmatic "cockpit." And Healthchecks can be self-hosted if you don't want to use a third-party cloud provider.

If there are remaining asks unfulfilled that folks still care about, please feel free to file those as separate tickets. Thank you!

No Milestone
No Assignees
5 Participants
Due Date
No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#126