Added support for grafana loki #747
|
@ -1403,3 +1403,33 @@ properties:
|
|||
Configuration for a monitoring integration with Crunhub. Create an
|
||||
account at https://cronhub.io if you'd like to use this service. See
|
||||
borgmatic monitoring documentation for details.
|
||||
loki:
|
||||
type: object
|
||||
required: ['url', 'labels']
|
||||
additionalProperties: false
|
||||
properties:
|
||||
url:
|
||||
type: string
|
||||
description: |
|
||||
Grafana loki log URL to notify when a backup begins,
|
||||
ends, or fails.
|
||||
example: "http://localhost:3100/loki/api/v1/push"
|
||||
labels:
|
||||
type: object
|
||||
additionalProperties:
|
||||
type: string
|
||||
witten marked this conversation as resolved
|
||||
description: |
|
||||
Allows setting custom labels for the logging stream. At
|
||||
least one label is required. "__hostname" gets replaced by
|
||||
the machine hostname automatically. "__config" gets replaced
|
||||
by just the name of the configuration file. "__config_path"
|
||||
witten marked this conversation as resolved
witten
commented
What do you think of standardizing on the What do you think of standardizing on the `{name}` placeholder syntax that's used elsewhere in borgmatic's configuration file and also by Borg as well?
|
||||
gets replaced by the full path of the configuration file.
|
||||
example:
|
||||
app: "borgmatic"
|
||||
config: "__config"
|
||||
hostname: "__hostname"
|
||||
description: |
|
||||
Configuration for a monitoring integration with Grafana loki. You
|
||||
can send the logs to a self-hosted instance or create an account at
|
||||
https://grafana.com/auth/sign-up/create-user. See borgmatic
|
||||
monitoring documentation for details.
|
||||
|
|
|
@ -11,6 +11,7 @@ from borgmatic.hooks import (
|
|||
pagerduty,
|
||||
postgresql,
|
||||
sqlite,
|
||||
loki,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
@ -26,6 +27,7 @@ HOOK_NAME_TO_MODULE = {
|
|||
'pagerduty': pagerduty,
|
||||
'postgresql_databases': postgresql,
|
||||
'sqlite_databases': sqlite,
|
||||
'loki': loki,
|
||||
}
|
||||
|
||||
|
||||
|
|
|
@ -0,0 +1,117 @@
|
|||
import logging
|
||||
import requests
|
||||
import json
|
||||
import time
|
||||
import platform
|
||||
from borgmatic.hooks import monitor
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MONITOR_STATE_TO_HEALTHCHECKS = {
|
||||
monitor.State.START: 'Started',
|
||||
monitor.State.FINISH: 'Finished',
|
||||
witten marked this conversation as resolved
witten
commented
This should probably be renamed to This should probably be renamed to `MONITOR_STATE_TO_LOKI`. 😄
|
||||
monitor.State.FAIL: 'Failed',
|
||||
}
|
||||
|
||||
# Threshold at which logs get flushed to loki
|
||||
MAX_BUFFER_LINES = 100
|
||||
|
||||
class loki_log_buffer():
|
||||
'''
|
||||
A log buffer that allows to output the logs as loki requests in json
|
||||
'''
|
||||
witten marked this conversation as resolved
witten
commented
Code style convention: Uppercase first letter of class names, e.g. Code style convention: Uppercase first letter of class names, e.g. `Loki_log_buffer`.
|
||||
def __init__(self, url, dry_run):
|
||||
self.url = url
|
||||
self.dry_run = dry_run
|
||||
self.root = {}
|
||||
self.root["streams"] = [{}]
|
||||
self.root["streams"][0]["stream"] = {}
|
||||
self.root["streams"][0]["values"] = []
|
||||
|
||||
def add_value(self, value):
|
||||
timestamp = str(time.time_ns())
|
||||
self.root["streams"][0]["values"].append((timestamp, value))
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
A perhaps more succinct/declarative way to express this might be:
A perhaps more succinct/declarative way to express this might be:
```
self.root = {
'streams': [{'stream': {}, 'values': []}]
}
```
|
||||
|
||||
def add_label(self, label, value):
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
It would be great if some of these functions have a brief docstring describing what they do. It would be great if some of these functions have a brief docstring describing what they do.
|
||||
self.root["streams"][0]["stream"][label] = value
|
||||
|
||||
def _to_request(self):
|
||||
return json.dumps(self.root)
|
||||
|
||||
def __len__(self):
|
||||
return len(self.root["streams"][0]["values"])
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Note that I've been trying to stop using underscore functions in this codebase, operating under the "we're all adults" theory of function privacy. You'll still find them in places though. Note that I've been trying to stop using underscore functions in this codebase, operating under the "we're all adults" theory of function privacy. You'll still find them in places though.
|
||||
|
||||
def flush(self):
|
||||
if self.dry_run:
|
||||
self.root["streams"][0]["values"] = []
|
||||
return
|
||||
if len(self) == 0:
|
||||
return
|
||||
witten marked this conversation as resolved
witten
commented
Blank lines after some of these Blank lines after some of these `if`s would be nice. Maybe before the `try` as well. In general I find that helps readability even if it's not required by the language.
|
||||
request_body = self._to_request()
|
||||
self.root["streams"][0]["values"] = []
|
||||
request_header = {"Content-Type": "application/json"}
|
||||
try:
|
||||
r = requests.post(self.url, headers=request_header, data=request_body, timeout=5)
|
||||
r.raise_for_status()
|
||||
except requests.RequestException:
|
||||
logger.warn("Failed to upload logs to loki")
|
||||
|
||||
class loki_log_handeler(logging.Handler):
|
||||
witten marked this conversation as resolved
witten
commented
This code looks good, but it's a little counter-intuitive to me that it would be here. For instance, I think of a buffer as a data structure for storing stuff, not necessarily as a data structure that also implicitly has the side effect of pushing logs to an external service. Maybe I'm just not used to OOP. 😄 I don't feel super strongly or anything, but this might be less surprising if the push to Loki took place elsewhere like in I'm guessing part of the reason you're doing it this way though is so that logs get sent to Loki as borgmatic runs rather than all at the end once Anyway, let me know your thoughts. (And then maybe put some of them into docstrings. 😄) This code looks good, but it's a little counter-intuitive to me that it would be here. For instance, I think of a buffer as a data structure for storing stuff, not necessarily as a data structure that also implicitly has the side effect of pushing logs to an external service. Maybe I'm just not used to OOP. 😄 I don't feel super strongly or anything, but this might be less surprising if the push to Loki took place elsewhere like in `ping_monitor()`.
I'm guessing part of the reason you're doing it this way though is so that logs get sent to Loki as borgmatic runs rather than all at the end once `ping_monitor()` is called..? The Healthchecks hook for example only sends logs at the end, but the rationale there is that it's explicitly logging the success/failure status of the backup rather than only logs along the way. So the requirements may be a little different.
Anyway, let me know your thoughts. (And then maybe put some of them into docstrings. 😄)
IBims1NicerTobi
commented
Well I am trying to not hit the max request size limits of a lot of loki instances that run behind e.g. nginx. I think it is much better for large volumes of logs to be pushed incrementally instead of pushing it as one who knows how big request in the end. Well I am trying to not hit the max request size limits of a lot of loki instances that run behind e.g. nginx. I think it is much better for large volumes of logs to be pushed incrementally instead of pushing it as one who knows how big request in the end.
witten
commented
Gotcha. The Healthchecks hook "solves" that particular problem by reverse truncating the logs so that older messages are not sent if the logs get too big by the time Gotcha. The Healthchecks hook "solves" that particular problem by reverse truncating the logs so that older messages are not sent if the logs get too big by the time `ping_monitor()` is called. However I can see why you might not want to do that in this case, since loki seems much more about log aggregation than simply tracking service status.
|
||||
'''
|
||||
A log handler that sends logs to loki
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
I think this should be Also.. Nice-to-have but not required: Putting the I think this should be `.warning()` rather than `.warn()`. See [the logging docs](https://docs.python.org/3/howto/logging.html#when-to-use-logging) for more info.
Also.. Nice-to-have but not required: Putting the `config_filename` at the start of this warning message. See `healthchecks.py` error handling for an example.
|
||||
'''
|
||||
def __init__(self, url, dry_run):
|
||||
super().__init__()
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Spelling and capitalization: Spelling and capitalization: `Loki_log_handler`
|
||||
self.buffer = loki_log_buffer(url, dry_run)
|
||||
|
||||
def emit(self, record):
|
||||
self.raw(record.getMessage())
|
||||
|
||||
def add_label(self, key, value):
|
||||
self.buffer.add_label(key, value)
|
||||
|
||||
def raw(self, msg):
|
||||
self.buffer.add_value(msg)
|
||||
if len(self.buffer) > MAX_BUFFER_LINES:
|
||||
self.buffer.flush()
|
||||
|
||||
def flush(self):
|
||||
if len(self.buffer) > 0:
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Some brief docstrings would be good on these functions too. For instance, I'm not sure at a glance why this function is called Some brief docstrings would be good on these functions too. For instance, I'm not sure at a glance why this function is called `.raw()`.. I'm sure there's a good reason though.
|
||||
self.buffer.flush()
|
||||
|
||||
def initialize_monitor(hook_config, config, config_filename, monitoring_log_level, dry_run):
|
||||
'''
|
||||
Add a handler to the root logger to regularly send the logs to loki
|
||||
'''
|
||||
url = hook_config.get('url')
|
||||
loki = loki_log_handeler(url, dry_run)
|
||||
for k, v in hook_config.get('labels').items():
|
||||
if v == '__hostname':
|
||||
loki.add_label(k, platform.node())
|
||||
elif v == '__config':
|
||||
IBims1NicerTobi marked this conversation as resolved
witten
commented
Just a code style convention nit: Period at the end of sentences in docstrings. Just a code style convention nit: Period at the end of sentences in docstrings.
|
||||
loki.add_label(k, config_filename.split('/')[-1])
|
||||
elif v == '__config_path':
|
||||
loki.add_label(k, config_filename)
|
||||
else:
|
||||
loki.add_label(k, v)
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Will users expect to substitute these placeholders within longer strings? Example value: Will users expect to substitute these placeholders within longer strings? Example value: `The name of this host is __hostname`
IBims1NicerTobi
commented
Not really. Grafana loki labels are rarely longer than a single word and the placeholders can be compared to what grafana themself have implemented in promtail which is their loki log agent. The label format is taken from https://grafana.com/docs/loki/latest/clients/promtail/configuration/#syslog. It works the same for almost all of promtail and I think I would rather adhere to their ecosystem of how they label streams than this one but thats up for debate. Not really. Grafana loki labels are rarely longer than a single word and the placeholders can be compared to what grafana themself have implemented in promtail which is their loki log agent. The label format is taken from https://grafana.com/docs/loki/latest/clients/promtail/configuration/#syslog. It works the same for almost all of promtail and I think I would rather adhere to their ecosystem of how they label streams than this one but thats up for debate.
witten
commented
I'm fine leaving it as-is if there's already convention on the loki side. I'm fine leaving it as-is if there's already convention on the loki side.
|
||||
logging.getLogger().addHandler(loki)
|
||||
|
||||
def ping_monitor(hook_config, config, config_filename, state, monitoring_log_level, dry_run):
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Check out Check out `os.path.basename()` for this!
IBims1NicerTobi
commented
I wanted to ask about this anyway: Is the hostname of the system used anywhere else so we can have consistent naming? Not that it matters much for this as users can change the hostname label anyways but it would still be good to be consistent I wanted to ask about this anyway: Is the hostname of the system used anywhere else so we can have consistent naming? Not that it matters much for this as users can change the hostname label anyways but it would still be good to be consistent
witten
commented
In borgmatic code, I think the PagerDuty hook uses hostname, but that's about it. Well, I guess hostname can be part of the In borgmatic code, I think the PagerDuty hook uses hostname, but that's about it. Well, I guess hostname can be part of the `archive_name_format` configured in borgmatic but then it's passed to Borg.
|
||||
'''
|
||||
Adds an entry to the loki logger with the current state
|
||||
'''
|
||||
if not dry_run:
|
||||
for handler in tuple(logging.getLogger().handlers):
|
||||
if isinstance(handler, loki_log_handeler):
|
||||
if state in MONITOR_STATE_TO_HEALTHCHECKS.keys():
|
||||
handler.raw(f'{config_filename} {MONITOR_STATE_TO_HEALTHCHECKS[state]} backup')
|
||||
|
||||
def destroy_monitor(hook_config, config, config_filename, monitoring_log_level, dry_run):
|
||||
'''
|
||||
Remove the monitor handler that was added to the root logger.
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
You could instead do:
... just to cut down on the deep indentation a bit. That would also give you an opportunity to log about the fact that it's a dry run and you're bailing. See You could instead do:
```
if dry_run:
return
```
... just to cut down on the deep indentation a bit. That would also give you an opportunity to log about the fact that it's a dry run and you're bailing. See `healthchecks.py` or one of the other monitoring hooks for an example.
IBims1NicerTobi
commented
I don't think adding a entry that we are bailing is useful here as I added the entry for every time we flush the buffer now (which is at least once per program). I don't think adding a entry that we are bailing is useful here as I added the entry for every time we flush the buffer now (which is at least once per program).
witten
commented
Sounds good. Sounds good.
|
||||
'''
|
||||
logger = logging.getLogger()
|
||||
for handler in tuple(logger.handlers):
|
||||
if isinstance(handler, loki_log_handeler):
|
||||
witten marked this conversation as resolved
Outdated
witten
commented
Maybe put a Is there a reason though you're picking out the loki log handler to log to instead of just logging this as a general log message? E.g., you could just throw the word "loki" somewhere in it and use Maybe put a `:` after `{config_filename}` as per convention.
Is there a reason though you're picking out the loki log handler to log to instead of just logging this as a general log message? E.g., you could just throw the word "loki" somewhere in it and use `logger.info()` or `.debug()` or whatever. Again, check out the other monitoring hooks for examples.
IBims1NicerTobi
commented
Yes there is a reason for doing it this way. First of all as far as I am aware the function is just supposed to inform the monitoring app of the current state. If we used Yes there is a reason for doing it this way. First of all as far as I am aware the function is just supposed to inform the monitoring app of the current state. If we used `logger.info()` the message will depend on the current log level and settings etc. Skipping the logging system makes sure that: 1. The message is only send to loki and not any other agents (which should generate their own message) 2. The message is always send regardless of how the application logger is configured.
witten
commented
Okay, gotcha. Okay, gotcha.
|
||||
handler.flush()
|
||||
logger.removeHandler(handler)
|
|
@ -1,6 +1,6 @@
|
|||
from enum import Enum
|
||||
|
||||
MONITOR_HOOK_NAMES = ('healthchecks', 'cronitor', 'cronhub', 'pagerduty', 'ntfy')
|
||||
MONITOR_HOOK_NAMES = ('healthchecks', 'cronitor', 'cronhub', 'pagerduty', 'ntfy', 'loki')
|
||||
|
||||
|
||||
class State(Enum):
|
||||
|
|
Cool! I didn't know you could do this.