Add support for healthchecks "log" feature #628 #645
Reference: borgmatic-collective/borgmatic#645
Added support for healthchecks "log" feature. At this logging level, every detail is sent irrespective of error.
Signed-off-by: Soumik Dutta shalearkane@gmail.com
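For context on the feature itself: Healthchecks distinguishes states by the path appended to the ping URL — a plain ping to the base URL signals success, while `/start`, `/fail`, and `/log` signal the other states. A minimal sketch of that URL scheme (an illustration with a hypothetical `base_url`, not borgmatic's actual hook code):

```python
def healthchecks_ping_url(base_url, state):
    '''Build the Healthchecks ping URL for a given borgmatic-style state.

    A plain ping to the base URL signals success ("finish"); other states
    are signaled by appending /start, /fail, or /log to it.
    '''
    return base_url if state == 'finish' else f'{base_url}/{state}'

# Hypothetical ping URL, for illustration only.
print(healthchecks_ping_url('https://hc-ping.com/example-uuid', 'log'))
# → https://hc-ping.com/example-uuid/log
```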
I looked into the other logging services supported by borgmatic and here is the summary: they support the `start`, `finish`, and `fail` states. So if required, we might need to change `schema.yaml` for ntfy. I have added the `log` level to `MONITOR_STATE_TO_CRONHUB` just in case it is used.

As for the tests, I could not find any existing test for specific logging levels, so I am not writing new tests for this change.
This generally looks good! I'm glad that it's relatively straightforward to implement.
Here are the areas where I think you could use a couple of basic tests:
- In `tests/unit/hooks/test_healthchecks.py`, there's an existing `test_ping_monitor_hits_ping_url_for_fail_state()` test function, so you could make something very similar for the log state. Or you could even parameterize that existing test function if you wanted to. (See other usages of `@pytest.mark.parametrize`.)
- In `tests/unit/commands/test_borgmatic.py`, you could add test coverage for your changes to `run_configuration()`. For instance, there are existing `test_run_configuration_logs_monitor_finish_error()` and `test_run_configuration_bails_for_monitor_finish_soft_failure()` tests. You could probably do something similar for the log state.

@@ -10,6 +10,7 @@ MONITOR_STATE_TO_CRONHUB = {
     monitor.State.START: 'start',
     monitor.State.FINISH: 'finish',
     monitor.State.FAIL: 'fail',
+    monitor.State.LOG: 'log',
I might be missing something, but without any other changes to the Cronhub hook's `ping_monitor()` function, won't it try to ping Cronhub with a `/log/[uuid]` URL, which doesn't exist?

Oops, I overlooked that issue. I will remove that from the dict.
@@ -10,6 +10,7 @@ MONITOR_STATE_TO_CRONITOR = {
     monitor.State.START: 'run',
     monitor.State.FINISH: 'complete',
     monitor.State.FAIL: 'fail',
+    monitor.State.LOG: 'ok',
I'm not sure this one makes sense for Cronitor. If I'm reading the docs correctly, `ok` is meant to "manually reset the monitor to a passing state". But the borgmatic "log" state is supposed to be independent of passing or failing, right?

I do agree though that the PagerDuty hook already handles this. With these other monitoring hook integrations, my recommendation would be to just return early from `ping_monitor()` if an unsupported log state is requested, similar to how the PagerDuty hook already does. That might mean taking the `LOG` state back out of the `MONITOR_STATE_TO_...` dict for hooks that don't support it.

Yes, I was unsure about what it meant by "manually reset the monitor to a passing state" ... okay yeah, I'll remove the log state from the dict.
@@ -118,3 +119,3 @@
     logger.debug('{}: Using Healthchecks ping URL {}'.format(config_filename, ping_url))
-    if state in (monitor.State.FINISH, monitor.State.FAIL):
+    if state in (monitor.State.FINISH, monitor.State.FAIL, monitor.State.LOG):
This looks good! I'm glad to see the same logic for the finish/fail states work for the log state.
@@ -11,3 +11,4 @@
     monitor.State.FINISH: None,
     monitor.State.FAIL: None,
+    monitor.State.LOG: None,
 }
I would actually recommend removing `MONITOR_STATE_TO_NTFY` altogether. It looks like it's completely unused! But I agree that the Ntfy `ping_monitor()` should already handle ignoring the `log` state.

I am having trouble making `test_run_configuration_logs_monitor_start_error()` in `test_borgmatic` pass. It is giving me this error:

If I change

to

the test runs successfully.
I'm not sure what's going on there, but you can try `tox -- -vv` to get more verbose output about what exactly is in `[<flexmock.Mo...7f86663555d0>] == [<flexmock.Mo...7f86663555d0>]`. For instance, I can't tell from that truncated view whether one of the lists has multiple elements. I can just tell that the last element is the same in each list.

My guess though is that `log_error_records()` is getting called twice given your updates to `run_configuration()`, whereas when that test was originally written it was called once. So this line ...

... means that the mocked `log_error_records()` function returns `expected_results` each time it's called. Which might mean you're getting double the results you assert. To fix that, you could do something like:

That would make it so each of the two times `log_error_records()` is called, it returns a different portion of the expected results.

Not an official way I know of. Maybe something like this?
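The fix being suggested — making each successive call to the mocked `log_error_records()` return a different portion of the expected results — can be sketched with the standard library's `unittest.mock` (the PR itself uses flexmock, where chaining `.and_return(a).and_return(b)` achieves the same thing; all names below are hypothetical stand-ins):

```python
from unittest.mock import patch

class module:
    '''Hypothetical stand-in for the real borgmatic module under test.'''

    @staticmethod
    def log_error_records(*args, **kwargs):
        return ['real record']

# Pretend the test expects two error records, one produced per call.
expected_results = [['first record'], ['second record']]

# A list side_effect makes call N return expected_results[N], mirroring
# flexmock's chained .and_return(first_portion).and_return(second_portion).
with patch.object(module, 'log_error_records', side_effect=expected_results):
    first = module.log_error_records()
    second = module.log_error_records()

assert first + second == ['first record', 'second record']
```

This way, asserting on the combined results no longer double-counts anything.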
Thanks, now all of the tests are passing. I have modified `test_run_configuration_logs_monitor_finish_error()` and `test_run_configuration_bails_for_monitor_finish_soft_failure()` too, since we will be calling `call_hooks` thrice (two for start and one for log) before calling the `call_hooks` for finish. Making the same change for `test_run_configuration_logs_monitor_start_error` also works, since at first there will be an `OSError` at the `call_hooks`, but there will be three more `call_hooks` calls (one for log and two for fail).

Also, as for adding checks that `cronitor` and `cronhub` return early on the `log` monitor state: after adding checks that anything not defined in their `MONITOR_STATE...` dict should not be allowed, some tests failed. So I am not implementing them. Or should I implement a check just for that specific `log` monitor state? I.e., instead of `if state not in MONITOR_STATE.values(): return`, I use `if state == monitor.state.log: return`?

Sounds good! `if state not in MONITOR_STATE.values()` sounds like it'd be more correct than just checking for the log state in the `ping_monitor()` function. What are the tests that failed? Maybe they could be updated to account for this new behavior? I realize this might be a little bit of scope creep, but it sounds like it's potentially fixing "broken" tests.
For Cronitor, these tests are failing:
For Cronhub, these tests are failing:
Sorry, that was an issue on my part 😅. The tests work fine. I was checking whether the state existed in the values instead of the keys of the `MONITOR_STATES_TO...` dict, since in the config we refer to the states by string instead of an integer. I have updated the code with the check in place. I have also added tests that cover the new behaviour.

This is really looking good. A couple more super minor edits and I think we're done! Feel free to remove the WIP flag whenever you're ready.
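The keys-versus-values mix-up described here is easy to reproduce: in the hook dicts, borgmatic states are the keys and the service's string names are the values, so membership has to be tested against the dict itself (its keys), not its values. A minimal sketch with a hypothetical enum modeled on `monitor.State`:

```python
from enum import Enum

class State(Enum):
    START = 1
    FINISH = 2
    FAIL = 3
    LOG = 4

# Modeled on MONITOR_STATE_TO_CRONHUB: borgmatic states map to Cronhub names.
MONITOR_STATE_TO_CRONHUB = {
    State.START: 'start',
    State.FINISH: 'finish',
    State.FAIL: 'fail',
}

# Checking the values compares an enum member against strings, so every
# state looks "unsupported" -- the kind of mistake that fails the tests:
assert State.FAIL not in MONITOR_STATE_TO_CRONHUB.values()

# Checking the keys gives the intended early-return condition:
assert State.FAIL in MONITOR_STATE_TO_CRONHUB
assert State.LOG not in MONITOR_STATE_TO_CRONHUB
```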
@@ -29,1 +29,4 @@
     '''
+    if state not in MONITOR_STATE_TO_CRONHUB:
+        logger.debug(
+            '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format(

Heads up that I've been slowly starting to use f-strings instead of `.format()` in new code (or converting to f-strings when I work on old code). It's totally fine if you don't do it here though.

@@ -30,0 +33,4 @@
+                config_filename, state.name.lower()
+            )
+        )
+        return
This looks great!
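The f-string conversion the reviewer mentions would look like this (a sketch with hypothetical values, not the code actually merged in the PR):

```python
config_filename = 'config.yaml'  # hypothetical values for illustration
state_name = 'log'

formatted = '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format(
    config_filename, state_name
)
f_string = f'{config_filename}: Ignoring unsupported monitoring {state_name} in Cronhub hook'

assert formatted == f_string  # both render the same message
```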
@@ -137,0 +139,4 @@
+    flexmock(module.borg_version).should_receive('local_borg_version').and_return(flexmock())
+    flexmock(module.dispatch).should_receive('call_hooks').and_return(None).and_return(
+        None
+    ).and_return(None).and_raise(OSError)

I think it's fine for purposes of this PR that these tests are relying on implicit ordering of the `call_hooks()` calls, but it's perhaps on the edge of getting unwieldy/brittle. Maybe the next time this code is touched, it'd be good to mock out via `...should_receive('call_hooks').with_args(...)` explicitly, so it's more specifically mocking the desired hook calls. Not your concern now though. 😃

@@ -105,0 +106,4 @@
+def test_ping_monitor_with_unsupported_monitoring_state():
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(

Code style nit: This code base generally uses single quotes for strings.

Is there not a `flake8`/`black` rule for that yet?

I don't believe so. Maybe something like https://pypi.org/project/flake8-quotes/ would do that though?

Yes, I have "format-on-save" turned on in VSCode with the default Python formatter, but it didn't pick up the rule. Though I would be happy to set up pre-commit rules to run black/flake8 (with custom rules) and lints if you think that's worthwhile 😄

I just pushed `flake8-quotes` support to master, so hopefully that will be sufficient. 😄

@@ -105,0 +108,4 @@
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(
+        '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format("config.yaml", "log")
+    )

I'm generally not a huge fan of mocking/asserting that the log message in the code is actually that same message in the test, as it tends to be a less useful thing to test. So what I'd recommend instead is dropping the whole `.with_args()` here and maybe adding a `.never()` expectation on a particular function call, so we're assured the test function actually exited early (which is what we really care about, presumably). Example:

@@ -90,0 +93,4 @@
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(
+        '{}: Ignoring unsupported monitoring {} in Cronitor hook'.format("config.yaml", "log")
+    )
Similar feedback on this test.
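The reviewer's concrete example wasn't preserved in this thread, but the idea — assert the early return by expecting that a downstream call never happens, rather than matching the exact log message — can be sketched with the standard library's `unittest.mock` (flexmock's `.never()` plays the same role as `assert_not_called()` here; all names are hypothetical stand-ins, not borgmatic's real code):

```python
from unittest.mock import Mock

send_ping = Mock()  # stand-in for the real HTTP ping call

def ping_monitor(hook_config, state, supported_states=('start', 'finish', 'fail')):
    '''Hypothetical hook: return early for unsupported states, ping otherwise.'''
    if state not in supported_states:
        return  # early exit: unsupported state
    send_ping(hook_config['ping_url'])

# Instead of asserting the exact debug message, assert that the early return
# happened: the ping must never be sent for the "log" state ...
ping_monitor({'ping_url': 'https://example.com'}, 'log')
send_ping.assert_not_called()

# ... while a supported state still pings as usual.
ping_monitor({'ping_url': 'https://example.com'}, 'fail')
send_ping.assert_called_once_with('https://example.com')
```

This keeps the test focused on behavior (did we bail out?) rather than on the wording of a log string.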
@@ -187,0 +198,4 @@
+        state=module.monitor.State.LOG,
+        monitoring_log_level=1,
+        dry_run=False,
+    )
Looks good!
Changed title from "WIP: Add support for healthchecks "log" feature #628" to "Add support for healthchecks "log" feature #628".

Awesome! Thanks so much for doing all the back-and-forth. This will be part of the next release!