Add support for healthchecks "log" feature #628 #645

Merged
witten merged 8 commits from :feat628-add-healtchecks-logs-support into master 2023-03-07 22:21:31 +00:00
Contributor

Added support for the healthchecks "log" feature. At this logging level, every detail is sent regardless of whether an error occurred.

Signed-off-by: Soumik Dutta shalearkane@gmail.com

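For context, Healthchecks' ping API signals state via a URL suffix: /start, /fail and /log, with a plain ping marking success. A minimal sketch of that mapping (illustrative names only, not the actual hook code):

    import enum

    import requests

    class State(enum.Enum):
        START = 1
        FINISH = 2
        FAIL = 3
        LOG = 4

    # Healthchecks ping URL suffix per state; a plain ping (no suffix) signals success.
    STATE_TO_SUFFIX = {State.START: 'start', State.FINISH: None, State.FAIL: 'fail', State.LOG: 'log'}

    def ping(base_ping_url, state, payload=''):
        suffix = STATE_TO_SUFFIX[state]
        ping_url = f'{base_ping_url}/{suffix}' if suffix else base_ping_url
        # For the log state, Healthchecks records the payload without changing the check's up/down status.
        requests.post(ping_url, data=payload.encode('utf-8'))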
Soumik_Dutta added 1 commit 2023-03-05 14:00:01 +00:00
69f6695253 Add support for healthchecks "log" feature #628
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Author
Contributor

I looked into the other monitoring services supported by borgmatic, and here is a summary:

  • ntfy : every monitoring state is handled the same way. However, in the yaml configuration there are only 3 defined states: start, finish and fail. So if required, we might need to change schema.yaml for ntfy.
  • pagerduty : it only handles the fail state differently. So no changes required IMO.
  • cronhub : no changes required IMO. I added a log entry to MONITOR_STATE_TO_CRONHUB just in case it is used.
  • cronitor : only supports 'run', 'complete', 'fail' and 'ok' (source: https://cronitor.io/docs/telemetry-api). However, we only use 'run', 'complete' and 'fail'. So I have mapped 'log' to 'ok' for now.
Soumik_Dutta added 1 commit 2023-03-05 16:27:22 +00:00
1573d68fe2 update schema.yaml description
also add monitor.State.LOG to cronitor.

Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Author
Contributor

As for tests, I could not find any existing tests for specific logging states, so I am not writing new tests for this change.

witten requested changes 2023-03-05 19:06:36 +00:00
witten left a comment
Owner

This generally looks good! I'm glad that it's relatively straightforward to implement.

Here are the areas where I think you could use a couple of basic tests:

  • In tests/unit/hooks/test_healthchecks.py, there's an existing test_ping_monitor_hits_ping_url_for_fail_state() test function, so you could make something very similar for the log state. Or you could even parameterize that existing test function if you wanted to. (See other usages of @pytest.mark.parametrize; there's a sketch of this approach right after this list.)
  • In tests/unit/commands/test_borgmatic.py, you could add test coverage for your changes to run_configuration(). For instance, there are existing test_run_configuration_logs_monitor_finish_error() and test_run_configuration_bails_for_monitor_finish_soft_failure() tests. You could probably do something similar for the log state.
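For instance, a parameterized version might look roughly like this (a sketch only; the existing test's exact mock setup is assumed here, not copied):

    import pytest
    from flexmock import flexmock

    from borgmatic.hooks import healthchecks as module  # mirrors the test files' import convention


    @pytest.mark.parametrize('state', (module.monitor.State.FAIL, module.monitor.State.LOG))
    def test_ping_monitor_hits_ping_url_for_state(state):
        hook_config = {'ping_url': 'https://example.com'}
        # Assumed: both the fail and log states ultimately issue a single POST to the ping URL.
        flexmock(module.requests).should_receive('post').once().and_return(flexmock(ok=True))

        module.ping_monitor(
            hook_config, 'config.yaml', state=state, monitoring_log_level=1, dry_run=False
        )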
@@ -10,6 +10,7 @@ MONITOR_STATE_TO_CRONHUB = {
     monitor.State.START: 'start',
     monitor.State.FINISH: 'finish',
     monitor.State.FAIL: 'fail',
+    monitor.State.LOG: 'log',
Owner

I might be missing something, but without any other changes to the Cronhub hook's ping_monitor() function, won't it try to ping Cronhub with a /log/[uuid] URL, which doesn't exist (https://docs.cronhub.io/how-to-ping.html#ping-api)?

Author
Contributor

Oops, I overlooked that issue. I will remove that from the dict.

Soumik_Dutta marked this conversation as resolved
@@ -10,6 +10,7 @@ MONITOR_STATE_TO_CRONITOR = {
     monitor.State.START: 'run',
     monitor.State.FINISH: 'complete',
     monitor.State.FAIL: 'fail',
+    monitor.State.LOG: 'ok',
Owner

I'm not sure this one makes sense for Cronitor. If I'm reading the docs (https://cronitor.io/docs/telemetry-api) correctly, ok is meant to "manually reset the monitor to a passing state". But the borgmatic "log" state is supposed to be independent of passing or failing, right?

I do agree though that the PagerDuty hook already handles this. With these other monitoring hook integrations, my recommendation would be to just return early from ping_monitor() if an unsupported log state is requested, similar to how the PagerDuty hook already does. That might mean taking the LOG state back out of the MONITOR_STATE_TO_... dict for hooks that don't support it.

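Concretely, the guard might look like this (a sketch against the Cronhub hook, assuming the module's existing logger and MONITOR_STATE_TO_CRONHUB; the diff further down ends up with this same shape):

    def ping_monitor(hook_config, config_filename, state, monitoring_log_level, dry_run):
        '''
        Ping the configured Cronhub URL, ignoring monitor states it doesn't support.
        '''
        if state not in MONITOR_STATE_TO_CRONHUB:
            logger.debug(
                '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format(
                    config_filename, state.name.lower()
                )
            )
            return
        # ... normal ping logic for supported states continues here ...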
Author
Contributor

Yes, I was unsure what "manually reset the monitor to a passing state" meant. Okay, I'll remove the log state from the dict.

Soumik_Dutta marked this conversation as resolved
@@ -118,3 +119,3 @@
     logger.debug('{}: Using Healthchecks ping URL {}'.format(config_filename, ping_url))
-    if state in (monitor.State.FINISH, monitor.State.FAIL):
+    if state in (monitor.State.FINISH, monitor.State.FAIL, monitor.State.LOG):
Owner

This looks good! I'm glad to see the same logic for the finish/fail states work for the log state.

Soumik_Dutta marked this conversation as resolved
@@ -11,3 +11,4 @@
     monitor.State.FINISH: None,
     monitor.State.FAIL: None,
+    monitor.State.LOG: None,
 }
Owner

I would actually recommend removing MONITOR_STATE_TO_NTFY altogether. It looks like it's completely unused!

But I agree that the Ntfy ping_monitor() should already handle ignoring the log state.

Soumik_Dutta marked this conversation as resolved
Soumik_Dutta added 1 commit 2023-03-05 22:08:17 +00:00
45256ae33f add test for healthchecks
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Author
Contributor

I am having trouble making test_run_configuration_logs_monitor_start_error() in test_borgmatic.py pass. It is giving me this error:

    FAILED tests/unit/commands/test_borgmatic.py::test_run_configuration_logs_monitor_start_error - assert [<flexmock.Mo...7f86663555d0>] == [<flexmock.Mo...7f86663555d0>]

If I change

    try:
        if using_primary_action:
            # send logs irrespective of error

to

    try:
        if using_primary_action and not encountered_error:
            # send logs irrespective of error

the test runs successfully.

Owner

I'm not sure what's going on there, but you can try tox -- -vv to get more verbose output about what exactly is in [<flexmock.Mo...7f86663555d0>] == [<flexmock.Mo...7f86663555d0>]. For instance, I can't tell from that truncated view whether one of the lists has multiple elements. I can just tell that the last element is the same in each list.

My guess though is that log_error_records() is getting called twice given your updates to run_configuration()—whereas when that test was originally written it was called once. So this line ...

    flexmock(module).should_receive('log_error_records').and_return(expected_results)

... means that the mocked log_error_records() function returns expected_results each time it's called. Which might mean you're getting double the results you assert. To fix that, you could do something like:

    expected_results = [flexmock(), flexmock()]
    flexmock(module).should_receive('log_error_records').and_return(expected_results[0:1]).and_return(expected_results[1:2])
    ...

    assert results == expected_results

That would make it so each of the two times log_error_records() is called, it returns a different portion of the expected results.

Owner

> Is there a way to pretty-print flexmock() objects?

Not an official way I know of. Maybe something like this?

import pprint

pprint.pprint(flexmock(foo=7, bar=5).__dict__)
Soumik_Dutta added 2 commits 2023-03-05 23:52:03 +00:00
e211863cba update test_borgmatic.py
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
f442aeae9c fix logs_monitor_start_error()
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Author
Contributor

Thanks, now all of the tests are passing. I have modified test_run_configuration_logs_monitor_finish_error() and test_run_configuration_bails_for_monitor_finish_soft_failure() too, since call_hooks will now be called three times (twice for start and once for log) before the call_hooks for finish. The same change works for test_run_configuration_logs_monitor_start_error(), since the first call_hooks raises an OSError and three more call_hooks follow (one for log and two for fail).

Also, as for adding checks so that cronitor and cronhub return early on the log monitor state: after adding checks that reject anything not defined in their MONITOR_STATE... dicts, some tests failed, so I have not implemented them. Or should I implement a check just for that specific log monitor state? I.e. instead of if state not in MONITOR_STATE.values(): return, use if state == monitor.State.LOG: return?

Owner

> Thanks, now all of the tests are passing. I have modified test_run_configuration_logs_monitor_finish_error() and test_run_configuration_bails_for_monitor_finish_soft_failure() too, since call_hooks will now be called three times (twice for start and once for log) before the call_hooks for finish. The same change works for test_run_configuration_logs_monitor_start_error(), since the first call_hooks raises an OSError and three more call_hooks follow (one for log and two for fail).

Sounds good!

> Also, as for adding checks so that cronitor and cronhub return early on the log monitor state: after adding checks that reject anything not defined in their MONITOR_STATE... dicts, some tests failed, so I have not implemented them. Or should I implement a check just for that specific log monitor state? I.e. instead of if state not in MONITOR_STATE.values(): return, use if state == monitor.State.LOG: return?

if state not in MONITOR_STATE.values() sounds like it'd be more correct than just checking for the log state in the ping_monitor() function. What are the tests that failed? Maybe they could be updated to account for this new behavior?

I realize this might be a little bit of scope creep, but it sounds like it's potentially fixing "broken" tests.

Author
Contributor

For Cronitor, these tests are failing:

    FAILED tests/unit/hooks/test_cronitor.py::test_ping_monitor_with_connection_error_logs_warning - flexmock.MethodCallError: warning() expected to be called exactly 1 times, called 0 times
    FAILED tests/unit/hooks/test_cronitor.py::test_ping_monitor_with_other_error_logs_warning - flexmock.MethodCallError: warning() expected to be called exactly 1 times, called 0 times

For Cronhub, these tests are failing:

    FAILED tests/unit/hooks/test_cronhub.py::test_ping_monitor_with_connection_error_logs_warning - flexmock.MethodCallError: warning() expected to be called exactly 1 times, called 0 times
    FAILED tests/unit/hooks/test_cronhub.py::test_ping_monitor_with_other_error_logs_warning - flexmock.MethodCallError: warning() expected to be called exactly 1 times, called 0 times
Soumik_Dutta added 1 commit 2023-03-06 14:29:09 +00:00
4fcfddbe08 return early if unsupported state is passed
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Soumik_Dutta added 1 commit 2023-03-06 15:01:09 +00:00
98e429594e added tests to make sure unsupported log states are detected
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Author
Contributor

Sorry, that was an issue on my part 😅. The tests work fine. I was checking whether the state existed in the values instead of the keys of the MONITOR_STATES_TO... dicts, since in the config we refer to the states by string instead of by the enum value. I have updated the code with the check in place. I have also added tests that cover the new behaviour.

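To spell out the keys-versus-values mix-up (a self-contained sketch with illustrative enum values):

    import enum

    class State(enum.Enum):
        START = 1
        FINISH = 2
        FAIL = 3
        LOG = 4

    MONITOR_STATE_TO_CRONHUB = {State.START: 'start', State.FINISH: 'finish', State.FAIL: 'fail'}

    # Wrong: State members never appear among the string values, so every state
    # (including supported ones) bails out early. That's why the connection-error
    # tests above saw warning() called 0 times.
    assert State.FAIL not in MONITOR_STATE_TO_CRONHUB.values()

    # Right: `in` on a dict tests its keys, so only genuinely unsupported states return early.
    assert State.FAIL in MONITOR_STATE_TO_CRONHUB
    assert State.LOG not in MONITOR_STATE_TO_CRONHUB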
Soumik_Dutta requested review from witten 2023-03-06 15:21:54 +00:00
witten requested changes 2023-03-07 06:35:37 +00:00
witten left a comment
Owner

This is really looking good. A couple more super minor edits and I think we're done! Feel free to remove the WIP flag whenever you're ready.

@@ -29,1 +29,4 @@
     '''
+    if state not in MONITOR_STATE_TO_CRONHUB:
+        logger.debug(
+            '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format(
Owner

Heads up that I've been slowly starting to use f-strings (https://realpython.com/python-f-strings/) instead of .format() in new code (or converting to f-strings when I work on old code). It's totally fine if you don't do it here though.

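For reference, the debug call from the hunk above rewritten as an f-string:

    logger.debug(
        f'{config_filename}: Ignoring unsupported monitoring {state.name.lower()} in Cronhub hook'
    )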
Soumik_Dutta marked this conversation as resolved
@@ -30,0 +33,4 @@
+                config_filename, state.name.lower()
+            )
+        )
+        return
Owner

This looks great!

Soumik_Dutta marked this conversation as resolved
@@ -137,0 +139,4 @@
+    flexmock(module.borg_version).should_receive('local_borg_version').and_return(flexmock())
+    flexmock(module.dispatch).should_receive('call_hooks').and_return(None).and_return(
+        None
+    ).and_return(None).and_raise(OSError)
Owner

I think it's fine for purposes of this PR that these tests are relying on implicit ordering of the call_hooks() calls, but it's perhaps on the edge of getting unwieldy/brittle. Maybe the next time this code is touched, it'd be good to mock out via ...should_receive('call_hooks').with_args(...) explicitly, so it's more specifically mocking the desired hook calls. Not your concern now though. 😃

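For illustration, here is how argument-pinned expectations look with plain flexmock (a generic sketch; this does not reproduce call_hooks()' real signature in borgmatic):

    from flexmock import flexmock

    class Dispatcher:
        def call_hooks(self, function_name, state):
            raise NotImplementedError

    dispatcher = Dispatcher()

    # Each expectation is pinned to specific arguments, so the mocks no longer
    # depend on the order in which calls happen to be made.
    flexmock(dispatcher).should_receive('call_hooks').with_args('ping_monitor', 'start').and_return(None)
    flexmock(dispatcher).should_receive('call_hooks').with_args('ping_monitor', 'log').and_raise(OSError)

    dispatcher.call_hooks('ping_monitor', 'start')  # returns None
    try:
        dispatcher.call_hooks('ping_monitor', 'log')  # raises OSError
    except OSError:
        pass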
Soumik_Dutta marked this conversation as resolved
@@ -105,0 +106,4 @@
+def test_ping_monitor_with_unsupported_monitoring_state():
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(
Owner

Code style nit: This code base generally uses single quotes for strings.

Collaborator

Is there not a flake8/black rule for that yet?

Owner

I don't believe so. Maybe something like https://pypi.org/project/flake8-quotes/ would do that though?

Author
Contributor

Yes, I have "format-on-save" turned on in VS Code with the default Python formatter, but it didn't pick up the rule. Though I would be happy to set up pre-commit rules to run black/flake8 (with custom rules) and lints if you think that's worthwhile 😄

Owner

I just pushed flake8-quotes support to master, so hopefully that will be sufficient. 😄

Soumik_Dutta marked this conversation as resolved
@@ -105,0 +108,4 @@
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(
+        '{}: Ignoring unsupported monitoring {} in Cronhub hook'.format("config.yaml", "log")
+    )
Owner

I'm generally not a huge fan of mocking/asserting that the log message in the code is actually that same message in the test, as it tends to be a less useful thing to test. So what I'd recommend doing instead is dropping the whole .with_args() here and maybe adding a .never() expectation on a particular function call so we're assured the test function actually exited early (which is what we really care about, presumably). Example:

flexmock(module.requests).should_receive('get').never()
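Putting that together, the Cronhub test might look something like this (a sketch; `module` refers to the hook module under test, per the test files' convention, and flexmock verifies the .never() expectation at teardown):

    def test_ping_monitor_with_unsupported_monitoring_state():
        hook_config = {'ping_url': 'https://example.com'}
        # The real assertion: no HTTP request is ever made for the unsupported log state.
        flexmock(module.requests).should_receive('get').never()

        module.ping_monitor(
            hook_config,
            'config.yaml',
            state=module.monitor.State.LOG,
            monitoring_log_level=1,
            dry_run=False,
        )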
Soumik_Dutta marked this conversation as resolved
@@ -90,0 +93,4 @@
+    hook_config = {'ping_url': 'https://example.com'}
+    flexmock(module.logger).should_receive("debug").once().with_args(
+        '{}: Ignoring unsupported monitoring {} in Cronitor hook'.format("config.yaml", "log")
+    )
Owner

Similar feedback on this test.

Soumik_Dutta marked this conversation as resolved
@@ -187,0 +198,4 @@
+        state=module.monitor.State.LOG,
+        monitoring_log_level=1,
+        dry_run=False,
+    )
Owner

Looks good!

Soumik_Dutta marked this conversation as resolved
Soumik_Dutta added 1 commit 2023-03-07 22:00:19 +00:00
044ae7869a fix tests
Signed-off-by: Soumik Dutta <shalearkane@gmail.com>
Soumik_Dutta changed title from WIP: Add support for healthchecks "log" feature #628 to Add support for healthchecks "log" feature #628 2023-03-07 22:01:41 +00:00
Owner

Awesome! Thanks so much for doing all the back-and-forth. This will be part of the next release!

witten merged commit 332f7c4bb6 into master 2023-03-07 22:21:31 +00:00