WIP: Separate endpoint for each primary action for monitor hooks #660

Closed
jetchirag wants to merge 8 commits from jetchirag/borgmatic:action-based-monitor-endpoints into main
2 changed files with 91 additions and 58 deletions
Showing only changes of commit aa6f1e6623 - Show all commits

View File

@ -66,13 +66,10 @@ def run_configuration(config_filename, config, arguments):
error_repository = ''
using_primary_action = {'create', 'prune', 'compact', 'check'}.intersection(arguments)
monitoring_log_level = verbosity_to_log_level(global_arguments.monitoring_verbosity)
action_name = next(iter(arguments))
skip_monitoring = global_arguments.skip_monitoring
# action_names = [action for action in arguments.keys() if action != 'global' and isinstance(action, str)]
# print (action_names)
# return
skip_monitoring = getattr(global_arguments, 'skip_monitoring', False)
jetchirag marked this conversation as resolved Outdated
Outdated
Review

nit(pythonic): next(iter(arguments))

nit(pythonic): `next(iter(arguments))`
Outdated
Review

Also, if we are only using this variable when using_primary_action, it might make sense to rename it to primary_aciton_name and/or move it where it's needed too.

Also, if we are only using this variable when `using_primary_action`, it might make sense to rename it to `primary_aciton_name` and/or move it where it's needed too.

Thanks, that's definitely more effecient.

Thanks, that's definitely more effecient.

If I'm reading this correctly, this will just grab the first action. But there could very well be multiple actions at once, and I'd expect Cronitor to be pinged for each one (if configured for each one).

If I'm reading this correctly, this will just grab the first action. But there could very well be multiple actions at once, and I'd expect Cronitor to be pinged for each one (if configured for each one).

@witten I unexpectedly missed that multiple actions. Will fix it.

@witten I unexpectedly missed that multiple actions. Will fix it.

Why getattr()? Doesn't the attribute always exist now that you've added it to the global arguments? So would just global_arguments.skip_monitoring work? I don't even think you need the False here because action='store_true' above defaults to False if not specified.

Why `getattr()`? Doesn't the attribute always exist now that you've added it to the global arguments? So would just `global_arguments.skip_monitoring` work? I don't even think you need the `False` here because `action='store_true'` above defaults to False if not specified.
action_names = [
action for action in arguments.keys() if action != 'global' and isinstance(action, str)
jetchirag marked this conversation as resolved Outdated

On the isinstance() call: Shouldn't they always be strings? Or is something else sneaking in there?

On the `isinstance()` call: Shouldn't they always be strings? Or is something else sneaking in there?
]
try:
local_borg_version = borg_version.local_borg_version(storage, local_path)
@ -90,17 +87,18 @@ def run_configuration(config_filename, config, arguments):
monitoring_log_level,
global_arguments.dry_run,
)
if using_primary_action and not skip_monitoring:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.START,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
for action_name in action_names:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.START,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
Review

I know we've going back and forth on the subject of looping over ping_monitor calls, but something occurs to me: Are there cases where this will ping the monitor multiple times even if it's only configured with a single ping_url option? I'm thinking of this code in normalize.py:

        config['hooks']['cronitor'] = {
            'create': cronitor['ping_url'],
            'prune': cronitor['ping_url'],
            'compact': cronitor['ping_url'],
            'check': cronitor['ping_url'],
        }

I think that means with the actions loop in place and ping_url set, Cronitor will get pinged four times for a standard borgmatic run. But shouldn't it ideally only get pinged once on startโ€”unless the user explicitly configures ping endpoints for multiple actions?

So, the question is what to do about this. One idea is to remove the above normalization code so that you can retain the prior behavior when ping_url is set: Only ping once on start. Another idea is to drop support for ping_url entirely. There are probably other options as well. I'm open to discussion on this point of course! I didn't realize this would be such a thorny ticket.

I know we've going back and forth on the subject of looping over `ping_monitor` calls, but something occurs to me: Are there cases where this will ping the monitor multiple times even if it's only configured with a single `ping_url` option? I'm thinking of this code in `normalize.py`: ``` config['hooks']['cronitor'] = { 'create': cronitor['ping_url'], 'prune': cronitor['ping_url'], 'compact': cronitor['ping_url'], 'check': cronitor['ping_url'], } ``` I think that means with the actions loop in place and `ping_url` set, Cronitor will get pinged *four* times for a standard borgmatic run. But shouldn't it ideally only get pinged *once* on startโ€”unless the user explicitly configures ping endpoints for multiple actions? So, the question is what to do about this. One idea is to remove the above normalization code so that you can retain the prior behavior when `ping_url` is set: Only ping *once* on start. Another idea is to drop support for `ping_url` entirely. There are probably other options as well. I'm open to discussion on this point of course! I didn't realize this would be such a thorny ticket.
Review

@witten I had this concern too since I started working on this. I was also wondering if someone wants to receive notification for a failed event in any of 4 actions, he will have to set the endpoints in each of them and same for future update.

And suppose someone wants to get notified for failure in all of these actions, it will send alert 4 times if all of them fail, where probability of other act

So what if we keep the ping_url but add a boolean configuration so user can select which action to receive notification on. I don't think people will use different cronitor endpoint for different actions of same job.

This should still fulfil the problem raised in this PR's issue.

It's been a while since I worked on this so I might be misunderstanding something here. So these are just the thoughts I gathered going over the code again. Let me know what you think.

@witten I had this concern too since I started working on this. I was also wondering if someone wants to receive notification for a failed event in any of 4 actions, he will have to set the endpoints in each of them and same for future update. And suppose someone wants to get notified for failure in all of these actions, it will send alert 4 times if all of them fail, where probability of other act So what if we keep the ping_url but add a boolean configuration so user can select which action to receive notification on. I don't think people will use different cronitor endpoint for different actions of same job. This should still fulfil the problem raised in this PR's issue. It's been a while since I worked on this so I might be misunderstanding something here. So these are just the thoughts I gathered going over the code again. Let me know what you think.
Review

It's also been a while since I've looked at this code! Could you maybe provide a configuration example to illustrate what you're thinking here? For instance, I'm unsure how an additional boolean configuration option would allow a user to select which borgmatic action to receive notifications on. I would think that any action with a Cronitor URL set would receive a notification. Or is the distinction you're drawing that, for a given action, URLs might only be notified for some of begin/start/fail and not all three?

I don't think people will use different cronitor endpoint for different actions of same job.

Looking at #518, I think a common case is probably where the user wants one URL for, say, the create action and a completely different URL if any for, say, the check action. (See #366 as well.) In that case it wouldn't be a problem that each such URL would get pinged once upon failure, as each action is hitting a different Cronitor endpoint.

For the current (released borgmatic) use case of pinging Cronitor only once for begin/start/fail when the action is create, prune, compact, or check.. What do you think of including some logic such that if a URL is specified for multiple different actions (whether explicitly.. or implicitly via ping_url), then borgmatic is smart enough to only ping Cronitor once before/after all the actions and once on failure.

I've given no thought to how that might be implemented, but such de-duplication seems nice from a user perspectiveโ€”and it lines up with the currently released behavior. Let me know though if this doesn't sound feasible or desirable.

It's also been a while since I've looked at this code! Could you maybe provide a configuration example to illustrate what you're thinking here? For instance, I'm unsure how an additional boolean configuration option would allow a user to select which borgmatic action to receive notifications on. I would think that any action with a Cronitor URL set would receive a notification. Or is the distinction you're drawing that, for a given action, URLs might only be notified for some of begin/start/fail and not all three? > I don't think people will use different cronitor endpoint for different actions of same job. Looking at #518, I think a common case *is* probably where the user wants one URL for, say, the `create` action and a completely different URL if any for, say, the `check` action. (See #366 as well.) In that case it wouldn't be a problem that each such URL would get pinged once upon failure, as each action is hitting a different Cronitor endpoint. For the current (released borgmatic) use case of pinging Cronitor only once for begin/start/fail when the action is create, prune, compact, or check.. What do you think of including some logic such that if a URL is specified for multiple different actions (whether explicitly.. or implicitly via `ping_url`), then borgmatic is smart enough to only ping Cronitor once before/after all the actions and once on failure. I've given no thought to how that might be implemented, but such de-duplication seems nice from a user perspectiveโ€”and it lines up with the currently released behavior. Let me know though if this doesn't sound feasible or desirable.
Review

It makes sense to include logic to remove duplicate endpoints. I'll add that.

I've fixed the latest merge conflicts and will start working on the rest of suggestions. Development might still be slow unfortunately since classes are starting but I'll keep at it. Since merge conflicts are fixed, I can solve the rest in small pieces.

Thank you for the help and patience!

It makes sense to include logic to remove duplicate endpoints. I'll add that. I've fixed the latest merge conflicts and will start working on the rest of suggestions. Development might still be slow unfortunately since classes are starting but I'll keep at it. Since merge conflicts are fixed, I can solve the rest in small pieces. Thank you for the help and patience!
except (OSError, CalledProcessError) as error:
if command.considered_soft_failure(config_filename, error):
return
@ -111,7 +109,9 @@ def run_configuration(config_filename, config, arguments):
if not encountered_error:
repo_queue = Queue()
for repo in location['repositories']:
repo_queue.put((repo, 0),)
repo_queue.put(
(repo, 0),
)
while not repo_queue.empty():
repository_path, retry_num = repo_queue.get()
@ -135,7 +135,9 @@ def run_configuration(config_filename, config, arguments):
)
except (OSError, CalledProcessError, ValueError) as error:
if retry_num < retries:
repo_queue.put((repository_path, retry_num + 1),)
repo_queue.put(
(repository_path, retry_num + 1),
)
tuple( # Consume the generator so as to trigger logging.
log_error_records(
f'{repository_path}: Error running actions for repository',
@ -161,16 +163,17 @@ def run_configuration(config_filename, config, arguments):
try:
if using_primary_action and not skip_monitoring:
# send logs irrespective of error
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.LOG,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
for action_name in action_names:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.LOG,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
except (OSError, CalledProcessError) as error:
if command.considered_soft_failure(config_filename, error):
return
@ -181,16 +184,17 @@ def run_configuration(config_filename, config, arguments):
if not encountered_error:
try:
if using_primary_action and not skip_monitoring:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.FINISH,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
for action_name in action_names:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.FINISH,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
dispatch.call_hooks(
'destroy_monitor',
hooks,
@ -218,16 +222,17 @@ def run_configuration(config_filename, config, arguments):
error=encountered_error,
output=getattr(encountered_error, 'output', ''),
)
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.FAIL,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
for action_name in action_names:
dispatch.call_hooks(
'ping_monitor',
hooks,
config_filename,
monitor.MONITOR_HOOK_NAMES,
monitor.State.FAIL,
monitoring_log_level,
global_arguments.dry_run,
action_name,
)
dispatch.call_hooks(
'destroy_monitor',
hooks,
@ -417,19 +422,39 @@ def run_actions(
)
elif action_name == 'rlist':
yield from borgmatic.actions.rlist.run_rlist(
repository, storage, local_borg_version, action_arguments, local_path, remote_path,
repository,
storage,
local_borg_version,
action_arguments,
local_path,
remote_path,
)
elif action_name == 'list':
yield from borgmatic.actions.list.run_list(
repository, storage, local_borg_version, action_arguments, local_path, remote_path,
repository,
storage,
local_borg_version,
action_arguments,
local_path,
remote_path,
)
elif action_name == 'rinfo':
yield from borgmatic.actions.rinfo.run_rinfo(
repository, storage, local_borg_version, action_arguments, local_path, remote_path,
repository,
storage,
local_borg_version,
action_arguments,
local_path,
remote_path,
)
elif action_name == 'info':
yield from borgmatic.actions.info.run_info(
repository, storage, local_borg_version, action_arguments, local_path, remote_path,
repository,
storage,
local_borg_version,
action_arguments,
local_path,
remote_path,
)
elif action_name == 'break-lock':
borgmatic.actions.break_lock.run_break_lock(
@ -442,7 +467,12 @@ def run_actions(
)
elif action_name == 'borg':
borgmatic.actions.borg.run_borg(
repository, storage, local_borg_version, action_arguments, local_path, remote_path,
repository,
storage,
local_borg_version,
action_arguments,
local_path,
remote_path,
)
command.execute_hook(
@ -635,7 +665,8 @@ def collect_configuration_run_summary_logs(configs, arguments):
logger.info(f"Unmounting mount point {arguments['umount'].mount_point}")
try:
borg_umount.unmount_archive(
mount_point=arguments['umount'].mount_point, local_path=get_local_path(configs),
mount_point=arguments['umount'].mount_point,
local_path=get_local_path(configs),
)
except (CalledProcessError, OSError) as error:
yield from log_error_records('Error unmounting mount point', error)

View File

@ -35,9 +35,11 @@ def ping_monitor(hook_config, config_filename, state, monitoring_log_level, dry_
dry_run_label = ' (dry run; not actually pinging)' if dry_run else ''
try:
ping_url = f"{hook_config['sss']}/{MONITOR_STATE_TO_CRONITOR[state]}"
ping_url = f"{hook_config[action_name]}/{MONITOR_STATE_TO_CRONITOR[state]}"
except KeyError:
jetchirag marked this conversation as resolved
Review

Nice, succinct way to skip unconfigured actions. You could optionally logger.debug() in the except so that it's clear what's being skipped and why.

Nice, succinct way to skip unconfigured actions. You could optionally `logger.debug()` in the `except` so that it's clear what's being skipped and why.
logger.debug(f'{config_filename}: Skipping Cronitor {state.name.lower()} ping due to unconfigured action: {action_name}')
logger.debug(
jetchirag marked this conversation as resolved Outdated
Outdated
Review

Missed a debug log here ๐Ÿ˜„

Missed a debug log here ๐Ÿ˜„

Thanks! :)

Thanks! :)
f'{config_filename}: Skipping Cronitor {state.name.lower()} ping due to unconfigured action: {action_name}'
)
return
logger.info(f'{config_filename}: Pinging Cronitor {state.name.lower()}{dry_run_label}')