Implement json output for create_archive and pretty print json output. #97

Closed
floli wants to merge 2 commits from (deleted):master into main
Collaborator
No description provided.
witten reviewed 2018-09-30 18:32:47 +00:00
@ -158,3 +158,3 @@
logger.debug(' '.join(full_command))
subprocess.check_call(full_command)
output = subprocess.check_output(full_command)
Owner

Does this approach (check_output() instead of check_call()) have the side effect of delaying any borg create output until the command fully runs and exits? Do you see that as a usability problem, or not something to worry about? If you want to address this, one idea is to conditionally run check_call() or check_output() based on whether args.json is True.

Note that I don't feel strongly about this either way.. Just wanted to bring it up and get your take.

Does this approach (`check_output()` instead of `check_call()`) have the side effect of delaying any `borg create` output until the command fully runs and exits? Do you see that as a usability problem, or not something to worry about? If you want to address this, one idea is to conditionally run `check_call()` or `check_output()` based on whether `args.json` is `True`. Note that I don't feel strongly about this either way.. Just wanted to bring it up and get your take.
@ -159,2 +159,3 @@
logger.debug(' '.join(full_command))
subprocess.check_call(full_command)
output = subprocess.check_output(full_command)
return output.decode() if output is not None else None
Owner

Instead of conditionally decoding after the fact, I think you should be able to pass the desired encoding directly to check_output(). Example:

     output = subprocess.check_output(full_command, encoding=sys.stdout.encoding)
Instead of conditionally decoding after the fact, I think you should be able to pass the desired encoding directly to `check_output()`. Example: ``` output = subprocess.check_output(full_command, encoding=sys.stdout.encoding) ```
@ -187,0 +188,4 @@
if args.json:
json_results.append(json.loads(output))
else:
sys.stdout.write(output)
Owner

Given the repetition of this pattern a few times in this function, you could DRY this up by factoring it out into a common function. Example imagined invocation:

display_output(output, args.json)
Given the repetition of this pattern a few times in this function, you could DRY this up by factoring it out into a common function. Example imagined invocation: ``` display_output(output, args.json) ````
Owner

Thanks for making this fix. The general approach looks good to me, although see my note about potential delayed Borg output.

Additionally, it would be great to have some automated testing around the new behavior if possible. It looks like currently tests aren't passing on this fork, so it's possible that in fixing them, that will implicitly test this new behavior.

Thanks for making this fix. The general approach looks good to me, although see my note about potential delayed Borg output. Additionally, it would be great to have some automated testing around the new behavior if possible. It looks like currently tests aren't passing on this fork, so it's possible that in fixing them, that will implicitly test this new behavior.
Author
Collaborator

I was sure tests passed, when I commited it. But obviously, ...

  • Tests: I had added one test in 9aaf78b9dd.

  • Delayed Output: You are right, this is a side effect I haven't considered. I think this is also what makes the tests break. What we actually want is some tee-like functionality: print output unbuffered to stdout and simultaneously to a file-like object. Btw, what is the rationale behind capturing the output when json=True and not just leave stdout of borg connected to the tty, like it's done when json=False.

I was sure tests passed, when I commited it. But obviously, ... * Tests: I had added one test in 9aaf78b9dd. * Delayed Output: You are right, this is a side effect I haven't considered. I think this is also what makes the tests break. What we actually want is some `tee`-like functionality: print output unbuffered to stdout and simultaneously to a file-like object. Btw, what is the rationale behind capturing the output when json=True and not just leave stdout of borg connected to the tty, like it's done when json=False.
Owner

FYI: I just enabled continuous integration on this repository, so in theory future pull requests should get tests run automatically!

Btw, what is the rationale behind capturing the output when json=True and not just leave stdout of borg connected to the tty, like it’s done when json=False.

I think that would actually be preferable, if possible. Feel free to change it if you like! The trick is in also figuring out how to capture the JSON, tee-like, as you put it. Maybe something along these lines?

FYI: I just enabled continuous integration on this repository, so in theory future pull requests should get tests run automatically! > Btw, what is the rationale behind capturing the output when json=True and not just leave stdout of borg connected to the tty, like it’s done when json=False. I think that would actually be preferable, if possible. Feel free to change it if you like! The trick is in also figuring out how to capture the JSON, `tee`-like, as you put it. Maybe something [along these lines](https://stackoverflow.com/questions/616645/how-do-i-duplicate-sys-stdout-to-a-log-file-in-python#answer-616672)?
Author
Collaborator

Currently, all functions that accept a json argument catch the output and we append the output to a list of json objects:

if args.json:
            json_results.append(json.loads(output))
        else:
            sys.stdout.write(output)

which is then dumped altogether.

Just as an idea: why not just give borg the --json parameter and let it directly write to stdout? Would this result in an invalid json output? Therewith no result needs output capturing anymore.

test_run_commands_handles_multiple_json_outputs_in_array tests this.

I did a quick commit to https://projects.torsion.org/floli/borgmatic/src/branch/json_to_stdout to show what I mean.

Currently, all functions that accept a `json` argument catch the output and we append the output to a list of json objects: ``` if args.json: json_results.append(json.loads(output)) else: sys.stdout.write(output) ``` which is then dumped altogether. Just as an idea: why not just give borg the `--json` parameter and let it directly write to stdout? Would this result in an invalid json output? Therewith no result needs output capturing anymore. `test_run_commands_handles_multiple_json_outputs_in_array` tests this. I did a quick commit to https://projects.torsion.org/floli/borgmatic/src/branch/json_to_stdout to show what I mean.
Owner

If I recall correctly, the main reason for capturing Borg's JSON output and putting it in a JSON list is for the (fairly common) case of multiple repositories in a borgmatic configuration file or multiple separate borgmatic configuration files. The idea is that in a single borgmatic run, borgmatic may be invoking borg --json multiple times, and therefore need to gather the results into a common JSON list for ease of parsing by any downstream consumers of borgmatic output.

We could switch to outputting them as separate JSON blobs, one per repository. But that would have the effect of making the whole output of a borgmatic run more difficult to parse.

A related idea is to leave Borg outputting JSON directly to stdout as per your branch, but then try to post-hoc "wrap" that JSON spew into a list. For instance, for a borgmatic config file containing two repositories: print "[", then the result of borg --json on the first repository, then print ",", then the result of borg --json on the second repository, then "]".

Let me know your thoughts.

If I recall correctly, the main reason for capturing Borg's JSON output and putting it in a JSON list is for the (fairly common) case of multiple repositories in a borgmatic configuration file or multiple separate borgmatic configuration files. The idea is that in a single borgmatic run, borgmatic may be invoking `borg --json` multiple times, and therefore need to gather the results into a common JSON list for ease of parsing by any downstream consumers of borgmatic output. We could switch to outputting them as separate JSON blobs, one per repository. But that would have the effect of making the whole output of a borgmatic run more difficult to parse. A related idea is to leave Borg outputting JSON directly to stdout as per your branch, but then try to post-hoc "wrap" that JSON spew into a list. For instance, for a borgmatic config file containing two repositories: print "[", then the result of `borg --json` on the first repository, then print ",", then the result of `borg --json` on the second repository, then "]". Let me know your thoughts.
Owner

Also, not necessarily a deal-breaker for the never-capture-Borg-output approach, but implementing #86 would probably require post-processing Borg JSON output in some way.. That is, unless it's implemented as a separate tool that's designed for post-processing direct Borg output, e.g.:

borgmatic --list | print-only-the-last-successful-archive

In that case, borgmatic could still pass through Borg stdout unmolested, and let the piped command deal with parsing it.

Also, not necessarily a deal-breaker for the never-capture-Borg-output approach, but implementing #86 would *probably* require post-processing Borg JSON output in some way.. That is, unless it's implemented as a separate tool that's designed for post-processing direct Borg output, e.g.: ```bash borgmatic --list | print-only-the-last-successful-archive ``` In that case, borgmatic could still pass through Borg stdout unmolested, and let the piped command deal with parsing it.
Author
Collaborator

Sorry, for the long absence...

Ok, as far as I see, there are three options:

  1. Never capture JSON output. Then we might produce ill-formed JSON when multiple commands are executed.

  2. Capture JSON output. That delays output until the borg command has finished.

  3. Implement the tee-like redirection you posted above. I think, I had plans doing that in one of my own projects, too, but refrained from doing so. Not sure why though.

  4. and 2. are the easiest and require almost no changes. I tend to lean towards 2., since when JSON is activated, borgmatic is probably not used interactively anyway. We also should try to produce valid JSON.

Anyway, this PR could be closed (without merging). If 2. is ok with you, I will have a look on what needs to be implemented and create another PR.

Best!

Sorry, for the long absence... Ok, as far as I see, there are three options: 1. Never capture JSON output. Then we might produce ill-formed JSON when multiple commands are executed. 2. Capture JSON output. That delays output until the borg command has finished. 3. Implement the tee-like redirection you posted above. I think, I had plans doing that in one of my own projects, too, but refrained from doing so. Not sure why though. 1. and 2. are the easiest and require almost no changes. I tend to lean towards 2., since when JSON is activated, borgmatic is probably not used interactively anyway. We also should try to produce valid JSON. Anyway, this PR could be closed (without merging). If 2. is ok with you, I will have a look on what needs to be implemented and create another PR. Best!
floli closed this pull request 2018-10-27 13:50:33 +00:00
Owner

That sounds great to me.. Your rationale for 2 makes sense.

That sounds great to me.. Your rationale for 2 makes sense.
Author
Collaborator

Just a quick question: From https://projects.torsion.org/witten/borgmatic/src/branch/master/test_requirements.txt I understand that the minimal python version is 3.6? So it is ok to use subprocess.run? (introduced in 3.5)

Just a quick question: From https://projects.torsion.org/witten/borgmatic/src/branch/master/test_requirements.txt I understand that the minimal python version is 3.6? So it is ok to use `subprocess.run`? (introduced in 3.5)
Owner

Actually, those test requirements are indicating that the black code formatter (only) is Python 3.6+. Everything else should currently work in Python 3+. Having said that though, I'm fine if you use subprocess.run() and require Python 3.5+. After all, it's nearly 2019! (Might be good to update the docs and/or mention it in the NEWS though.)

Actually, those test requirements are indicating that the [black code formatter](https://github.com/ambv/black) (only) is Python 3.6+. Everything else should currently work in Python 3+. Having said that though, I'm fine if you use `subprocess.run()` and require Python 3.5+. After all, it's nearly 2019! (Might be good to update the docs and/or mention it in the NEWS though.)
Author
Collaborator

I also think Python 3.5 should be fine, even Debian Stable has it.

If we went for very new python, we could use:

    return subprocess.run(full_command, capture_output = json, check=True, text=True)

however, I think we should retain compatibility to Debian Stable.

Currently, my version looks like:

    if json:
        return subprocess.check_output(full_command).decode()
    else:
        subprocess.check_call(full_command)
        return None

and is called like this

    if args.list:
        logger.info('{}: Listing archives'.format(repository))
        output = borg_list.list_archives(
            repository, storage, local_path=local_path, remote_path=remote_path, json=args.json
        )
        if args.json:
            json_results.append(json.loads(output))

I think the structure makes it a little bit more straightforward and also make the list command not first capture then output, but let borg output right away.

This works just fine, as far as I can tell, with one problem.

The tests need to become more complex, because for json=True we need to install a subprocess mock that expects check_output, otherwise it got to expect check_call

Options I see

  • Make the subprocess mock expects either command.
  • Use two different mocks

I am not experienced with the testing framework. What do you think?

I also think Python 3.5 should be fine, even Debian Stable has it. If we went for very new python, we could use: ``` return subprocess.run(full_command, capture_output = json, check=True, text=True) ``` however, I think we should retain compatibility to Debian Stable. Currently, my version looks like: ``` if json: return subprocess.check_output(full_command).decode() else: subprocess.check_call(full_command) return None ``` and is called like this ``` if args.list: logger.info('{}: Listing archives'.format(repository)) output = borg_list.list_archives( repository, storage, local_path=local_path, remote_path=remote_path, json=args.json ) if args.json: json_results.append(json.loads(output)) ``` I think the structure makes it a little bit more straightforward and also make the list command not first capture then output, but let borg output right away. This works just fine, as far as I can tell, with one problem. The tests need to become more complex, because for `json=True` we need to install a subprocess mock that expects `check_output`, otherwise it got to expect `check_call` Options I see * Make the subprocess mock expects either command. * Use two different mocks I am not experienced with the testing framework. What do you think?
Owner

A third option: Wrap the check_*() calls into a separate function, and then mock that out. Example:

def run_command(full_command, json=False):
    if json:
        return subprocess.check_output(full_command).decode()

    subprocess.check_call(full_command)

Mocking that out would be pretty easy, and then you wouldn't have to think about the different code paths for json values all over your tests. It would also have the additional benefit of DRYing up some code that I expect would be repeated in several different places.

And I agree about Python 3.5.

A third option: Wrap the `check_*()` calls into a separate function, and then mock *that* out. Example: ``` def run_command(full_command, json=False): if json: return subprocess.check_output(full_command).decode() subprocess.check_call(full_command) ``` Mocking that out would be pretty easy, and then you wouldn't have to think about the different code paths for `json` values all over your tests. It would also have the additional benefit of DRYing up some code that I expect would be repeated in several different places. And I agree about Python 3.5.
Owner

I just wanted to let you know that while this PR itself didn't get merged, I incorporated many of the ideas from it into borgmatic 1.3.1. That includes only capturing output when the --json flag is used, rather than all the time. Thanks again!

I just wanted to let you know that while this PR itself didn't get merged, I incorporated many of the ideas from it into borgmatic 1.3.1. That includes only capturing output when the `--json` flag is used, rather than all the time. Thanks again!
Author
Collaborator

Sorry for not further working on that, just not having time currently. Best!

Sorry for not further working on that, just not having time currently. Best!
Owner

No worries.. I know how that is!

No worries.. I know how that is!

Pull request closed

Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#97
No description provided.