Container database backups #1121

Merged
witten merged 19 commits from apollo13/borgmatic:ticket1116 into main 2025-09-17 23:02:58 +00:00
Contributor

Initial PR without tests to gather design feedback

Initial PR without tests to gather design feedback
add support for container names/id when dumping databases.
Some checks failed
build / test (pull_request) Failing after 8m10s
build / docs (pull_request) Has been skipped
57ba7c49d7
witten left a comment
Owner

Overall this looks like a solid approach to me. Thanks for taking it on!

What I did for now is adding label: which is a no-brainer. I then opted for container: instead of an interpolation because the reuse potential seems low and an interpolation might want to allow reading labels etc in the future so we should brainstorm it's design first (like {container label=x container_name} might return the label value of label x or similar…

Okay, got it.

I also opted to "shell out" and use the docker/podman CLIs. In theory I could call the relevant API endpoints directly but that would:

This makes total sense to me. In many ways, borgmatic is a glorified shell script written in Python, just shelling out to external executables as needed. That's often more flexible than tight API integration.

Some things I think are probably missing from this PR though:

  • The requisite changes to the borgmatic restore action. E.g. --label/--container/--original-label/--original-container.
  • The requisite changes to the restore_* configuration schema, e.g. restore_container.
Overall this looks like a solid approach to me. Thanks for taking it on! > What I did for now is adding label: which is a no-brainer. I then opted for container: instead of an interpolation because the reuse potential seems low and an interpolation might want to allow reading labels etc in the future so we should brainstorm it's design first (like {container label=x container_name} might return the label value of label x or similar… Okay, got it. > I also opted to "shell out" and use the docker/podman CLIs. In theory I could call the relevant API endpoints directly but that would: This makes total sense to me. In many ways, borgmatic is a glorified shell script written in Python, just shelling out to external executables as needed. That's often more flexible than tight API integration. Some things I think are probably missing from this PR though: * The requisite changes to the `borgmatic restore` action. E.g. `--label`/`--container`/`--original-label`/`--original-container`. * The requisite changes to the `restore_*` configuration schema, e.g. `restore_container`.
@ -13,3 +13,3 @@
execute_command_with_processes,
)
from borgmatic.hooks.data_source import dump
from borgmatic.hooks.data_source import dump, utils
Owner

Sort of a nit at the design review level, but: Just FYI I'm personally allergic to any file/module called utils, because I've seen how much such a file can fill up with unrelated cruft over time. I suggest a name that's more descriptive of its purpose (what the contained functions have in common).

Sort of a nit at the design review level, but: Just FYI I'm personally allergic to any file/module called `utils`, because I've seen how much such a file can fill up with unrelated cruft over time. I suggest a name that's more descriptive of its purpose (what the contained functions have in common).
Author
Contributor

Open to ideas, I am bad at naming :)

Open to ideas, I am bad at naming :)
Owner

So am I. How about just container.py? It'll maybe be a conflict if and when someone goes to implement #685 or #671, but we can deal with that then.

Alternative if you don't like that: hostname.py or address.py.

So am I. How about just `container.py`? It'll maybe be a conflict if and when someone goes to implement #685 or #671, but we can deal with that then. Alternative if you don't like that: `hostname.py` or `address.py`.
Author
Contributor

After the most recent changes utils basically does two things:

  • container stuff, ie get the ip for a container name/id
  • handle repetitive logic for determining port/username/password etc from a data_source

Personally I think they both belong into different categories and as such probably different files. Which seems overkill for those two functions so I'd suggest one of the following names instead: config.py / config_helpers.py / config_utils.py. The idea behind the naming being that the functions (as currently used) all deal which the configuration aspects of a data source.

After the most recent changes utils basically does two things: * container stuff, ie get the ip for a container name/id * handle repetitive logic for determining port/username/password etc from a data_source Personally I think they both belong into different categories and as such probably different files. Which seems overkill for those two functions so I'd suggest one of the following names instead: `config.py` / `config_helpers.py` / `config_utils.py`. The idea behind the naming being that the functions (as currently used) all deal which the configuration aspects of a data source.
Owner

Got it. How about just config.py for now?

Got it. How about just `config.py` for now?
apollo13 marked this conversation as resolved
@ -0,0 +10,4 @@
def get_hostname_from_config(database):
if 'container' in database:
return get_ip_from_container(database['container'])
return database.get('hostname', '')
Owner

Worth putting the "localhost" default here?

Worth putting the "localhost" default here?
Author
Contributor

I do not think so, because then it will always count as explicitly specified to be localhost and the other checks checking whether the hostname is set or not will fail.

I do not think so, because then it will always count as explicitly specified to be `localhost` and the other checks checking whether the hostname is set or not will fail.
Owner

Makes sense (as long as the "localhost" default is applied elsewhere).

Makes sense (as long as the "localhost" default is applied elsewhere).
witten marked this conversation as resolved
@ -0,0 +14,4 @@
def get_ip_from_container(container):
engines = (shutil.which(engine) for engine in ('docker', 'podman'))
Owner

TIL I learned about shutil.which()!

I'm not sure this 100% works though, because it might be legitimate to have both docker and podman executables installed. Maybe that's okay though since you're effectively probing for which one can successfully execute container inspect below?

At some point it might be necessary for there to be borgmatic options to configure the container engine, but maybe we can get away without that right now.

TIL I learned about `shutil.which()`! I'm not sure this 100% works though, because it might be legitimate to have both `docker` and `podman` executables installed. Maybe that's okay though since you're effectively probing for which one can successfully execute `container inspect` below? At some point it might be necessary for there to be borgmatic options to configure the container engine, but maybe we can get away without that right now.
Author
Contributor

Yes that was my thinking. I expect it to be rather uncommon to have both installed so I went with trying both and continuing if no container is found in one…

Regarding options for the container engine: Yeah maybe, but I didn't know where to put it, in theory you could have podman and docker running in parallel and backup containers from either, but adding a container_engine to the database config seems rather silly.

Yes that was my thinking. I expect it to be rather uncommon to have both installed so I went with trying both and continuing if no container is found in one… Regarding options for the container engine: Yeah maybe, but I didn't know where to put it, in theory you could have podman and docker running in parallel and backup containers from either, but adding a `container_engine` to the database config seems rather silly.
Owner

Yes that was my thinking. I expect it to be rather uncommon to have both installed so I went with trying both and continuing if no container is found in one…

Fair enough. BTW I think I have an uncommon setup. 😄

$ which docker podman
/usr/bin/docker
/usr/bin/podman

Regarding options for the container engine: Yeah maybe, but I didn't know where to put it, in theory you could have podman and docker running in parallel and backup containers from either, but adding a container_engine to the database config seems rather silly.

Yeah, I don't think we need it now. But if/when we need it, it would probably make most sense at the global scope rather than as part of the database config.

> Yes that was my thinking. I expect it to be rather uncommon to have both installed so I went with trying both and continuing if no container is found in one… Fair enough. BTW I think I have an uncommon setup. 😄 ``` $ which docker podman /usr/bin/docker /usr/bin/podman ``` > Regarding options for the container engine: Yeah maybe, but I didn't know where to put it, in theory you could have podman and docker running in parallel and backup containers from either, but adding a container_engine to the database config seems rather silly. Yeah, I don't think we need it now. But if/when we need it, it would probably make most sense at the global scope rather than as part of the database config.
witten marked this conversation as resolved
@ -0,0 +18,4 @@
engines = [engine for engine in engines if engine]
if not engines:
raise xxx # TODO: What to raise here, tell the user to install docker/podman
Owner

borgmatic abuses CalledProcessError, ValueError, and OSError for practically everything. (Rationale: I think custom exception types for every conceivable situation are kind of silly. Although one could argue borgmatic goes too far to the other extreme.)

Any one of those exception types, once raised, will surface a nice message to the user.

borgmatic abuses `CalledProcessError`, `ValueError`, and `OSError` for practically everything. (Rationale: I think custom exception types for every conceivable situation are kind of silly. Although one could argue borgmatic goes too far to the other extreme.) Any one of those exception types, once raised, will surface a nice message to the user.
apollo13 marked this conversation as resolved
@ -0,0 +32,4 @@
)
)
except subprocess.CalledProcessError:
continue # Container does not exist
Owner

Worth logging debug?

Worth logging debug?
apollo13 marked this conversation as resolved
@ -0,0 +34,4 @@
except subprocess.CalledProcessError:
continue # Container does not exist
network_data = json.loads(output.strip())
Owner

Should probably catch JSON decode errors here.

Should probably catch JSON decode errors here.
Author
Contributor

In theory yes, but in practice this command should really output json if it doesn't fail (the --format explicitly asks for it). If it doesn't borgmatic should blow up anyways and we have to figure out why…

In theory yes, but in practice this command should really output json if it doesn't fail (the --format explicitly asks for it). If it doesn't borgmatic should blow up anyways and we have to figure out why…
Owner

I guess I'm just the type of programmer that looks both ways when crossing a one-way street. 😄

But one contrived situation I can think of that could cause this to trigger would be for someone to make a docker or podman wrapper script that does something funky with the JSON returned by the actual docker or podman. Stranger things have happened... For instance, users regularly write borgmatic-specific wrapper scripts used when shelling out to btrfs or other executables because they want to customize what gets fed to borgmatic.

Anyway...

I guess I'm just the type of programmer that looks both ways when crossing a one-way street. 😄 But one contrived situation I can think of that could cause this to trigger would be for someone to make a `docker` or `podman` wrapper script that does something funky with the JSON returned by the actual `docker` or `podman`. Stranger things have happened... For instance, users regularly write borgmatic-specific wrapper scripts used when shelling out to `btrfs` or other executables because they want to customize what gets fed to borgmatic. Anyway...
Owner

It occurs to me.. I think I just figured out the likely first enhancement request for this feature. In my experience with other borgmatic features, as soon as you make borgmatic shell out to a named executable, tickets start rolling in to make that executable path customizable. (Hence ssh_command, pg_dump_command, zfs_command, etc. etc.)

I'm not suggesting you have to do that as part of this PR, but just be aware it may be an incoming feature request soon.

It occurs to me.. I think I just figured out the likely first enhancement request for this feature. In my experience with other borgmatic features, as soon as you make borgmatic shell out to a named executable, tickets start rolling in to make that executable path customizable. (Hence `ssh_command`, `pg_dump_command`, `zfs_command`, etc. etc.) I'm not suggesting you have to do that as part of this PR, but just be aware it may be an incoming feature request soon.
Author
Contributor

Okay, assuming I am going to catch a JSON decode error here. What would I do with it? I wouldn't want to skip it since it really is an error that should stop processing. And just rewrapping it into a ValueError and reraising for such an edge case (someone broke the API-contract provided by docker/podman) seems not worth it to me.

Happy to do it if you think it is useful (I am usually all for defensive coding) but I honestly don't see the benefit -- I rather prefer borgmatic crash with the original error (makes debugging easier imo).

Okay, assuming I am going to catch a JSON decode error here. What would I do with it? I wouldn't want to skip it since it really is an error that should stop processing. And just rewrapping it into a ValueError and reraising for such an edge case (someone broke the API-contract provided by docker/podman) seems not worth it to me. Happy to do it if you think it is useful (I am usually all for defensive coding) but I honestly don't see the benefit -- I rather prefer borgmatic crash with the original error (makes debugging easier imo).
Owner

I've long thought that borgmatic needs a --traceback flag that suppresses any error handling and just lets everything fail ungracefully for debugging purposes. Not suggesting you do that, but it would make this decision easier IMO if such a flag already existed. But yeah, even lacking that, I would re-raise as ValueError with a nice user-facing error message. Given a bug report containing that error message, I can always find exactly where it occurred (if not the entire call stack that a traceback would provide).

But if it doesn't seem worth it, I'm fine waiting until it is. 😄

I've long thought that borgmatic needs a `--traceback` flag that suppresses any error handling and just lets everything fail ungracefully for debugging purposes. Not suggesting you do that, but it would make this decision easier IMO if such a flag already existed. But yeah, even lacking that, I would re-raise as ValueError with a nice user-facing error message. Given a bug report containing that error message, I can always find exactly where it occurred (if not the entire call stack that a traceback would provide). But if it doesn't seem worth it, I'm fine waiting until it is. 😄
apollo13 marked this conversation as resolved
@ -0,0 +44,4 @@
if ip:
return ip
raise xxx # No container ip found, what to raise here
Owner

See above.

See above.
apollo13 marked this conversation as resolved
@ -31,2 +29,2 @@
(hostname or 'localhost') + ('' if port is None else f':{port}'),
name,
identifier = (
label if label else (hostname or 'localhost') + ('' if port is None else f':{port}')
Owner

One side effect I can think of is that borgmatic restore --original-hostname ... will no longer work for a database if label is set. Probably okay, but maybe should be documented somewhere so users aren't caught unawares.

One side effect I can think of is that `borgmatic restore --original-hostname ...` will no longer work for a database if `label` is set. Probably okay, but maybe should be documented somewhere so users aren't caught unawares.
Author
Contributor

To be honest all those original-* options feel a bit like overkill to me, personally I'd go with a single --original-id/--original-label/--original-folder (naming is hard) that let's you identify it. I mean if I am going to add --original-label as differentiator you can always set it to host:port to get back the same behavior as --original-host=host --original-port=port (if I am reading the code correctly)

To be honest all those `original-*` options feel a bit like overkill to me, personally I'd go with a single `--original-id`/`--original-label`/`--original-folder` (naming is hard) that let's you identify it. I mean if I am going to add `--original-label` as differentiator you can always set it to `host:port` to get back the same behavior as `--original-host=host --original-port=port` (if I am reading the code correctly)
Owner

I'm fine with a flag like that. It'd probably most clear to call it --original-label though since label is a user-facing concept and ID/folder isn't (yet). Even if that makes it slightly inaccurate.

I'm fine with a flag like that. It'd probably most clear to call it `--original-label` though since `label` is a user-facing concept and ID/folder isn't (yet). Even if that makes it slightly inaccurate.
apollo13 marked this conversation as resolved
Author
Contributor
  • The requisite changes to the borgmatic restore action. E.g. --label/--container/--original-label/--original-container.
  • The requisite changes to the restore_* configuration schema, e.g. restore_container.

Ahaha, and I thought I would get away with less work by not doing the interpolation stuff. But it shows that the current design is better (?) since with interpolation you couldn't easily restore into a container (well you can always explicitly specify the hostname, this would also allow restoral as of now).

> * The requisite changes to the `borgmatic restore` action. E.g. `--label`/`--container`/`--original-label`/`--original-container`. > * The requisite changes to the `restore_*` configuration schema, e.g. `restore_container`. Ahaha, and I thought I would get away with *less* work by not doing the interpolation stuff. But it shows that the current design is better (?) since with interpolation you couldn't easily restore into a container (well you can always explicitly specify the hostname, this would also allow restoral as of now).
Owner

Yeah, I guess that's a good argument for container being a first-class concept rather than interpolated bolt-on!

Yeah, I guess that's a good argument for container being a first-class concept rather than interpolated bolt-on!
Author
Contributor

I think I got the plumbing done, what misses most currently is manual testing and one or two integration/end-to-end tests. Do you have any recommendations for them and especially how to run them safely so they don't nuke my system? :D

EDIT:// I see they run inside a docker-compose, so they won't nuke my system. But now I would need docker in docker which might get ugly quickly…

I think I got the plumbing done, what misses most currently is manual testing and one or two integration/end-to-end tests. Do you have any recommendations for them and especially how to run them safely so they don't nuke my system? :D EDIT:// I see they run inside a docker-compose, so they won't nuke my system. But now I would need docker in docker which might get ugly quickly…
Owner

For what borgmatic calls "integration" tests: These are just tests that integrate multiple units (functions) and test the result. Usually two or three or four units at a time. As far as I'm concerned, these tests are often optional unless: 1. True unit tests for particular units would require a ridiculous amount of mocking, or 2. True unit tests wouldn't really test anything useful, and the logic under test is really at the multiple-unit level, and/or 3. There's a concern about individual unit tests not catching integration errors between functions, like the wrong arguments passed from one function to another.

In terms of the how for these tests, they should hopefully be even more straightforward than unit tests, as there should be little to no mocking needed. Just call a unit higher up in the call stack and make sure it does what it's supposed to.

For what borgmatic calls "end-to-end" tests: Yeah, these are the heavyweight tests that run inside Docker Compose and do things like spinning up actual database containers to test against, etc. These tests should be few and far between, just because they're so heavyweight.

I don't think you should need Docker-in-Docker though unless I'm missing something...? Everything is contained in the Compose containers that get spun up, so no system nuking should occur. Are you worried that Compose itself might misbehave and therefore you want to run Compose within its own container?

In terms of actual end-to-end tests for this feature, I could see appending to the existing configured borgmatic config in tests/end-to-end/hooks/data_source/test_database.py with a configured database that uses your fancy new container: configuration.

**For what borgmatic calls "integration" tests:** These are just tests that integrate multiple units (functions) and test the result. Usually two or three or four units at a time. As far as I'm concerned, these tests are often optional unless: 1. True unit tests for particular units would require a ridiculous amount of mocking, or 2. True unit tests wouldn't really test anything useful, and the logic under test is really at the multiple-unit level, and/or 3. There's a concern about individual unit tests not catching integration errors between functions, like the wrong arguments passed from one function to another. In terms of the _how_ for these tests, they should hopefully be even more straightforward than unit tests, as there should be little to no mocking needed. Just call a unit higher up in the call stack and make sure it does what it's supposed to. **For what borgmatic calls "end-to-end" tests:** Yeah, these are the heavyweight tests that run inside Docker Compose and do things like spinning up actual database containers to test against, etc. These tests should be few and far between, just because they're so heavyweight. I don't *think* you should need Docker-in-Docker though unless I'm missing something...? Everything is contained in the Compose containers that get spun up, so no system nuking should occur. Are you worried that Compose itself might misbehave and therefore you want to run Compose within its own container? In terms of actual end-to-end tests for this feature, I could see appending to the existing configured borgmatic config in `tests/end-to-end/hooks/data_source/test_database.py` with a configured database that uses your fancy new `container:` configuration.
Owner

Oh, I think I just figured out your Docker-in-Docker point: You're actually running docker and podman executables, and so the test code would need to do that too. Hmm...

Oh, I think I just figured out your Docker-in-Docker point: You're actually running `docker` and `podman` executables, and so the test code would need to do that too. Hmm...
Owner

Okay, it's less useful of a test, but you know what I recommend here? Don't run actual docker and podman executables in test. Instead, run fake executables that just return the expected JSON in the expected format... Which is much more feasible to actually run in a container.

This is exactly what's already done for the filesystem hook commands, because those can't easily run in a container either: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/commands ... See here for an example usage of them: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/hooks/data_source/test_zfs.py#L25

The main challenge I can see here is that you haven't yet implemented a way to override the docker or podman path... 😄

Okay, it's less useful of a test, but you know what I recommend here? Don't run actual `docker` and `podman` executables in test. Instead, run fake executables that just return the expected JSON in the expected format... Which is much more feasible to actually run in a container. This is exactly what's already done for the filesystem hook commands, because those can't easily run in a container either: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/commands ... See here for an example usage of them: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/hooks/data_source/test_zfs.py#L25 The main challenge I can see here is that you haven't yet implemented a way to override the `docker` or `podman` path... 😄
Author
Contributor

Thank you for the explanation of the tests. That did help a lot.

The main challenge I can see here is that you haven't yet implemented a way to override the docker or podman path... 😄

You ain't going to get it unless you force me. I am a strong believer in convention over configuration. Sure let's add it if there is real demand but I won't add it for now. As far as the tests are concerned: I won't need an option to change the path.

As a sidenote: You can actually already change the path. It is called PATH and works just fine thanks to the usage of shutil.which (sorry for the "tongue in cheek" wording, I just could not resist). So if it turns out that I would need a different binary for the tests I would just create a folder for the tests in question, but a podman/docker executable there and execute the test with the PATH set to that.

Does that eliminate your concerns about supporting a docker_command option for now? Granted, modifying the PATH to add your wrappers is a bit more work but not something that is uncommon or unheard of on *nix systems.

Thank you for the explanation of the tests. That did help a lot. > The main challenge I can see here is that you haven't yet implemented a way to override the docker or podman path... 😄 You ain't going to get it unless you force me. I am a strong believer in convention over configuration. Sure let's add it if there is real demand but I won't add it for now. As far as the tests are concerned: I won't need an option to change the path. As a sidenote: You can actually already change the path. It is called `PATH` and works just fine thanks to the usage of `shutil.which` (sorry for the "tongue in cheek" wording, I just could not resist). So if it turns out that I would need a different binary for the tests I would just create a folder for the tests in question, but a `podman`/`docker` executable there and execute the test with the `PATH` set to that. Does that eliminate your concerns about supporting a `docker_command` option for now? Granted, modifying the `PATH` to add your wrappers is a bit more work but not something that is uncommon or unheard of on *nix systems.
@ -1323,6 +1323,23 @@ properties:
implicitly enables read_special (see above) to support
dump and restore streaming.
example: users
label:
Author
Contributor

This got me thinking: There is so much duplication here, would you be open to use yaml anchors etc (assuming the loading code can handle it)?

This got me thinking: There is so much duplication here, would you be open to use yaml anchors etc (assuming the loading code can handle it)?
Owner

Sure, that'd be fine. There's actually tons of other schema duplication as well, so anchors might be a good thing to get used to here. But please make sure to test borgmatic config generate/config validate if you do make this change. I have some vague recollection that I tried this once and it broke something, but that might've been years ago and no longer relevant.

Sure, that'd be fine. There's actually tons of other schema duplication as well, so anchors might be a good thing to get used to here. But please make sure to test `borgmatic config generate`/`config validate` if you do make this change. I have some vague recollection that I tried this once and it broke something, but that might've been years ago and no longer relevant.
test fixes
Some checks failed
build / test (pull_request) Failing after 3m27s
build / docs (pull_request) Has been skipped
e35cca2cba
fix dump matching logic with labels.
All checks were successful
build / test (pull_request) Successful in 8m58s
build / docs (pull_request) Has been skipped
ec357f5c97
Author
Contributor

Okay, finally got to a state where I have initial end-to-end tests as well (at least for simple safe & restore). Mind looking at it and maybe provide some suggestions on how to keep the end to end tests small (countwise) since they take long. Would you mind if I adjust the config so half the databases test via hostname and the others via container?

Okay, finally got to a state where I have initial end-to-end tests as well (at least for simple safe & restore). Mind looking at it and maybe provide some suggestions on how to keep the end to end tests small (countwise) since they take long. Would you mind if I adjust the config so half the databases test via `hostname` and the others via `container`?
Owner

You ain't going to get it unless you force me. I am a strong believer in convention over configuration. Sure let's add it if there is real demand but I won't add it for now. As far as the tests are concerned: I won't need an option to change the path.

As a sidenote: You can actually already change the path. It is called PATH and works just fine thanks to the usage of shutil.which (sorry for the "tongue in cheek" wording, I just could not resist). So if it turns out that I would need a different binary for the tests I would just create a folder for the tests in question, but a podman/docker executable there and execute the test with the PATH set to that.

Does that eliminate your concerns about supporting a docker_command option for now? Granted, modifying the PATH to add your wrappers is a bit more work but not something that is uncommon or unheard of on *nix systems.

I'm totally fine with that for now. But if there's one constant with borgmatic users, it's that they want everything to be configurable in borgmatic's configuration file rather than, say, just command-line flags or environment variables. 😄

> You ain't going to get it unless you force me. I am a strong believer in convention over configuration. Sure let's add it if there is real demand but I won't add it for now. As far as the tests are concerned: I won't need an option to change the path. > > As a sidenote: You can actually already change the path. It is called PATH and works just fine thanks to the usage of shutil.which (sorry for the "tongue in cheek" wording, I just could not resist). So if it turns out that I would need a different binary for the tests I would just create a folder for the tests in question, but a podman/docker executable there and execute the test with the PATH set to that. > > Does that eliminate your concerns about supporting a docker_command option for now? Granted, modifying the PATH to add your wrappers is a bit more work but not something that is uncommon or unheard of on *nix systems. I'm totally fine with that for now. But if there's one constant with borgmatic users, it's that they want everything to be configurable in borgmatic's configuration file rather than, say, just command-line flags or environment variables. 😄
Owner

Okay, finally got to a state where I have initial end-to-end tests as well (at least for simple safe & restore). Mind looking at it and maybe provide some suggestions on how to keep the end to end tests small (countwise) since they take long. Would you mind if I adjust the config so half the databases test via hostname and the others via container?

I think it would be good to have both for each database type, but I don't think you need both for every database config. So for instance if you had the following, I think that would be sufficient:

  • hostname and container for PostgreSQL database "test" (but not for "all")
  • hostname and container for MariaDB database "test" (but not for "all")
  • etc.
> Okay, finally got to a state where I have initial end-to-end tests as well (at least for simple safe & restore). Mind looking at it and maybe provide some suggestions on how to keep the end to end tests small (countwise) since they take long. Would you mind if I adjust the config so half the databases test via hostname and the others via container? I think it would be good to have both for each database type, but I don't think you need both for *every database config*. So for instance if you had the following, I think that would be sufficient: * hostname and container for PostgreSQL database "test" (but not for "all") * hostname and container for MariaDB database "test" (but not for "all") * etc.
Owner

I think you can probably also omit the container variant entirely from write_custom_restore_configuration(). You're already testing a container restore with the previous e2e tests, so you shouldn't need to do it again here.

I think you can probably also omit the `container` variant entirely from `write_custom_restore_configuration()`. You're already testing a `container` restore with the previous e2e tests, so you shouldn't need to do it again here.
rename utils.py to config.py
All checks were successful
build / test (pull_request) Successful in 12m9s
build / docs (pull_request) Has been skipped
5387ee43bc
raise ValueError if json decode fails when getting the container ip
All checks were successful
build / test (pull_request) Successful in 7m28s
build / docs (pull_request) Has been skipped
0c7be55784
fix label handling and improve unittests
All checks were successful
build / test (pull_request) Successful in 7m16s
build / docs (pull_request) Has been skipped
8c1e67437c
Author
Contributor

Hi @witten, I improved a few tests since just asserting that a ValueError is raised doesn't really check if the correct ValueError is raised which made refactoring harder (due to all the mocking). I think I have the tests behaving now as they should.

I am struggling with one thing though. If I run something like:

uv run borgmatic -c /tmp/config restore --archive=12fcac07ecf3-2025-07-31T06:34:05.504527 --database=test --original-label=test1 --container asd -v3

to restore to container asd I get:

/tmp/config: No valid configuration files found

summary:
/tmp/config: Error parsing configuration file
Argument value "asd" is not of the expected type: object
/tmp/config: No valid configuration files found

Need some help? https://torsion.org/borgmatic/#issues

This is because the schema has a top-level configuration named container: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/config/schema.yaml#L2932 -- I am not sure how I could work around that, any ideas?

Hi @witten, I improved a few tests since just asserting that a `ValueError` is raised doesn't really check if the correct `ValueError` is raised which made refactoring harder (due to all the mocking). I think I have the tests behaving now as they should. I am struggling with one thing though. If I run something like: ``` uv run borgmatic -c /tmp/config restore --archive=12fcac07ecf3-2025-07-31T06:34:05.504527 --database=test --original-label=test1 --container asd -v3 ``` to restore to container `asd` I get: ``` /tmp/config: No valid configuration files found summary: /tmp/config: Error parsing configuration file Argument value "asd" is not of the expected type: object /tmp/config: No valid configuration files found Need some help? https://torsion.org/borgmatic/#issues ``` This is because the schema has a top-level configuration named `container`: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/config/schema.yaml#L2932 -- I am not sure how I could work around that, any ideas?
apollo13 force-pushed ticket1116 from 8c1e67437c
All checks were successful
build / test (pull_request) Successful in 7m16s
build / docs (pull_request) Has been skipped
to aaecd0e5b6
All checks were successful
build / test (pull_request) Successful in 11m57s
build / docs (pull_request) Has been skipped
2025-08-09 19:34:58 +00:00
Compare
apollo13 force-pushed ticket1116 from aaecd0e5b6
All checks were successful
build / test (pull_request) Successful in 11m57s
build / docs (pull_request) Has been skipped
to 7620399eec
All checks were successful
build / test (pull_request) Successful in 12m2s
build / docs (pull_request) Has been skipped
2025-08-15 20:25:29 +00:00
Compare
Author
Contributor

Slidge nudge @witten for the off-chance that you didn't get a notify for #1121 (comment) -- would love to bring this over the finish line.

Slidge nudge @witten for the off-chance that you didn't get a notify for https://projects.torsion.org/borgmatic-collective/borgmatic/pulls/1121#issuecomment-11732 -- would love to bring this over the finish line.
Owner

Thanks for the nudge.. I'm way behind on borgmatic stuff! As for the issue you've encountered, yeah, that's pretty obnoxious. I can think of a few classes of solutions, none of them great:

  1. Rename your new flag from --container to something else to avoid the conflict.
  2. Rename the configuration option from container: to something else. Normally however this would require a lengthy backwards-compatibility phase, which would kind of defeat the purpose of the rename in the short term. A breaking change rename though would solve that—at the risk of upsetting some users.
  3. Dive into the command-line parsing logic and try to make it smarter, such that flags on individual actions get consumed before they get passed to the general configuration override flag parsing. This however may be pretty difficult depending on how the existing logic works. (I know for instance that there is some very intentional sharing of flags between actions, e.g. to support the case where multiple actions need to consume the same flag. I mention this just because it means that naive/greedy consumption of flags may break some use cases.) I'll note that this may not even be a desirable solution, because it may mean that, for instance, the --container override of the container: option will no longer ever work when the restore action is used.
  4. So the --container flag is colliding with the container: option—which is of type: object in the configuration schema. Maybe overriding container: by itself (instead of, say, container.secrets_directory) isn't even a valid use case? Maybe if a flag like --container matches a piece of the schema of type: object, we can ignore it for purposes of a config override? Or are there legitimate use cases where we want users to be able to specify a map on the command-line like: --container={'secrets_directory': '/path'}? There is an existing use case of maps as list elements, as described here: https://torsion.org/borgmatic/docs/how-to/make-per-application-backups/#configuration-overrides

Let me know if you'd like me to look into any of these to help you out.

Thanks for the nudge.. I'm way behind on borgmatic stuff! As for the issue you've encountered, yeah, that's pretty obnoxious. I can think of a few classes of solutions, none of them great: 1. Rename your new flag from `--container` to something else to avoid the conflict. 2. Rename the configuration option from `container:` to something else. Normally however this would require a lengthy backwards-compatibility phase, which would kind of defeat the purpose of the rename in the short term. A breaking change rename though would solve that—at the risk of upsetting some users. 3. Dive into the command-line parsing logic and try to make it smarter, such that flags on individual actions get consumed before they get passed to the general configuration override flag parsing. This however may be pretty difficult depending on how the existing logic works. (I know for instance that there is some very intentional sharing of flags between actions, e.g. to support the case where multiple actions need to consume the same flag. I mention this just because it means that naive/greedy consumption of flags may break some use cases.) I'll note that this may not even be a desirable solution, because it may mean that, for instance, the `--container` override of the `container:` option will no longer ever work when the `restore` action is used. 4. So the `--container` flag is colliding with the `container:` option—which is of `type: object` in the configuration schema. Maybe overriding `container:` by itself (instead of, say, `container.secrets_directory`) isn't even a valid use case? Maybe if a flag like `--container` matches a piece of the schema of `type: object`, we can ignore it for purposes of a config override? Or are there legitimate use cases where we want users to be able to specify a map on the command-line like: `--container={'secrets_directory': '/path'}`? There *is* an existing use case of maps as list elements, as described here: https://torsion.org/borgmatic/docs/how-to/make-per-application-backups/#configuration-overrides Let me know if you'd like me to look into any of these to help you out.
Owner

For option number 4, here's one way you could implement it...

Change this line in config/arguments.py from this:

        if option_type is None:
            continue
        if option_type in ('object', None):
            continue

That would prevent --container (or similar flags) from matching any option of type object in the schema. And I don't think this breaks the array-of-object case from the documentation, because in that situation the object isn't at the top level and therefore wouldn't get caught by the if check above.

I can rationalize this approach to myself because the command-line override feature really isn't intended for setting entire complex objects from the command line (with the exception of list elements). Rather, the user is intended to --dot.into.the.option.they.want and override it that way.

For option number 4, here's one way you could implement it... Change [this line in `config/arguments.py`](https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/config/arguments.py#L151) from this: ```python if option_type is None: continue ``` ```python if option_type in ('object', None): continue ``` That would prevent `--container` (or similar flags) from matching any option of type `object` in the schema. And I don't think this breaks the `array`-of-`object` case from the documentation, because in that situation the `object` isn't at the top level and therefore wouldn't get caught by the `if` check above. I can rationalize this approach to myself because the command-line override feature really isn't intended for setting entire complex objects from the command line (with the exception of list elements). Rather, the user is intended to `--dot.into.the.option.they.want` and override it that way.
apollo13 force-pushed ticket1116 from 7620399eec
All checks were successful
build / test (pull_request) Successful in 12m2s
build / docs (pull_request) Has been skipped
to 099eab4c50
All checks were successful
build / test (pull_request) Successful in 11m53s
build / docs (pull_request) Has been skipped
2025-08-19 19:11:00 +00:00
Compare
apollo13 changed title from WIP: Container database backups to Container database backups 2025-08-19 19:11:50 +00:00
Author
Contributor

Thanks, option 4 seems to work nicely. I removed the WIP status of this PR, it is rather big as is and I hope we don't need many further changes. Let me know what you think.

EDIT:// if the tests fail that must be due to the latest rebase in which case I will fix them tomorrow. But all in all this PR should be in a reviewable state.

Thanks, option 4 seems to work nicely. I removed the WIP status of this PR, it is rather big as is and I hope we don't need many further changes. Let me know what you think. EDIT:// if the tests fail that must be due to the latest rebase in which case I will fix them tomorrow. But all in all this PR should be in a reviewable state.
witten left a comment
Owner

Partial review just to get you something sooner rather than later. I'll get to the rest as soon as I get a chance!

Partial review just to get you something sooner rather than later. I'll get to the rest as soon as I get a chance!
@ -63,2 +70,3 @@
if port:
if label is not UNSPECIFIED:
metadata = f'{name}@{label}'
Owner

I wonder if, for this case, it'd be useful to still show hostname and port but then maybe show the label in parens or something.

Do not feel strongly.

I wonder if, for this case, it'd be useful to still show hostname and port but then maybe show the label in parens or something. Do not feel strongly.
Author
Contributor

Is that even possible? I mean the dump metadata comes from the filesystem directory structure where you have postgres_databases/<host>:<port>/<db> or postgres_databases/<label>/<db>, so you wouldn't have access to host & port at all in this case, or am I missing something here.

Is that even possible? I mean the dump metadata comes from the filesystem directory structure where you have `postgres_databases/<host>:<port>/<db>` or `postgres_databases/<label>/<db>`, so you wouldn't have access to host & port at all in this case, or am I missing something here.
Owner

I think not all callers to render_dump_metadata() pull the Dump objects from directory names. Some Dumps come from other sources, like what the user requested to restore on the command-line.

In any case, see the other comment thread about potentially encoding both hostname/port and label in the directory name. That would allow render_dump_metadata() to receive both, even in the case that the Dump is coming from a directory name.

But let me know if, for instance, you think from a UX perspective it'd be better to always hide the hostname/port when label is set. I can see that being a legitimate argument when the label comes from a container name, but less so when it's just a hostname/port with a configured label.

I think not all callers to `render_dump_metadata()` pull the `Dump` objects from directory names. Some `Dump`s come from other sources, like what the user requested to restore on the command-line. In any case, see the other comment thread about potentially encoding both `hostname`/`port` and `label` in the directory name. That would allow `render_dump_metadata()` to receive both, even in the case that the `Dump` is coming from a directory name. But let me know if, for instance, you think from a UX perspective it'd be better to always hide the hostname/port when `label` is set. I can see that being a legitimate argument when the label comes from a container name, but less so when it's just a `hostname`/`port` with a configured `label`.
apollo13 marked this conversation as resolved
@ -315,3 +331,3 @@
port = None
dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port))
dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port, host_and_port))
Owner

What's the thinking with setting the label field to the host_and_port value? I'm wondering if that would cause some of the if dump.label: ... code paths above to get triggered when they wouldn't otherwise.

What's the thinking with setting the `label` field to the `host_and_port` value? I'm wondering if that would cause some of the `if dump.label: ...` code paths above to get triggered when they wouldn't otherwise.
Author
Contributor

Well host_and_port is either <host>:<port> or <label>, unless we somehow "tag" the directory name like @<label> there is no way for use to know whether <host>:<port> is ment to be host & port or a label named host:port

Well `host_and_port` is either `<host>:<port>` or `<label>`, unless we somehow "tag" the directory name like `@<label>` there is no way for use to know whether `<host>:<port>` is ment to be host & port or a label named `host:port`
Owner

Gotcha. Any reason not to stick the label in the directory name as you describe? Seems maybe a little cleaner / more explicit.

The main downside I can see is that if you upgrade borgmatic, run create with a labeled database, downgrade borgmatic, and then run restore, the label will get parsed from the directory name as part of the port. But that seems like a pretty minor edge case.

Gotcha. Any reason not to stick the label in the directory name as you describe? Seems maybe a little cleaner / more explicit. The main downside I can see is that if you upgrade borgmatic, run `create` with a `label`ed database, downgrade borgmatic, and then run `restore`, the `label` will get parsed from the directory name as part of the port. But that seems like a pretty minor edge case.
Author
Contributor

Mhm, the only reason was me being lazy. I mean it would not just be the label but also the container name that we need to stick into the directory name if we want that to be reversible. Coming up with sensible naming though might get hard? Is there a limit which characters borg allows in a filename? if not something like:

  • <host>:<port>@<label>
  • c::<container>:<port>@<label>

would work where @<label> is only there if the label is set at all (backwards compat) and c:: is needed to distinguish between container & host -- open to better ideas though 🤷‍♂️

And sticking the container name back into the Dump object will be hilarious for sure since it will require touching all tests again, but I guess that is what I get for suggesting the feature in the first place 😆

Mhm, the only reason was me being lazy. I mean it would not just be the label but also the container name that we need to stick into the directory name if we want that to be reversible. Coming up with sensible naming though might get hard? Is there a limit which characters borg allows in a filename? if not something like: * `<host>:<port>@<label>` * `c::<container>:<port>@<label>` would work where `@<label>` is only there if the label is set at all (backwards compat) and `c::` is needed to distinguish between container & host -- open to better ideas though 🤷‍♂️ And sticking the container name back into the `Dump` object will be hilarious for sure since it will require touching all tests again, but I guess that is what I get for suggesting the feature in the first place 😆
Owner

I mean it would not just be the label but also the container name that we need to stick into the directory name if we want that to be reversible.

Good point.

Coming up with sensible naming though might get hard?

Indeed. Okay, just brainstorming here, so feel free to shoot any of this down...

I know I was the one who suggested it, but it's starting to feel like we're encoding a whole lot of info into the dump path. The purpose of the original host:port path component was to effectively namespace dumps of same-named databases, e.g. so you could have separate foo database dumps from multiple different hosts and have those dumps not collide on the filesystem (and in the archive).

With the introduction of label and container name though, the "identity" expressed by a simple hostname and port pair is no longer sufficient. So—hear me out—what if we instead serialized the entire Dump object to a JSON file? Easy to write and easy to read. That could live alongside a similarly named database dump file. So for instance if you've got a foo database dump filename, there could be a foo.json or foo.metadata or whatever right alongside it that contains something like:

{
   'hook_name': 'postgresql',
   'data_source_name': 'foo',
   'container_name': None,
   'hostname': 'myhost',
   'port': '1234',
   'label': 'mylabel'
}

So what about the parent directory name? Well we no longer need to encode any data into the path, so it could just be a hash of that Dump data structure or something similar. (With the caveat, to your earlier point in a source code comment, that label itself is effectively a unique identifier if set.) There is however the use case of a user manually restoring a database dump from an archive, so maybe it would be best if the path component was still host:port but with a unique hash suffix or something tacked onto it. That way a user can still find a dump they're looking for. Alternatively, you could literally call render_dump_metadata() (or a similar function) and stick the result of that into the directory name, since this info is now for the user instead of the computer.

Again, just brainstorming here. If this feels like too much scope creep, let me know.

> I mean it would not just be the label but also the container name that we need to stick into the directory name if we want that to be reversible. Good point. > Coming up with sensible naming though might get hard? Indeed. Okay, just brainstorming here, so feel free to shoot any of this down... I know I was the one who suggested it, but it's starting to feel like we're encoding a whole lot of info into the dump path. The purpose of the original `host:port` path component was to effectively namespace dumps of same-named databases, e.g. so you could have separate `foo` database dumps from multiple different hosts and have those dumps not collide on the filesystem (and in the archive). With the introduction of label and container name though, the "identity" expressed by a simple hostname and port pair is no longer sufficient. So—hear me out—what if we instead serialized the entire `Dump` object to a JSON file? Easy to write and easy to read. That could live alongside a similarly named database dump file. So for instance if you've got a `foo` database dump filename, there could be a `foo.json` or `foo.metadata` or whatever right alongside it that contains something like: ``` { 'hook_name': 'postgresql', 'data_source_name': 'foo', 'container_name': None, 'hostname': 'myhost', 'port': '1234', 'label': 'mylabel' } ``` So what about the parent directory name? Well we no longer need to encode any data into the path, so it could just be a hash of that `Dump` data structure or something similar. (With the caveat, to your earlier point in a source code comment, that `label` itself is effectively a unique identifier if set.) There is however the use case of a user manually restoring a database dump from an archive, so maybe it would be best if the path component was still `host:port` but with a unique hash suffix or something tacked onto it. That way a user can still find a dump they're looking for. Alternatively, you could literally call `render_dump_metadata()` (or a similar function) and stick the result of that into the directory name, since this info is now for the user instead of the computer. Again, just brainstorming here. If this feels like too much scope creep, let me know.
Author
Contributor

With the introduction of label and container name though, the "identity" expressed by a simple hostname and port pair is no longer sufficient. So—hear me out—what if we instead serialized the entire Dump object to a JSON file?

Yes that would certainly make sense and was already at the back of my head. I am just a little bit worried that the PR stays reviewable, it is already rather large. But if you are fine with it I guess we could add something like that.

Alternatively, you could literally call render_dump_metadata() (or a similar function) and stick the result of that into the directory name, since this info is now for the user instead of the computer.

While that would probably work I am not sure how I feel about directory names containing special characters, whitespaces and what not :D

With the caveat, to your earlier point in a source code comment, that label itself is effectively a unique identifier if set

only in theory though since nothing enforces this currently. Would have to check how borgmatic handles that currently

> With the introduction of label and container name though, the "identity" expressed by a simple hostname and port pair is no longer sufficient. So—hear me out—what if we instead serialized the entire `Dump` object to a JSON file? Yes that would certainly make sense and was already at the back of my head. I am just a little bit worried that the PR stays reviewable, it is already rather large. But if you are fine with it I guess we could add something like that. > Alternatively, you could literally call `render_dump_metadata()` (or a similar function) and stick the result of that into the directory name, since this info is now for the user instead of the computer. While that would probably work I am not sure how I feel about directory names containing special characters, whitespaces and what not :D > With the caveat, to your earlier point in a source code comment, that `label` itself is effectively a unique identifier if set only in theory though since nothing enforces this currently. Would have to check how borgmatic handles that currently
Owner

Yes that would certainly make sense and was already at the back of my head. I am just a little bit worried that the PR stays reviewable, it is already rather large. But if you are fine with it I guess we could add something like that.

Honestly I'd do it as a totally separate PR for that very reason. It could even be based on main since I don't think it inherently needs anything label or container specific. It's "just" writing out dump metadata to a new location and reading it from a new location. (And probing / falling back to the path-based metadata for backwards compatibility.)

And if you don't want to do it, I'd be happy to take a stab at it. You're also of course welcome to do it yourself as a separate PR or as part of this PR if you'd prefer. But I think it'd be easier to implement and review separately.

While that would probably work I am not sure how I feel about directory names containing special characters, whitespaces and what not :D

Yeah, that's why I said "or a similar function." I agree that spaces and parens probably shouldn't be in the path.

> Yes that would certainly make sense and was already at the back of my head. I am just a little bit worried that the PR stays reviewable, it is already rather large. But if you are fine with it I guess we could add something like that. Honestly I'd do it as a totally separate PR for that very reason. It could even be based on main since I don't *think* it inherently needs anything `label` or `container` specific. It's "just" writing out dump metadata to a new location and reading it from a new location. (And probing / falling back to the path-based metadata for backwards compatibility.) And if you don't want to do it, I'd be happy to take a stab at it. You're also of course welcome to do it yourself as a separate PR or as part of this PR if you'd prefer. But I think it'd be easier to implement and review separately. > While that would probably work I am not sure how I feel about directory names containing special characters, whitespaces and what not :D Yeah, that's why I said "or a similar function." I agree that spaces and parens probably shouldn't be in the path.
Author
Contributor

And if you don't want to do it, I'd be happy to take a stab at it. You're also of course welcome to do it yourself as a separate PR or as part of this PR if you'd prefer. But I think it'd be easier to implement and review separately.

I won't get to it this (and probably not next) week, so if you find some spare time please give it a try. 👍

> And if you don't want to do it, I'd be happy to take a stab at it. You're also of course welcome to do it yourself as a separate PR or as part of this PR if you'd prefer. But I think it'd be easier to implement and review separately. I won't get to it this (and probably not next) week, so if you find some spare time please give it a try. 👍
Owner
https://projects.torsion.org/borgmatic-collective/borgmatic/pulls/1136
Owner

Okay, have at it!

Okay, have at it!
apollo13 marked this conversation as resolved
@ -1488,2 +1492,4 @@
help='Path to restore SQLite database dumps to. Defaults to the "restore_path" option in borgmatic\'s configuration',
)
restore_group.add_argument(
'--original-label',
Owner

Worth also having an --original-container?

Worth also having an `--original-container`?
Author
Contributor

Can do, but since container in the config is just set as label in the resulting directory structure (since we don't safe metadata but solely have that information in the path) it is always the label that is used… You can even do --original-label=host:port instead of --original-host=host --original-port=port.

Can do, but since `container` in the config is just set as `label` in the resulting directory structure (since we don't safe metadata but solely have that information in the path) it is always the label that is used… You can even do `--original-label=host:port` instead of `--original-host=host --original-port=port`.
Owner

The two flags (--original-label and --original-container) could even be aliases for one another.

Makes me wonder though if at some point (not in this PR), we should store database metadata a little more formally than encoding it in the path.

The two flags (`--original-label` and `--original-container`) could even be aliases for one another. Makes me wonder though if at some point (not in this PR), we should store database metadata a little more formally than encoding it in the path.
apollo13 marked this conversation as resolved
@ -1489,1 +1493,4 @@
)
restore_group.add_argument(
'--original-label',
help='The label where to dump to restore came from, only necessary if you need to disambiguate dumps',
Owner

Nit: "to" -> "the".

Nit: "to" -> "the".
apollo13 marked this conversation as resolved
@ -151,1 +149,3 @@
if option_type is None:
# The argument doesn't correspond to any option in the schema, or it is a complex argument, so ignore it.
# It's probably a flag that borgmatic has on the command-line but not in configuration.
if option_type in {'object', None}:
Owner

It'd be great to have a test update for this change. The tests are in tests/unit/config/test_arguments.py and ones to look at for examples might be test_prepare_arguments_for_config_skips_option_with_none_value() or test_prepare_arguments_for_config_skips_option_missing_from_schema().

It'd be great to have a test update for this change. The tests are in `tests/unit/config/test_arguments.py` and ones to look at for examples might be `test_prepare_arguments_for_config_skips_option_with_none_value()` or `test_prepare_arguments_for_config_skips_option_missing_from_schema()`.
apollo13 marked this conversation as resolved
@ -0,0 +10,4 @@
logger = logging.getLogger(__name__)
def resolve_database_option(option, data_source, connection_params=None, restore=False):
Owner

It'd be good to have docstrings for these functions, ideally describing what they do, what arguments they expect, what they return, what they error (if any), etc.

It'd be good to have docstrings for these functions, ideally describing what they do, what arguments they expect, what they return, what they error (if any), etc.
apollo13 marked this conversation as resolved
@ -0,0 +21,4 @@
return data_source.get(option)
def _get_hostname_from_config(data_source, connection_params=None, restore=False):
Owner

Reminder: This codebase generally doesn't use underscore-prefixed function names.

Do not feel strongly.

Reminder: This codebase generally doesn't use underscore-prefixed function names. Do not feel strongly.
apollo13 marked this conversation as resolved
@ -225,0 +224,4 @@
hostname = utils.resolve_database_option('hostname', database, restore=use_restore_options)
port = utils.resolve_database_option('port', database, restore=use_restore_options)
username = utils.resolve_database_option('username', database, restore=use_restore_options)
password = utils.resolve_database_option('password', database, restore=use_restore_options)
Owner

End-to-end tests generally (always?) avoid using any borgmatic code except by invoking borgmatic on the command-line. It sort of stems from the practice of not using code under test as part of the test logic itself. And in the case of an end-to-end test, pretty much all of borgmatic is under test.

Anyway, I see why you're doing this here, at least for hostname. For that one, could you just use the container name, since the test container and the database container happen to be running on the same Docker network?

End-to-end tests generally (always?) avoid using any borgmatic code except by invoking borgmatic on the command-line. It sort of stems from the practice of not using code under test as part of the test logic itself. And in the case of an end-to-end test, pretty much all of borgmatic is under test. Anyway, I see why you're doing this here, at least for `hostname`. For that one, could you just use the container name, since the test container and the database container happen to be running on the same Docker network?
Author
Contributor

Makes sense, will fix this once I address the other points.

Makes sense, will fix this once I address the other points.
apollo13 marked this conversation as resolved
@ -659,0 +669,4 @@
original_working_directory = os.getcwd()
original_path = os.environ.get('PATH', '')
os.environ['PATH'] = f'/app/tests/end-to-end/commands:{original_path}'
Owner

Nice.

Nice.
apollo13 marked this conversation as resolved
witten left a comment
Owner

Another partial review!

Another partial review!
@ -0,0 +13,4 @@
def resolve_database_option(option, data_source, connection_params=None, restore=False):
# Special case `hostname` since it overlaps with `container`
if option == 'hostname':
return _get_hostname_from_config(data_source, connection_params, restore)
Owner

Given that all callers to resolve_database_option() are passing in option as a string literal (as opposed to a variable), what do you think of just having a separate get_hostname_from_config() or resolve_hostname_option() that's called directly rather than going through resolve_database_option() for the special hostname case? It just might let you lose one level of indirection.

Do not feel strongly.

Given that all callers to `resolve_database_option()` are passing in `option` as a string literal (as opposed to a variable), what do you think of just having a separate `get_hostname_from_config()` or `resolve_hostname_option()` that's called directly rather than going through `resolve_database_option()` for the special hostname case? It just might let you lose one level of indirection. Do not feel strongly.
Author
Contributor

I just tried that, but I rather not for the simple reason that the you'd now have to know from the outside that hostname needs specialcasing, if you mistakenly still call this function with hostname you'd get the wrong result. So I'd still need an if to check for hostname and then error out… So I rather give the user a consistent interface.

I just tried that, but I rather not for the simple reason that the you'd now have to know from the outside that `hostname` needs specialcasing, if you mistakenly still call this function with `hostname` you'd get the wrong result. So I'd still need an `if` to check for `hostname` and then error out… So I rather give the user a consistent interface.
Owner

Fair enough!

Fair enough!
witten marked this conversation as resolved
@ -0,0 +37,4 @@
# ... and finally fall back to the normal options
if 'container' in data_source:
return get_ip_from_container(data_source['container'])
return data_source.get('hostname', '')
Owner

Might be nice to use None instead of '' for the case of no known hostname? And then if a caller wants to interpret that missing hostname as '', say for comparison purposes, it can of course do that.

Might be nice to use `None` instead of `''` for the case of no known hostname? And then if a caller wants to interpret that missing hostname as `''`, say for comparison purposes, it can of course do that.
apollo13 marked this conversation as resolved
@ -0,0 +71,4 @@
main_ip = network_data.get('IPAddress')
if main_ip:
return main_ip
# No main IP found, try the networks
Owner

Ah clever.

Ah clever.
apollo13 marked this conversation as resolved
@ -0,0 +74,4 @@
# No main IP found, try the networks
for network in network_data.get('Networks', {}).values():
ip = network.get('IPAddress')
if ip:
Owner

Could use your fancy walrus operator here. 😄 Above too.

Could use your fancy walrus operator here. 😄 Above too.
apollo13 marked this conversation as resolved
@ -122,6 +123,7 @@ def database_names_to_dump(database, config, username, password, environment, dr
)
extra_options, defaults_extra_filename = parse_extra_options(database.get('list_options'))
password_transport = database.get('password_transport', 'pipe')
hostname = ds_config.resolve_database_option('hostname', database)
Owner

Instead of ds_config, you could call this database_config since it's only used for databases and not other types of data sources (filesystems, etc.).

Instead of `ds_config`, you could call this `database_config` since it's only used for databases and not other types of data sources (filesystems, etc.).
apollo13 marked this conversation as resolved
@ -365,3 +369,3 @@
return (
dump.make_data_source_dump_filename(make_dump_path('borgmatic'), name, hostname='*'),
dump.make_data_source_dump_filename(make_dump_path('borgmatic'), name, label='*'),
Owner

I think this change has the side effect of dropping the port from the produced dump pattern. Is that intentional? It might mean that the pattern matches more paths than it did when the port was included.

I think this change has the side effect of dropping the port from the produced dump pattern. Is that intentional? It might mean that the pattern matches more paths than it did when the port was included.
Author
Contributor

In theory you are right, in practice if you look at how make_data_source_dump_filename is implemented it wouldn't add a port anyways since for this specific call it is None. So it doesn't really matter if the label or the hostname is set to *, the resulting pattern will be * in both cases.

In theory you are right, in practice if you look at how `make_data_source_dump_filename` is implemented it wouldn't add a port anyways since for this specific call it is `None`. So it doesn't really matter if the `label` or the `hostname` is set to `*`, the resulting pattern will be `*` in both cases.
Owner

Got it. Thanks for explaining.

Got it. Thanks for explaining.
witten marked this conversation as resolved
@ -407,3 +407,1 @@
connection_params['username']
or data_source.get('restore_username', data_source.get('username'))
),
ds_config.resolve_database_option('username', data_source, connection_params, restore=True),
Owner

FWIW, I like the way you've pulled out a bunch of this logic into a centralized utility function. So much DRYer!

FWIW, I like the way you've pulled out a bunch of this logic into a centralized utility function. So much DRYer!
apollo13 marked this conversation as resolved
@ -49,2 +36,2 @@
except (AttributeError, KeyError):
pass
password = ds_config.resolve_database_option(
'password', database, restore_connection_params, restore=restore_connection_params
Owner

You might consider slapping a bool() around the restore_connection_params passed to restore=.

You might consider slapping a `bool()` around the `restore_connection_params` passed to `restore=`.
Author
Contributor

Mhm feels a bit un-pythonic to me, so unless you feel strongly I would leave it as is.

Mhm feels a bit un-pythonic to me, so unless you feel strongly I would leave it as is.
Owner

I don't feel strongly.

I don't feel strongly.
witten marked this conversation as resolved
@ -347,2 +338,3 @@
data_source['name'],
data_source.get('hostname'),
hostname=data_source.get('hostname'),
port=data_source.get('port'),
Owner

Should these be pulling from the hostname and port variables from above instead of looking in data_source?

Should these be pulling from the `hostname` and `port` variables from above instead of looking in `data_source`?
apollo13 marked this conversation as resolved
address some review comments
All checks were successful
build / test (pull_request) Successful in 11m26s
build / docs (pull_request) Has been skipped
6f17128cf9
address more review comments
Some checks failed
build / test (pull_request) Failing after 2m12s
build / docs (pull_request) Has been skipped
236c2c102d
apollo13 force-pushed ticket1116 from 236c2c102d
Some checks failed
build / test (pull_request) Failing after 2m12s
build / docs (pull_request) Has been skipped
to a58dce9ee7
All checks were successful
build / test (pull_request) Successful in 7m12s
build / docs (pull_request) Has been skipped
2025-08-22 12:59:52 +00:00
Compare
address more review comments
All checks were successful
build / test (pull_request) Successful in 7m12s
build / docs (pull_request) Has been skipped
a58dce9ee7
Finalize container support
All checks were successful
build / test (pull_request) Successful in 11m56s
build / docs (pull_request) Has been skipped
87d3fce318
Finalize container support
All checks were successful
build / test (pull_request) Successful in 11m56s
build / docs (pull_request) Has been skipped
87d3fce318
Finalize container support
All checks were successful
build / test (pull_request) Successful in 7m12s
build / docs (pull_request) Has been skipped
0e90087dc4
Author
Contributor

@witten I think I am done with this PR :) My local testing worked fine. While the end to end tests also work I would appreciate if you could take it through a manual testrun as well -- not that I am missing something.

@witten I think I am done with this PR :) My local testing worked fine. While the end to end tests also work I would appreciate if you could take it through a manual testrun as well -- not that I am missing something.
witten left a comment
Owner

Partial review! (About half of the PR.) I think this is just about ready to merge from what I've seen.

I've also done some sporadic manual tests: PostgreSQL, SQLite, dumps, restores, various permutations of hostname, container, label, etc. Everything looks good so far.

Partial review! (About half of the PR.) I think this is just about ready to merge from what I've seen. I've also done some sporadic manual tests: PostgreSQL, SQLite, dumps, restores, various permutations of `hostname`, `container`, `label`, etc. Everything looks good so far.
@ -176,0 +188,4 @@
data_source['name'],
data_source.get('hostname'),
data_source.get('port'),
data_source.get('label') or data_source.get('container') or UNSPECIFIED,
Owner

Shouldn't this parcel out label and container into separate fields now?

Shouldn't this parcel out `label` and `container` into separate fields now?
Owner

I might be missing something, but why not pass container as well here?

I might be missing something, but why not pass `container` as well here?
Author
Contributor

Because I f'ed it up :/

Because I f'ed it up :/
Owner

Oh, no worries!

Oh, no worries!
witten marked this conversation as resolved
@ -372,3 +389,3 @@
port = None
dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port))
dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port, host_and_port))
Owner

I may have already asked this, but: Why add host_and_port as the label? I'd imagine that you could just leave label blank.

I may have already asked this, but: Why add `host_and_port` as the `label`? I'd imagine that you could just leave `label` blank.
Author
Contributor

Yes, this is still from the old container code (same for the above)

Yes, this is still from the old container code (same for the above)
apollo13 marked this conversation as resolved
@ -326,6 +333,8 @@ def dump_data_sources(
database['name'],
database.get('hostname', 'localhost'),
Owner

I've noticed that this ends up setting the hostname to localhost even when hostname isn't configured and only container is set. Seems wrong/misleading... Do you think it's worth worrying about?

Do not feel strongly.

I've noticed that this ends up setting the `hostname` to `localhost` even when `hostname` isn't configured and only `container` is set. Seems wrong/misleading... Do you think it's worth worrying about? Do not feel strongly.
Author
Contributor

I noticed that as well, will try to remove it and see if any tests break horribly.

I noticed that as well, will try to remove it and see if any tests break horribly.
Author
Contributor

I have reverted the localhost changes as I am running from test error to test error :/ I would recommend fixing them in a follow up PR if wanted. It would especially be useful if 'localhost' would get removed from the Dump defaults, because that also has far reaching consequences.

I have reverted the localhost changes as I am running from test error to test error :/ I would recommend fixing them in a follow up PR if wanted. It would especially be useful if 'localhost' would get removed from the `Dump` defaults, because that also has far reaching consequences.
Owner

Okay, no problem. Let me know if you'd like me to tackle that or if you'd rather do it (after this PR).

Okay, no problem. Let me know if you'd like me to tackle that or if you'd rather do it (after this PR).
witten marked this conversation as resolved
@ -6,6 +6,7 @@ import borgmatic.borg.pattern
import borgmatic.config.paths
import borgmatic.hooks.credential.parse
from borgmatic.execute import execute_command, execute_command_with_processes
from borgmatic.hooks.data_source import config as ds_config
Owner

Could be database_config as well.

Could be `database_config` as well.
apollo13 marked this conversation as resolved
@ -12,6 +12,7 @@ from borgmatic.execute import (
execute_command_and_capture_output,
execute_command_with_processes,
)
from borgmatic.hooks.data_source import config as ds_config
Owner

Same here.

Same here.
apollo13 marked this conversation as resolved
address some review comments
All checks were successful
build / test (pull_request) Successful in 13m19s
build / docs (pull_request) Has been skipped
4de879d86e
address more review comments
Some checks failed
build / test (pull_request) Failing after 8m19s
build / docs (pull_request) Has been skipped
616ee4edf8
final round of fixes.
Some checks failed
build / test (pull_request) Failing after 2m18s
build / docs (pull_request) Has been skipped
4b780d232e
final round of fixes.
Some checks failed
build / test (pull_request) Failing after 2m18s
build / docs (pull_request) Has been skipped
4b780d232e
final round of fixes.
All checks were successful
build / test (pull_request) Successful in 7m43s
build / docs (pull_request) Has been skipped
abad73d605
witten approved these changes 2025-09-17 22:59:44 +00:00
witten left a comment
Owner

FIRST OF ALL: Thank you so much for all of the time and effort you put into this PR. I know it's probably/definitely taken longer than you expected (which was in no small part entirely my fault), and so I really appreciate you sticking with this.

I do have some minor bits of feedback here, because of course I do, BUT in the interests of: 1. moving things along, and 2. giving you a break... I'll take care of these minor items myself after I merge this. The one exception is the localhost thing (see comments), which I will also take care of after I merge this—unless you'd prefer to take care of it yourself after this is merged. Just let me know either way.

Onward!

FIRST OF ALL: Thank you so much for all of the time and effort you put into this PR. I know it's probably/definitely taken longer than you expected (which was in no small part entirely my fault), and so I really appreciate you sticking with this. I do have some minor bits of feedback here, because of course I do, BUT in the interests of: 1. moving things along, and 2. giving you a break... I'll take care of these minor items myself _after_ I merge this. The one exception is the localhost thing (see comments), which I will also take care of after I merge this—unless you'd prefer to take care of it yourself after this is merged. Just let me know either way. Onward!
@ -241,3 +228,2 @@
Alter the ports in these examples to suit your particular database system.
Now borgmatic will use the `docker`/`podman` CLI to figure out the container IP.
Owner

It's heavily implied, but is it worth saying that container: will only work on the host? Like, I can see someone assuming that they can use container: even when borgmatic is running in a container because, after all, it is connecting to a database in a container. (Which incidentally might be a nice future addition.)

It's heavily implied, but is it worth saying that `container:` will only work on the host? Like, I can see someone assuming that they can use `container:` even when borgmatic is running in a container because, after all, it *is* connecting to a database in a container. (Which incidentally might be a nice future addition.)
Author
Contributor

I am planning to use borgmatic inside a container to dump other containers. As long as the docker socket is mapped in and the docker CLI installed, this should just work like on the host itself.

I am planning to use borgmatic inside a container to dump other containers. As long as the docker socket is mapped in and the docker CLI installed, this should just work like on the host itself.
witten marked this conversation as resolved
@ -0,0 +6,4 @@
from borgmatic.hooks.data_source import config
def test_get_database_option():
Owner

Nit: Should be named test_resolve_database_option() given the name of the unit under test.

Nit: Should be named `test_resolve_database_option()` given the name of the unit under test.
witten marked this conversation as resolved
@ -0,0 +19,4 @@
assert (
config.resolve_database_option('option', data_source, connection_params, restore=True)
== 'connection_value'
)
Owner

I'd suggest making these separate tests (or test cases) so as not to mask test failures when the first one fails. Separate tests also allows very clearly spelling out expectations in the name of each test.

I'd suggest making these separate tests (or test cases) so as not to mask test failures when the first one fails. Separate tests also allows very clearly spelling out expectations in the name of each test.
witten marked this conversation as resolved
@ -0,0 +22,4 @@
)
def test_get_hostname_option_via_container():
Owner

Similar feedback on this test and some others below.

Similar feedback on this test and some others below.
witten marked this conversation as resolved
witten merged commit a276aaa31f into main 2025-09-17 23:02:58 +00:00
Author
Contributor

FIRST OF ALL: Thank you so much for all of the time and effort you put into this PR. I know it's probably/definitely taken longer than you expected (which was in no small part entirely my fault), and so I really appreciate you sticking with this.

No worries, we are all doing this in our free time and I highly appreciate your reviews -- after all (for it's size) we got this in relatively fast imo.

The one exception is the localhost thing (see comments), which I will also take care of after I merge this—unless you'd prefer to take care of it yourself after this is merged.

I really wouldn't mind if you want to take care of that one, I cannot see mocks anymore :D

> FIRST OF ALL: Thank you so much for all of the time and effort you put into this PR. I know it's probably/definitely taken longer than you expected (which was in no small part entirely my fault), and so I really appreciate you sticking with this. No worries, we are all doing this in our free time and I highly appreciate your reviews -- after all (for it's size) we got this in relatively fast imo. > The one exception is the localhost thing (see comments), which I will also take care of after I merge this—unless you'd prefer to take care of it yourself after this is merged. I really wouldn't mind if you want to take care of that one, I cannot see mocks anymore :D
Owner

I really wouldn't mind if you want to take care of that one, I cannot see mocks anymore :D

Haha I hear you. I'll see what I can do. (No promises though, because I may get to a similar point...)

> I really wouldn't mind if you want to take care of that one, I cannot see mocks anymore :D Haha I hear you. I'll see what I can do. (No promises though, because I may get to a similar point...)
Owner

I think I kinda maybe did it? a388b17271

I think I kinda maybe did it? https://projects.torsion.org/borgmatic-collective/borgmatic/commit/a388b17271ad9acd9c93aca0d1bbd65302c11f01
Author
Contributor

That looks good indeed.

That looks good indeed.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
borgmatic-collective/borgmatic!1121
No description provided.