Container database backups #1121
No description provided.
Initial PR without tests to gather design feedback
`container_name` instead of `hostname` (#1116)

Overall this looks like a solid approach to me. Thanks for taking it on!
Okay, got it.
This makes total sense to me. In many ways, borgmatic is a glorified shell script written in Python, just shelling out to external executables as needed. That's often more flexible than tight API integration.
Some things I think are probably missing from this PR though:
- Flags for the borgmatic `restore` action, e.g. `--label` / `--container` / `--original-label` / `--original-container`.
- `restore_*` configuration schema, e.g. `restore_container`.

`@ -13,3 +13,3 @@`
`execute_command_with_processes,`
`)`
`from borgmatic.hooks.data_source import dump` → `from borgmatic.hooks.data_source import dump, utils`

Sort of a nit at the design review level, but just FYI: I'm personally allergic to any file/module called `utils`, because I've seen how much such a file can fill up with unrelated cruft over time. I suggest a name that's more descriptive of its purpose (what the contained functions have in common).

Open to ideas, I am bad at naming :)
So am I. How about just `container.py`? It'll maybe be a conflict if and when someone goes to implement #685 or #671, but we can deal with that then. Alternatives if you don't like that: `hostname.py` or `address.py`.

After the most recent changes, `utils` basically does two things:
Personally I think they both belong in different categories and as such probably different files. That seems like overkill for two functions, so I'd suggest one of the following names instead: `config.py` / `config_helpers.py` / `config_utils.py`. The idea behind the naming is that the functions (as currently used) all deal with the configuration aspects of a data source.

Got it. How about just `config.py` for now?

`@ -0,0 +10,4 @@`
`def get_hostname_from_config(database):`
`    if 'container' in database:`
`        return get_ip_from_container(database['container'])`
`    return database.get('hostname', '')`

Worth putting the "localhost" default here?
I do not think so, because then it will always count as explicitly specified to be `localhost`, and the other checks that test whether the hostname is set will fail.

Makes sense (as long as the "localhost" default is applied elsewhere).
`@ -0,0 +14,4 @@`
`def get_ip_from_container(container):`
`    engines = (shutil.which(engine) for engine in ('docker', 'podman'))`

TIL about `shutil.which()`!

I'm not sure this 100% works though, because it might be legitimate to have both `docker` and `podman` executables installed. Maybe that's okay though, since you're effectively probing for which one can successfully execute `container inspect` below? At some point it might be necessary for there to be borgmatic options to configure the container engine, but maybe we can get away without that right now.
Yes that was my thinking. I expect it to be rather uncommon to have both installed so I went with trying both and continuing if no container is found in one…
Regarding options for the container engine: Yeah, maybe, but I didn't know where to put it. In theory you could have podman and docker running in parallel and back up containers from either, but adding a `container_engine` to the database config seems rather silly.

Fair enough. BTW I think I have an uncommon setup. 😄
Yeah, I don't think we need it now. But if/when we need it, it would probably make most sense at the global scope rather than as part of the database config.
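For reference, the probe-both-engines approach discussed above can be sketched roughly like this. This is a hedged reconstruction, not the PR's actual code: the `--format '{{json .NetworkSettings}}'` string and the `extract_container_ip` helper are assumptions made for illustration.

```python
import json
import shutil
import subprocess


def extract_container_ip(network_data):
    '''Pull the first IP out of parsed docker/podman NetworkSettings JSON.'''
    # The top-level IPAddress covers the default bridge network...
    if ip := network_data.get('IPAddress'):
        return ip
    # ...otherwise fall back to per-network addresses.
    for network in network_data.get('Networks', {}).values():
        if ip := network.get('IPAddress'):
            return ip
    return None


def get_ip_from_container(container):
    '''Try each installed engine until one knows about the container.'''
    engines = [engine for engine in ('docker', 'podman') if shutil.which(engine)]
    if not engines:
        raise ValueError('Neither docker nor podman appears to be installed')
    for engine in engines:
        try:
            output = subprocess.check_output(
                (engine, 'container', 'inspect',
                 '--format', '{{json .NetworkSettings}}', container),
            )
        except subprocess.CalledProcessError:
            continue  # This engine does not know the container; try the next.
        if ip := extract_container_ip(json.loads(output.strip())):
            return ip
    raise ValueError(f'No IP address found for container {container!r}')
```

The `extract_container_ip` half is pure and easy to unit test without a running container engine.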
`@ -0,0 +18,4 @@`
`engines = [engine for engine in engines if engine]`
`if not engines:`
`    raise xxx  # TODO: What to raise here, tell the user to install docker/podman`

borgmatic abuses `CalledProcessError`, `ValueError`, and `OSError` for practically everything. (Rationale: I think custom exception types for every conceivable situation are kind of silly. Although one could argue borgmatic goes too far to the other extreme.) Any one of those exception types, once raised, will surface a nice message to the user.
`@ -0,0 +32,4 @@`
`))`
`except subprocess.CalledProcessError:`
`    continue  # Container does not exist`

Worth logging debug?
`@ -0,0 +34,4 @@`
`except subprocess.CalledProcessError:`
`    continue  # Container does not exist`
`network_data = json.loads(output.strip())`

Should probably catch JSON decode errors here.
In theory yes, but in practice this command should really output JSON if it doesn't fail (the `--format` explicitly asks for it). If it doesn't, borgmatic should blow up anyway and we have to figure out why…
I guess I'm just the type of programmer that looks both ways when crossing a one-way street. 😄
But one contrived situation I can think of that could trigger this would be someone making a `docker` or `podman` wrapper script that does something funky with the JSON returned by the actual `docker` or `podman`. Stranger things have happened... For instance, users regularly write borgmatic-specific wrapper scripts used when shelling out to `btrfs` or other executables, because they want to customize what gets fed to borgmatic.

Anyway... It occurs to me: I think I just figured out the likely first enhancement request for this feature. In my experience with other borgmatic features, as soon as you make borgmatic shell out to a named executable, tickets start rolling in to make that executable path customizable. (Hence `ssh_command`, `pg_dump_command`, `zfs_command`, etc.) I'm not suggesting you have to do that as part of this PR, but just be aware it may be an incoming feature request soon.
Okay, assuming I am going to catch a JSON decode error here. What would I do with it? I wouldn't want to skip it since it really is an error that should stop processing. And just rewrapping it into a ValueError and reraising for such an edge case (someone broke the API-contract provided by docker/podman) seems not worth it to me.
Happy to do it if you think it is useful (I am usually all for defensive coding), but I honestly don't see the benefit -- I'd rather have borgmatic crash with the original error (makes debugging easier imo).
I've long thought that borgmatic needs a `--traceback` flag that suppresses any error handling and just lets everything fail ungracefully for debugging purposes. Not suggesting you do that, but it would make this decision easier IMO if such a flag already existed. But yeah, even lacking that, I would re-raise as `ValueError` with a nice user-facing error message. Given a bug report containing that error message, I can always find exactly where it occurred (if not the entire call stack that a traceback would provide).

But if it doesn't seem worth it, I'm fine waiting until it is. 😄
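A minimal sketch of the re-raise pattern described above, assuming a hypothetical `parse_inspect_output` helper (not a function from the actual PR):

```python
import json


def parse_inspect_output(output, container):
    '''Parse container inspect output, surfacing a user-facing error on bad JSON.'''
    try:
        return json.loads(output)
    except json.JSONDecodeError as error:
        # borgmatic displays ValueError messages nicely to the user, so
        # re-wrap the decode error with enough context to locate the problem.
        raise ValueError(
            f'Unexpected non-JSON output from container inspect for {container!r}: {error}'
        )
```

The error message names the container, so a bug report containing it points straight at the failing call site even without a traceback.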
`@ -0,0 +44,4 @@`
`    if ip:`
`        return ip`
`raise xxx  # No container ip found, what to raise here`

See above.
`@ -31,2 +29,2 @@`
`name,`
`(hostname or 'localhost') + ('' if port is None else f':{port}'),` → `identifier = (label if label else (hostname or 'localhost') + ('' if port is None else f':{port}')`

One side effect I can think of is that `borgmatic restore --original-hostname ...` will no longer work for a database if `label` is set. Probably okay, but maybe it should be documented somewhere so users aren't caught unawares.

To be honest, all those `original-*` options feel a bit like overkill to me. Personally I'd go with a single `--original-id` / `--original-label` / `--original-folder` (naming is hard) that lets you identify it. I mean, if I am going to add `--original-label` as a differentiator, you can always set it to `host:port` to get back the same behavior as `--original-host=host --original-port=port` (if I am reading the code correctly).

I'm fine with a flag like that. It'd probably be most clear to call it `--original-label` though, since `label` is a user-facing concept and ID/folder isn't (yet). Even if that makes it slightly inaccurate.

Ahaha, and I thought I would get away with less work by not doing the interpolation stuff. But it shows that the current design is better (?), since with interpolation you couldn't easily restore into a container (well, you can always explicitly specify the hostname, which would also allow restoring as of now).
Yeah, I guess that's a good argument for container being a first-class concept rather than an interpolated bolt-on!
I think I got the plumbing done; what's mostly missing currently is manual testing and one or two integration/end-to-end tests. Do you have any recommendations for them, and especially for how to run them safely so they don't nuke my system? :D

EDIT: I see they run inside Docker Compose, so they won't nuke my system. But now I would need Docker-in-Docker, which might get ugly quickly…
For what borgmatic calls "integration" tests: These are just tests that integrate multiple units (functions) and test the result, usually two or three or four units at a time. As far as I'm concerned, these tests are often optional unless:

1. True unit tests for particular units would require a ridiculous amount of mocking, or
2. True unit tests wouldn't really test anything useful, and the logic under test is really at the multiple-unit level, and/or
3. There's a concern about individual unit tests not catching integration errors between functions, like the wrong arguments passed from one function to another.
In terms of the how for these tests, they should hopefully be even more straightforward than unit tests, as there should be little to no mocking needed. Just call a unit higher up in the call stack and make sure it does what it's supposed to.
For what borgmatic calls "end-to-end" tests: Yeah, these are the heavyweight tests that run inside Docker Compose and do things like spinning up actual database containers to test against, etc. These tests should be few and far between, just because they're so heavyweight.
I don't think you should need Docker-in-Docker though unless I'm missing something...? Everything is contained in the Compose containers that get spun up, so no system nuking should occur. Are you worried that Compose itself might misbehave and therefore you want to run Compose within its own container?
In terms of actual end-to-end tests for this feature, I could see appending to the existing configured borgmatic config in `tests/end-to-end/hooks/data_source/test_database.py` a configured database that uses your fancy new `container:` configuration.

Oh, I think I just figured out your Docker-in-Docker point: You're actually running `docker` and `podman` executables, and so the test code would need to do that too. Hmm...

Okay, it's a less useful test, but you know what I recommend here? Don't run actual `docker` and `podman` executables in tests. Instead, run fake executables that just return the expected JSON in the expected format... which is much more feasible to actually run in a container. This is exactly what's already done for the filesystem hook commands, because those can't easily run in a container either: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/commands ... See here for an example usage of them: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/tests/end-to-end/hooks/data_source/test_zfs.py#L25

The main challenge I can see here is that you haven't yet implemented a way to override the `docker` or `podman` path... 😄

Thank you for the explanation of the tests. That did help a lot.
You ain't going to get it unless you force me. I am a strong believer in convention over configuration. Sure, let's add it if there is real demand, but I won't add it for now. As far as the tests are concerned: I won't need an option to change the path.
As a sidenote: You can actually already change the path. It is called `PATH` and works just fine thanks to the usage of `shutil.which` (sorry for the tongue-in-cheek wording, I just could not resist). So if it turns out that I need a different binary for the tests, I would just create a folder for the tests in question, put a `podman`/`docker` executable there, and execute the test with `PATH` set to that.

Does that eliminate your concerns about supporting a `docker_command` option for now? Granted, modifying the `PATH` to add your wrappers is a bit more work, but not something that is uncommon or unheard of on *nix systems.

`@ -1323,6 +1323,23 @@ properties:`
`implicitly enables read_special (see above) to support`
`dump and restore streaming.`
`example: users`
`label:`

This got me thinking: There is so much duplication here. Would you be open to using YAML anchors etc. (assuming the loading code can handle it)?
Sure, that'd be fine. There's actually tons of other schema duplication as well, so anchors might be a good thing to get used to here. But please make sure to test `borgmatic config generate` / `config validate` if you do make this change. I have some vague recollection that I tried this once and it broke something, but that might've been years ago and no longer relevant.

Okay, finally got to a state where I have initial end-to-end tests as well (at least for simple save & restore). Mind looking at it and maybe providing some suggestions on how to keep the end-to-end tests small (count-wise), since they take long? Would you mind if I adjust the config so half the databases test via `hostname` and the others via `container`?

I'm totally fine with that for now. But if there's one constant with borgmatic users, it's that they want everything to be configurable in borgmatic's configuration file rather than, say, just command-line flags or environment variables. 😄
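For reference, the YAML-anchor deduplication floated earlier might look something like the following. The field names and layout are illustrative only, not borgmatic's actual schema:

```yaml
# Define the shared snippet once under an anchor...
label_option: &label_option
  type: string
  description: Label used to distinguish dumps of same-named databases.
  example: users

postgresql_databases:
  items:
    properties:
      label: *label_option   # ...and reuse it per database type.
mariadb_databases:
  items:
    properties:
      label: *label_option
```

Standard YAML loaders resolve `*label_option` to the anchored mapping at parse time, so the loaded schema sees identical copies; whether `borgmatic config generate` round-trips anchors cleanly would still need testing, as noted above.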
I think it would be good to have both for each database type, but I don't think you need both for every database config. So for instance if you had the following, I think that would be sufficient:
I think you can probably also omit the `container` variant entirely from `write_custom_restore_configuration()`. You're already testing a `container` restore with the previous e2e tests, so you shouldn't need to do it again here.

Hi @witten, I improved a few tests, since just asserting that a `ValueError` is raised doesn't really check whether the correct `ValueError` is raised, which made refactoring harder (due to all the mocking). I think I have the tests behaving now as they should.

I am struggling with one thing though. If I run something like a restore to container `asd`, I get an error. This is because the schema has a top-level configuration option named `container`: https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/config/schema.yaml#L2932 -- I am not sure how I could work around that. Any ideas?

Slight nudge @witten, on the off chance that you didn't get a notification for #1121 (comment) -- would love to bring this over the finish line.
Thanks for the nudge.. I'm way behind on borgmatic stuff! As for the issue you've encountered, yeah, that's pretty obnoxious. I can think of a few classes of solutions, none of them great:
1. Rename the new `--container` flag to something else to avoid the conflict.
2. Rename the existing `container:` option to something else. Normally however this would require a lengthy backwards-compatibility phase, which would kind of defeat the purpose of the rename in the short term. A breaking-change rename though would solve that, at the risk of upsetting some users.
3. Accept that a `--container` override of the `container:` option will no longer ever work when the `restore` action is used.
4. The new `--container` flag is colliding with the `container:` option, which is of `type: object` in the configuration schema. Maybe overriding `container:` by itself (instead of, say, `container.secrets_directory`) isn't even a valid use case? Maybe if a flag like `--container` matches a piece of the schema of `type: object`, we can ignore it for purposes of a config override? Or are there legitimate use cases where we want users to be able to specify a map on the command line, like `--container={'secrets_directory': '/path'}`? There is an existing use case of maps as list elements, as described here: https://torsion.org/borgmatic/docs/how-to/make-per-application-backups/#configuration-overrides

Let me know if you'd like me to look into any of these to help you out.
For option number 4, here's one way you could implement it...
Change this line in `config/arguments.py` from this:

That would prevent `--container` (or similar flags) from matching any option of type `object` in the schema. And I don't think this breaks the array-of-object case from the documentation, because in that situation the `object` isn't at the top level and therefore wouldn't get caught by the `if` check above.

I can rationalize this approach to myself because the command-line override feature really isn't intended for setting entire complex objects from the command line (with the exception of list elements). Rather, the user is intended to `--dot.into.the.option.they.want` and override it that way.

apollo13 changed the title from "WIP: Container database backups" to "Container database backups".

Thanks, option 4 seems to work nicely. I removed the WIP status of this PR; it is rather big as-is and I hope we don't need many further changes. Let me know what you think.
EDIT:// if the tests fail that must be due to the latest rebase in which case I will fix them tomorrow. But all in all this PR should be in a reviewable state.
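A minimal sketch of the option-4 check discussed above. This is a simplified illustration with hypothetical names; the real borgmatic function in `config/arguments.py` has a different shape (the actual diff appears later in the review):

```python
def prepare_arguments_for_config(arguments, schema_option_types):
    '''
    Filter (flag name, value) pairs down to those usable as config
    overrides. schema_option_types maps dotted option names to their
    schema types -- a stand-in for looking up the real schema.
    '''
    prepared = []
    for option_name, value in arguments:
        option_type = schema_option_types.get(option_name)
        # Skip flags that don't correspond to any schema option, and also
        # flags matching a whole `object`: overrides are meant for scalar
        # leaves (e.g. container.secrets_directory), not entire maps.
        if option_type in {'object', None}:
            continue
        prepared.append((option_name, value))
    return prepared
```

With this check, a `--container` restore flag (whose schema type is `object`) no longer shadows the top-level `container:` option, while dotted leaf overrides still pass through.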
Partial review just to get you something sooner rather than later. I'll get to the rest as soon as I get a chance!
`@ -63,2 +70,3 @@`
`if port:`
`if label is not UNSPECIFIED:`
`    metadata = f'{name}@{label}'`

I wonder if, for this case, it'd be useful to still show hostname and port but then maybe show the label in parens or something.
Do not feel strongly.
Is that even possible? I mean, the dump metadata comes from the filesystem directory structure, where you have `postgres_databases/<host>:<port>/<db>` or `postgres_databases/<label>/<db>`, so you wouldn't have access to host & port at all in this case. Or am I missing something here?

I think not all callers to `render_dump_metadata()` pull the `Dump` objects from directory names. Some `Dump`s come from other sources, like what the user requested to restore on the command line. In any case, see the other comment thread about potentially encoding both `hostname`/`port` and `label` in the directory name. That would allow `render_dump_metadata()` to receive both, even in the case that the `Dump` is coming from a directory name. But let me know if, for instance, you think from a UX perspective it'd be better to always hide the hostname/port when `label` is set. I can see that being a legitimate argument when the label comes from a container name, but less so when it's just a `hostname`/`port` with a configured `label`.

`@ -315,3 +331,3 @@`
`port = None`
`dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port))` → `dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port, host_and_port))`

What's the thinking with setting the `label` field to the `host_and_port` value? I'm wondering if that would cause some of the `if dump.label: ...` code paths above to get triggered when they wouldn't otherwise.

Well, `host_and_port` is either `<host>:<port>` or `<label>`; unless we somehow "tag" the directory name like `@<label>`, there is no way for us to know whether `<host>:<port>` is meant to be host & port or a label named `host:port`.

Gotcha. Any reason not to stick the label in the directory name as you describe? Seems maybe a little cleaner / more explicit.
The main downside I can see is that if you upgrade borgmatic, run `create` with a `label`ed database, downgrade borgmatic, and then run `restore`, the `label` will get parsed from the directory name as part of the port. But that seems like a pretty minor edge case.

Mhm, the only reason was me being lazy. I mean, it would not just be the label but also the container name that we'd need to stick into the directory name if we want that to be reversible. Coming up with sensible naming though might get hard? Is there a limit on which characters Borg allows in a filename? If not, something like `<host>:<port>@<label>` / `c::<container>:<port>@<label>` would work, where `@<label>` is only there if the label is set at all (backwards compat) and `c::` is needed to distinguish between container & host -- open to better ideas though 🤷‍♂️

And sticking the container name back into the `Dump` object will be hilarious for sure, since it will require touching all tests again. But I guess that is what I get for suggesting the feature in the first place 😆

Good point.
Indeed. Okay, just brainstorming here, so feel free to shoot any of this down...
I know I was the one who suggested it, but it's starting to feel like we're encoding a whole lot of info into the dump path. The purpose of the original `host:port` path component was to effectively namespace dumps of same-named databases, e.g. so you could have separate `foo` database dumps from multiple different hosts and have those dumps not collide on the filesystem (and in the archive).

With the introduction of label and container name though, the "identity" expressed by a simple hostname and port pair is no longer sufficient. So, hear me out, what if we instead serialized the entire `Dump` object to a JSON file? Easy to write and easy to read. That could live alongside a similarly named database dump file. So for instance if you've got a `foo` database dump file, there could be a `foo.json` or `foo.metadata` or whatever right alongside it.

So what about the parent directory name? Well, we no longer need to encode any data into the path, so it could just be a hash of that `Dump` data structure or something similar. (With the caveat, to your earlier point in a source code comment, that `label` itself is effectively a unique identifier if set.) There is however the use case of a user manually restoring a database dump from an archive, so maybe it would be best if the path component was still `host:port` but with a unique hash suffix or something tacked onto it. That way a user can still find a dump they're looking for. Alternatively, you could literally call `render_dump_metadata()` (or a similar function) and stick the result of that into the directory name, since this info is now for the user instead of the computer.
Yes that would certainly make sense and was already at the back of my head. I am just a little bit worried that the PR stays reviewable, it is already rather large. But if you are fine with it I guess we could add something like that.
While that would probably work, I am not sure how I feel about directory names containing special characters, whitespace, and whatnot :D

Only in theory though, since nothing enforces this currently. I would have to check how borgmatic handles that currently.
Honestly, I'd do it as a totally separate PR for that very reason. It could even be based on main, since I don't think it inherently needs anything `label`- or `container`-specific. It's "just" writing out dump metadata to a new location and reading it from a new location (and probing / falling back to the path-based metadata for backwards compatibility).

And if you don't want to do it, I'd be happy to take a stab at it. You're of course also welcome to do it yourself as a separate PR or as part of this PR if you'd prefer. But I think it'd be easier to implement and review separately.
Yeah, that's why I said "or a similar function." I agree that spaces and parens probably shouldn't be in the path.
I won't get to it this (and probably not next) week, so if you find some spare time please give it a try. 👍
#1136
Okay, have at it!
`@ -1488,2 +1492,4 @@`
`help='Path to restore SQLite database dumps to. Defaults to the "restore_path" option in borgmatic\'s configuration',`
`)`
`restore_group.add_argument('--original-label',`

Worth also having an `--original-container`?

Can do, but since `container` in the config is just set as `label` in the resulting directory structure (since we don't save metadata but solely have that information in the path), it is always the label that is used… You can even do `--original-label=host:port` instead of `--original-host=host --original-port=port`. The two flags (`--original-label` and `--original-container`) could even be aliases for one another.

Makes me wonder though if at some point (not in this PR) we should store database metadata a little more formally than encoding it in the path.
`@ -1489,1 +1493,4 @@`
`restore_group.add_argument('--original-label',`
`help='The label where to dump to restore came from, only necessary if you need to disambiguate dumps',`

Nit: "to" -> "the".
`@ -151,1 +149,3 @@`
`if option_type is None:` → `if option_type in {'object', None}:`
`# The argument doesn't correspond to any option in the schema, or it is a complex argument, so ignore it.`
`# It's probably a flag that borgmatic has on the command-line but not in configuration.`

It'd be great to have a test update for this change. The tests are in `tests/unit/config/test_arguments.py`, and ones to look at for examples might be `test_prepare_arguments_for_config_skips_option_with_none_value()` or `test_prepare_arguments_for_config_skips_option_missing_from_schema()`.

`@ -0,0 +10,4 @@`
`logger = logging.getLogger(__name__)`
`def resolve_database_option(option, data_source, connection_params=None, restore=False):`

It'd be good to have docstrings for these functions, ideally describing what they do, what arguments they expect, what they return, what they raise (if any), etc.
`@ -0,0 +21,4 @@`
`return data_source.get(option)`
`def _get_hostname_from_config(data_source, connection_params=None, restore=False):`

Reminder: This codebase generally doesn't use underscore-prefixed function names.
Do not feel strongly.
`@ -225,0 +224,4 @@`
`hostname = utils.resolve_database_option('hostname', database, restore=use_restore_options)`
`port = utils.resolve_database_option('port', database, restore=use_restore_options)`
`username = utils.resolve_database_option('username', database, restore=use_restore_options)`
`password = utils.resolve_database_option('password', database, restore=use_restore_options)`

End-to-end tests generally (always?) avoid using any borgmatic code except by invoking borgmatic on the command line. It sort of stems from the practice of not using code under test as part of the test logic itself. And in the case of an end-to-end test, pretty much all of borgmatic is under test. Anyway, I see why you're doing this here, at least for `hostname`. For that one, could you just use the container name, since the test container and the database container happen to be running on the same Docker network?

Makes sense, will fix this once I address the other points.
`@ -659,0 +669,4 @@`
`original_working_directory = os.getcwd()`
`original_path = os.environ.get('PATH', '')`
`os.environ['PATH'] = f'/app/tests/end-to-end/commands:{original_path}'`

Nice.
Another partial review!
`@ -0,0 +13,4 @@`
`def resolve_database_option(option, data_source, connection_params=None, restore=False):`
`    # Special case 'hostname' since it overlaps with 'container'`
`    if option == 'hostname':`
`        return _get_hostname_from_config(data_source, connection_params, restore)`

Given that all callers to `resolve_database_option()` are passing in `option` as a string literal (as opposed to a variable), what do you think of just having a separate `get_hostname_from_config()` or `resolve_hostname_option()` that's called directly, rather than going through `resolve_database_option()` for the special hostname case? It just might let you lose one level of indirection. Do not feel strongly.

I just tried that, but I'd rather not, for the simple reason that you'd now have to know from the outside that `hostname` needs special-casing; if you mistakenly still call this function with `hostname`, you'd get the wrong result. So I'd still need an `if` to check for `hostname` and then error out… So I'd rather give the caller a consistent interface.

Fair enough!
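The resolution order that `resolve_database_option()` implements can be sketched roughly like this. This is a simplified reconstruction of the precedence visible in the review diffs (connection parameters, then `restore_*` overrides, then the plain option); it omits the hostname/container special case discussed above:

```python
def resolve_database_option(option, data_source, connection_params=None, restore=False):
    '''
    Return the effective value for a database option, preferring (1) an
    explicit connection parameter, then (2) a restore_* override when
    restoring, then (3) the plain configured option.
    '''
    if connection_params and connection_params.get(option):
        return connection_params[option]
    if restore and data_source.get(f'restore_{option}'):
        return data_source[f'restore_{option}']
    return data_source.get(option)
```

Centralizing this precedence is what replaces the repeated `connection_params['x'] or data_source.get('restore_x', data_source.get('x'))` chains scattered through the hooks.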
`@ -0,0 +37,4 @@`
`# ... and finally fall back to the normal options`
`if 'container' in data_source:`
`    return get_ip_from_container(data_source['container'])`
`return data_source.get('hostname', '')`

Might be nice to use `None` instead of `''` for the case of no known hostname? And then if a caller wants to interpret that missing hostname as `''`, say for comparison purposes, it can of course do that.

`@ -0,0 +71,4 @@`
`main_ip = network_data.get('IPAddress')`
`if main_ip:`
`    return main_ip`
`# No main IP found, try the networks`

Ah, clever.
`@ -0,0 +74,4 @@`
`# No main IP found, try the networks`
`for network in network_data.get('Networks', {}).values():`
`    ip = network.get('IPAddress')`
`    if ip:`

Could use your fancy walrus operator here. 😄 Above too.
`@ -122,6 +123,7 @@ def database_names_to_dump(database, config, username, password, environment, dr`
`extra_options, defaults_extra_filename = parse_extra_options(database.get('list_options'))`
`password_transport = database.get('password_transport', 'pipe')`
`hostname = ds_config.resolve_database_option('hostname', database)`

Instead of `ds_config`, you could call this `database_config`, since it's only used for databases and not other types of data sources (filesystems, etc.).

`@ -365,3 +369,3 @@`
`dump.make_data_source_dump_filename(make_dump_path('borgmatic'), name, hostname='*'),` → `dump.make_data_source_dump_filename(make_dump_path('borgmatic'), name, label='*'),`

I think this change has the side effect of dropping the port from the produced dump pattern. Is that intentional? It might mean that the pattern matches more paths than it did when the port was included.

In theory you are right; in practice, if you look at how `make_data_source_dump_filename` is implemented, it wouldn't add a port anyway, since for this specific call it is `None`. So it doesn't really matter whether the `label` or the `hostname` is set to `*`; the resulting pattern will be `*` in both cases.

Got it. Thanks for explaining.
`@ -407,3 +407,1 @@`
`connection_params['username']`
`or data_source.get('restore_username', data_source.get('username'))` → `ds_config.resolve_database_option('username', data_source, connection_params, restore=True),`

FWIW, I like the way you've pulled out a bunch of this logic into a centralized utility function. So much DRYer!

`@ -49,2 +36,2 @@`
`except (AttributeError, KeyError):`
`    pass`
`password = ds_config.resolve_database_option('password', database, restore_connection_params, restore=restore_connection_params`

You might consider slapping a `bool()` around the `restore_connection_params` passed to `restore=`.

Mhm, feels a bit un-pythonic to me, so unless you feel strongly I would leave it as is.
I don't feel strongly.
`@ -347,2 +338,3 @@`
`data_source['name'],`
`data_source.get('hostname'),` → `hostname=data_source.get('hostname'),`
`port=data_source.get('port'),`

Should these be pulling from the `hostname` and `port` variables from above instead of looking in `data_source`?

@witten I think I am done with this PR :) My local testing worked fine. While the end-to-end tests also work, I would appreciate it if you could take it through a manual test run as well -- to make sure I am not missing something.
Partial review! (About half of the PR.) I think this is just about ready to merge from what I've seen.
I've also done some sporadic manual tests: PostgreSQL, SQLite, dumps, restores, various permutations of `hostname`, `container`, `label`, etc. Everything looks good so far.

`@ -176,0 +188,4 @@`
`data_source['name'],`
`data_source.get('hostname'),`
`data_source.get('port'),`
`data_source.get('label') or data_source.get('container') or UNSPECIFIED,`

Shouldn't this parcel out `label` and `container` into separate fields now? I might be missing something, but why not pass `container` as well here?

Because I f'ed it up :/
Oh, no worries!
`@ -372,3 +389,3 @@`
`port = None`
`dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port))` → `dumps_from_archive.add(Dump(hook_name, data_source_name, hostname, port, host_and_port))`

I may have already asked this, but: Why add `host_and_port` as the `label`? I'd imagine that you could just leave `label` blank.

Yes, this is still from the old container code (same for the above).
`@ -326,6 +333,8 @@ def dump_data_sources(`
`database['name'],`
`database.get('hostname', 'localhost'),`

I've noticed that this ends up setting the `hostname` to `localhost` even when `hostname` isn't configured and only `container` is set. Seems wrong/misleading... Do you think it's worth worrying about?
I noticed that as well, will try to remove it and see if any tests break horribly.
I have reverted the localhost changes, as I was running from test error to test error :/ I would recommend fixing them in a follow-up PR if wanted. It would be especially useful if 'localhost' were removed from the `Dump` defaults, because that also has far-reaching consequences.

Okay, no problem. Let me know if you'd like me to tackle that or if you'd rather do it (after this PR).
`@ -6,6 +6,7 @@`
`import borgmatic.borg.pattern`
`import borgmatic.config.paths`
`import borgmatic.hooks.credential.parse`
`from borgmatic.execute import execute_command, execute_command_with_processes`
`from borgmatic.hooks.data_source import config as ds_config`

Could be `database_config` as well.

`@ -12,6 +12,7 @@ from borgmatic.execute import (`
`execute_command_and_capture_output,`
`execute_command_with_processes,`
`)`
`from borgmatic.hooks.data_source import config as ds_config`

Same here.
FIRST OF ALL: Thank you so much for all of the time and effort you put into this PR. I know it's probably/definitely taken longer than you expected (which was in no small part entirely my fault), and so I really appreciate you sticking with this.
I do have some minor bits of feedback here, because of course I do, BUT in the interests of: 1. moving things along, and 2. giving you a break... I'll take care of these minor items myself after I merge this. The one exception is the localhost thing (see comments), which I will also take care of after I merge this—unless you'd prefer to take care of it yourself after this is merged. Just let me know either way.
Onward!
`@ -241,3 +228,2 @@`
`Alter the ports in these examples to suit your particular database system.`
`Now borgmatic will use the docker/podman CLI to figure out the container IP.`

It's heavily implied, but is it worth saying that `container:` will only work on the host? Like, I can see someone assuming that they can use `container:` even when borgmatic is running in a container because, after all, it is connecting to a database in a container. (Which incidentally might be a nice future addition.)

I am planning to use borgmatic inside a container to dump other containers. As long as the Docker socket is mapped in and the Docker CLI installed, this should just work like on the host itself.
`@ -0,0 +6,4 @@`
`from borgmatic.hooks.data_source import config`
`def test_get_database_option():`

Nit: Should be named `test_resolve_database_option()`, given the name of the unit under test.

`@ -0,0 +19,4 @@`
`assert (`
`    config.resolve_database_option('option', data_source, connection_params, restore=True)`
`    == 'connection_value'`
`)`

I'd suggest making these separate tests (or test cases) so as not to mask test failures when the first one fails. Separate tests also allow very clearly spelling out expectations in the name of each test.

`@ -0,0 +22,4 @@`
`def test_get_hostname_option_via_container():`

Similar feedback on this test and some others below.
No worries, we are all doing this in our free time and I highly appreciate your reviews -- after all (for its size), we got this in relatively fast, imo.
I really wouldn't mind if you want to take care of that one, I cannot see mocks anymore :D
Haha I hear you. I'll see what I can do. (No promises though, because I may get to a similar point...)
I think I kinda maybe did it? a388b17271

That looks good indeed.