Adding influxdb_support #1142
Reference in New Issue
Block a user
Delete Branch "jolcese/borgmatic:influxdb_support"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This PR builds on top of #1058, authored by @gautamaggarwal2810 which seems to be abandoned.
Code here addresses most of the comments on the original PR with exception of the
restore_*attributes. At this time uses the same attributes for backup and restore. Future commits will improve that.Also, the
nameandformatparameters used across database functions (i.e.dump.make_data_source_dump_filename) do not exactly match the InfluxDB concepts. Right now,nameshall beallandformatshall bedirectoryinfluxdb_supportto Adding influxdb_supportThanks for taking this on! I will have a look as soon as I get a chance.
While
nameis an inherent feature of borgmatic's database machinery and therefore is required,formatisn't. So ifformatdoesn't make sense for InfluxDB (or if you don't need it right now but might in the future), feel free to drop it entirely from the PR. Up to you!Actually I lied. Looking at
borgmatic/actions/restore.py, there is a bit of logic there that looks for aformatofdirectory. So I guess keep it if you're relying on that restore logic!Partial review, just on the schema.
@@ -1309,0 +1311,4 @@items:type: objectrequired:- tokennameshould be in here too, as borgmatic's dump and restore machinery relies on it being present.Added
nameandformatas both are needed right now.@@ -1309,0 +1318,4 @@type: stringenum: ['all']description: |Database name (required to be 'all').I recommend not requiring this value to be
all, as a user might want to back up multiple different InfluxDB instances on different hosts—in which case both entries can't have the same name.Under the hood though, you can still treat them as dumping "all" of the the InfluxDB contents for a given instance.
The problem I have is that when removing
allfrom the configuration, https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/actions/restore.py#L373-L380 does not trigger and then files are not restored. Maybe I don't completely understand the internal logic but I don't see other way to solve it.So here's my recollection of the purpose of that logic:
alldata sources, e.g. viaborgmatic restore --data-source allor even justborgmatic restorewithout specifying any--data-source, then borgmatic restores each and every data source dump found in the archive.borgmatic restore --data-source mydb, then borgmatic plucks out that single named data source dump from the archive and restores just that.So unless I'm missing something, you should be able to give your data source a
nameother thanalland then either restore it along withalldata sources in the archive or restore it by itself by specifying a single--data-source.If you're finding that that's not the case, please let me know!
@@ -1309,0 +1330,4 @@type: stringdescription: |The InfluxDB executable to use. Defaults to `influx`.token:What do you think of calling this
passwordinstead oftoken? I know that's not in line with the InfluxDB concept, but borgmatic'srestoreaction has a--passwordflag that could line up with apassword(or ultimatelyrestore_password) option here.The description could mention that it's actually an InfluxDB API token.
Changed to password.
@@ -1309,0 +1344,4 @@type: stringdescription: |InfluxDB hostname/IP to connect to.Defaults to 127.0.0.1.I'd suggest
localhostinstead of127.0.0.1here, even if it's functionally the same thing.Changed to
localhost@@ -1309,0 +1360,4 @@type: stringdescription: |The name of the organization.If provided, do not use organization_id.The description here seems to conflict with that of
organization_idabove. If both are provided, which one is used?Clarified that
organization_idtakes precedence overorganization_name@@ -1309,0 +1361,4 @@description: |The name of the organization.If provided, do not use organization_id.bucket_name:Here's a thought: What if
bucket_namewas dropped in favor of thenameoption, such thatnameserved the same purpose? I know that an InfluxDB bucket isn't 100% comparable to a SQL database, but maybe it's close enough? That would allow the existing borgmatic machinery andrestoreCLI to line up a little better with this hook. For instance, with that change, a user couldborgmatic restore --data-source [bucketname]if they wanted to restore a single bucket.That does still leave the
bucket_idoption untouched (and therefore not directly supported by therestoreaction), but maybe that's okay?@@ -1309,0 +1370,4 @@type: stringdescription: |The ID of the bucket to backup.If provided, do not use bucket_name.Similar conflict here. If both are provided, which one "wins"?
Clarified that
bucket_idtakes precedence overbucket_name@@ -1309,0 +1396,4 @@type: booleandescription: |Enable debug logging for HTTP requests.full:Given how this sounds like this could cause data loss, I'd suggest calling this something more specific like
full_replaceor whatever.Changed to
full_replaceThis is looking good! Another partial review.
@@ -0,0 +37,4 @@dry_run,):'''Dump the given InfluxDB databases to a named pipe. The databases are supplied as a sequence ofLooks like this comment doesn't correspond to the code, as you're not dumping to a named pipe and are instead dumping to a directory?
@@ -0,0 +43,4 @@log prefix in any log entries).Return a sequence of subprocess.Popen instances for the dump processes ready to spew to a namedpipe. But if this is a dry run, then don't actually dump anything and return an empty sequence.This comment about the processes returned also appears to be inaccurate now.
@@ -0,0 +51,4 @@logger.info(f'Dumping InfluxDB databases{dry_run_label}')processes = []processesdoesn't look like it's used except to return an empty list. So for clarity, you could just return an empty list literal below and remove this variable here.@@ -0,0 +71,4 @@logger.debug(f'Command: {command}',)I don't think this is necessary, because
execute_command()should log the command run...?@@ -0,0 +74,4 @@)dump.create_parent_directory_for_dump(dump_filename)execute_command(command, shell=True) # noqa: S604Looking at the contents of
build_dump_command(), I don't immediately see anything that requires shell access. So you may be able to removeshell=Trueand the correspondingnoqahere—but that may mean you'd also have to remove the shell quoting.Do not feel strongly if you'd rather keep
shell=True.@@ -0,0 +89,4 @@def build_dump_command(database, dump_filename):'''Return the backup command.In general, it's useful to describe the expected arguments and return value. (See the previous function for an example.)
@@ -0,0 +98,4 @@# Add protocol prefix based on tls settingprotocol = 'https://' if database.get('tls', True) else 'http://'# Format as protocol://hostname:porthost = f'{protocol}{host}:{port}'I would suggest maybe not replacing the value of the
hostvariable since it starts off in one format (just hostname) and then ends up in another format (a full URL). Using a new variable name would address that concern.Do not feel strongly.
@@ -0,0 +111,4 @@+ ('backup',)+ (('--skip-verify',) if skip_verify else ())+ (('--http-debug',) if http_debug else ())+ (('--host', shlex.quote(str(host))) if host else ())I don't think you need all these
str()calls, because the schema should take care of making sure that strings are strings. The only except I can see is where the schema asks for an integer, say.@@ -0,0 +162,4 @@databases, config, borgmatic_runtime_directory, name=None): # pragma: no cover'''Given a sequence of configurations dicts, a configuration dict, a prefix to log with, theLooks like this comment doesn't quite correspond to the given arguments.
@@ -0,0 +175,4 @@),dump.make_data_source_dump_filename(make_dump_path(borgmatic_source_directory), name, hostname='*'),You can safely remove this last one (the glob pattern made from the borgmatic source directory), because there will never be a release of borgmatic using the deprecated source directory that also created archives containing InfluxDB dumps.
You still need the borgmatic runtime directory glob though, because that supports the use of Borg 1.2. (Long story.)
@@ -0,0 +191,4 @@'''Restore a database from the given extract stream. The database is supplied as a data sourceconfiguration dict, but the given hook configuration is ignored. The given configuration dict isused to construct the destination path, and the given log prefix is used for any log entries. IfThis comment also appears a little out of date.
@@ -0,0 +193,4 @@configuration dict, but the given hook configuration is ignored. The given configuration dict isused to construct the destination path, and the given log prefix is used for any log entries. Ifthis is a dry run, then don't actually restore anything. Trigger the given active extractprocess (an instance of subprocess.Popen) to produce output to consume.This might be inaccurate too, given that you're only doing directory format restores?
@@ -0,0 +219,4 @@# if the restore paths don't exist in the archive.execute_command_with_processes(restore_command,[extract_process] if extract_process else [],Given that you're only doing directory format restores, I wouldn't expect
extract_processto be used ever (here or two lines below).Do not feel strongly.
@@ -0,0 +237,4 @@# Add protocol prefix based on tls settingprotocol = 'https://' if database.get('tls', True) else 'http://'# Format as protocol://hostname:porthost = f'{protocol}{host}:{port}'Same thought here about variable reuse as above.
@@ -0,0 +252,4 @@http_debug = database.get('http_debug')full = database.get('full_replace')influx_command = tuple(shlex.quote(part) for part in shlex.split(database.get('influx_command') or 'influx')Given that you're not using
shell=Truefor invoking this restore command, I'm not sure this quoting will actually work. You can experiment by trying to set aninflux_commandthat contains a space, for instance. (Example:influx_command: influx --skip-verify.)Actually, I take that back. I think it'll work even without
shell=True.Test comments!
@@ -105,0 +109,4 @@port: 8087tls: trueskip_verify: truetojen: mysecrettokenLooks like this out of date now.
@@ -0,0 +4,4 @@def test_make_dump_path_creates_correct_path():assert module.make_dump_path('/tmp') == '/tmp/influxdb_databases'As written, this is more of an integration test than a unit test, as it integrates multiple different units. If you want to keep it, I'd recommend moving it into the appropriate location in
tests/integration/...However, given that there's really no logic to speak of in this particular unit, and you've already got a
pragma: no coveron it, I think it's safe to delete this test.@@ -0,0 +8,4 @@def test_get_default_port_returns_correct_port():assert module.get_default_port(None, None) == 8086You also don't need this test given that there's no logic in the unit and you've got
pragma: no coveron it.Don't feel strongly.
@@ -0,0 +13,4 @@def test_use_streaming_always_returns_false():assert not module.use_streaming(databases=[], config={})assert not module.use_streaming(databases=[{'name': 'bucket1'}], config={})Similar here: There's no logic in the unit, so IMO you could just slap a
pragma: no coveron it and lose the test.@@ -0,0 +46,4 @@'--bucket','TestBucket','/tmp/dumpfile',)Nice test!
@@ -0,0 +74,4 @@)def test_build_dump_command_with_http_disabled():Might be a little clearer to call this
..._with_tls_disabled().@@ -0,0 +236,4 @@'--token','testtoken','/tmp/dumpfile',)All of these test variants for
build_dump_command()are great. The only things I see potentially missing are tests for:portmissing (in which caseget_default_port()would probably need to be mocked out)influx_commandset with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test)skip_verifysetpasswordset@@ -0,0 +276,4 @@'mytoken','--org-id','org123',),Given that, in this test, you're in control of what
make_dump_command()returns, it could literally return aflexmock()instance rather than an actual command. For instance, the test would pass even if the command is('totally', 'not', 'influxdb')(assuming it's in both places).Do not feel strongly. It's just a way to simplify the test.
@@ -0,0 +289,4 @@dry_run=False,)assert len(processes) == 0 # No subprocess.Popen instances are returned.Nice use of mocking in this test. The one thing missing to my eye is an assert on
patterns, as this unit under test definitely modifies it.It might also be nice to make this test deal with two different databases, just to ensure the looping and appending works as expected. Do not feel strongly about that.
@@ -0,0 +309,4 @@dry_run=True,)assert len(processes) == 0You could assert on
patternshere too.@@ -0,0 +327,4 @@'port': '8086', # Changed to string to avoid TypeError'password': 'mytoken',}extract_process = flexmock(stdout=flexmock())I'm pretty sure that
extract_processis never set for this hook becauseuse_streamingis alwaysFalse.@@ -0,0 +340,4 @@)flexmock(module).should_receive('execute_command_with_processes').with_args(('influx', 'restore', '--host', 'https://localhost:8086', '--token', 'mytoken'),Similar feedback here as above. This command could totally just be a
flexmock()instance. Do not feel strongly.@@ -0,0 +724,4 @@'--token','mytoken','/tmp/dumpfile',)Another good set of test variants! Potentially missing tests though:
portmissing (in which caseget_default_port()would probably need to be mocked out)influx_commandset with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test)tlsset@@ -0,0 +737,4 @@config={},borgmatic_runtime_directory='/tmp',dry_run=False,)IMO this test could be deleted given that you have a
pramga: no coverand zero logic. Do not feel strongly.View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.