Adding influxdb_support #1142

Open
jolcese wants to merge 15 commits from jolcese/borgmatic:influxdb_support into main
First-time contributor

This PR builds on top of #1058, authored by @gautamaggarwal2810 which seems to be abandoned.

Code here addresses most of the comments on the original PR with exception of the restore_* attributes. At this time uses the same attributes for backup and restore. Future commits will improve that.
Also, the name and format parameters used across database functions (i.e. dump.make_data_source_dump_filename) do not exactly match the InfluxDB concepts. Right now, name shall be all and format shall be directory

This PR builds on top of https://projects.torsion.org/borgmatic-collective/borgmatic/pulls/1058, authored by @gautamaggarwal2810 which seems to be abandoned. Code here addresses most of the comments on the original PR with exception of the `restore_*` attributes. At this time uses the same attributes for backup and restore. Future commits will improve that. Also, the `name` and `format` parameters used across database functions (i.e. `dump.make_data_source_dump_filename`) do not exactly match the InfluxDB concepts. Right now, `name` shall be `all` and `format` shall be `directory`
jolcese added 3 commits 2025-09-06 01:31:30 +00:00
Linter fixes
All checks were successful
build / test (pull_request) Successful in 6m38s
build / docs (pull_request) Has been skipped
6335d17461
jolcese added 1 commit 2025-09-06 02:34:08 +00:00
Some fixes
Some checks failed
build / test (pull_request) Failing after 1m35s
build / docs (pull_request) Has been skipped
36ca79575c
jolcese added 1 commit 2025-09-08 19:05:42 +00:00
Working backup & restore
Some checks failed
build / test (pull_request) Failing after 1m39s
build / docs (pull_request) Has been skipped
5ccff1fed0
jolcese added 2 commits 2025-09-08 19:43:57 +00:00
Cleaned test cases
Some checks failed
build / test (pull_request) Failing after 1m59s
build / docs (pull_request) Has been skipped
3348350154
jolcese added 1 commit 2025-09-08 19:54:55 +00:00
Fix exception
All checks were successful
build / test (pull_request) Successful in 6m35s
build / docs (pull_request) Has been skipped
3a5635ee78
jolcese added 1 commit 2025-09-08 19:55:44 +00:00
Merge branch 'main' into influxdb_support
Some checks failed
build / test (pull_request) Failing after 1m48s
build / docs (pull_request) Has been skipped
d8d4506f1a
jolcese added 2 commits 2025-09-08 19:57:47 +00:00
Merge branch 'influxdb_support' of ssh://projects.torsion.org:3022/jolcese/borgmatic into influxdb_support
Some checks failed
build / test (pull_request) Failing after 1m50s
build / docs (pull_request) Has been skipped
367e003e99
jolcese added 1 commit 2025-09-09 03:41:10 +00:00
Removed unneeded comment
Some checks failed
build / test (pull_request) Failing after 1m58s
build / docs (pull_request) Has been skipped
955487b154
jolcese added 1 commit 2025-09-09 03:43:21 +00:00
Ruff fix
All checks were successful
build / test (pull_request) Successful in 6m39s
build / docs (pull_request) Has been skipped
1c9e3a9e44
jolcese added 1 commit 2025-09-09 04:05:18 +00:00
Documentation example
All checks were successful
build / test (pull_request) Successful in 6m36s
build / docs (pull_request) Has been skipped
90b44e5dfa
jolcese changed title from influxdb_support to Adding influxdb_support 2025-09-09 16:08:48 +00:00
Owner

Thanks for taking this on! I will have a look as soon as I get a chance.

Also, the name and format parameters used across database functions (i.e. dump.make_data_source_dump_filename) do not exactly match the InfluxDB concepts. Right now, name shall be all and format shall be directory

While name is an inherent feature of borgmatic's database machinery and therefore is required, format isn't. So if format doesn't make sense for InfluxDB (or if you don't need it right now but might in the future), feel free to drop it entirely from the PR. Up to you!

Thanks for taking this on! I will have a look as soon as I get a chance. > Also, the `name` and `format` parameters used across database functions (i.e. `dump.make_data_source_dump_filename`) do not exactly match the InfluxDB concepts. Right now, `name` shall be `all` and `format` shall be `directory` While `name` is an inherent feature of borgmatic's database machinery and therefore is required, `format` isn't. So if `format` doesn't make sense for InfluxDB (or if you don't need it right now but might in the future), feel free to drop it entirely from the PR. Up to you!
Owner

Actually I lied. Looking at borgmatic/actions/restore.py, there is a bit of logic there that looks for a format of directory. So I guess keep it if you're relying on that restore logic!

Actually I lied. Looking at `borgmatic/actions/restore.py`, there *is* a bit of logic there that looks for a `format` of `directory`. So I guess keep it if you're relying on that restore logic!
witten reviewed 2025-09-09 18:15:55 +00:00
witten left a comment
Owner

Partial review, just on the schema.

Partial review, just on the schema.
@@ -1309,0 +1311,4 @@
items:
type: object
required:
- token
Owner

name should be in here too, as borgmatic's dump and restore machinery relies on it being present.

`name` should be in here too, as borgmatic's dump and restore machinery relies on it being present.
Author
First-time contributor

Added name and format as both are needed right now.

Added `name` and `format` as both are needed right now.
witten marked this conversation as resolved
@@ -1309,0 +1318,4 @@
type: string
enum: ['all']
description: |
Database name (required to be 'all').
Owner

I recommend not requiring this value to be all, as a user might want to back up multiple different InfluxDB instances on different hosts—in which case both entries can't have the same name.

Under the hood though, you can still treat them as dumping "all" of the the InfluxDB contents for a given instance.

I recommend not requiring this value to be `all`, as a user might want to back up multiple different InfluxDB instances on different hosts—in which case both entries can't have the same name. Under the hood though, you can still treat them as dumping "all" of the the InfluxDB contents for a given instance.
Author
First-time contributor

The problem I have is that when removing all from the configuration, https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/actions/restore.py#L373-L380 does not trigger and then files are not restored. Maybe I don't completely understand the internal logic but I don't see other way to solve it.

The problem I have is that when removing `all` from the configuration, https://projects.torsion.org/borgmatic-collective/borgmatic/src/branch/main/borgmatic/actions/restore.py#L373-L380 does not trigger and then files are not restored. Maybe I don't completely understand the internal logic but I don't see other way to solve it.
Owner

So here's my recollection of the purpose of that logic:

  • If the user requests to restore all data sources, e.g. via borgmatic restore --data-source all or even just borgmatic restore without specifying any --data-source, then borgmatic restores each and every data source dump found in the archive.
  • But if the user requests to restore some named data source other than all, e.g. borgmatic restore --data-source mydb, then borgmatic plucks out that single named data source dump from the archive and restores just that.

So unless I'm missing something, you should be able to give your data source a name other than all and then either restore it along with all data sources in the archive or restore it by itself by specifying a single --data-source.

If you're finding that that's not the case, please let me know!

So here's my recollection of the purpose of that logic: * If the user requests to restore `all` data sources, e.g. via `borgmatic restore --data-source all` or even just `borgmatic restore` without specifying any `--data-source`, then borgmatic restores each and every data source dump found in the archive. * But if the user requests to restore some named data source other than all, e.g. `borgmatic restore --data-source mydb`, then borgmatic plucks out that single named data source dump from the archive and restores just that. So unless I'm missing something, you should be able to give your data source a `name` other than `all` and then either restore it along with `all` data sources in the archive or restore it by itself by specifying a single `--data-source`. If you're finding that that's *not* the case, please let me know!
@@ -1309,0 +1330,4 @@
type: string
description: |
The InfluxDB executable to use. Defaults to `influx`.
token:
Owner

What do you think of calling this password instead of token? I know that's not in line with the InfluxDB concept, but borgmatic's restore action has a --password flag that could line up with a password (or ultimately restore_password) option here.

The description could mention that it's actually an InfluxDB API token.

What do you think of calling this `password` instead of `token`? I know that's not in line with the InfluxDB concept, but borgmatic's `restore` action has a `--password` flag that could line up with a `password` (or ultimately `restore_password`) option here. The description could mention that it's actually an InfluxDB API token.
Author
First-time contributor

Changed to password.

Changed to password.
jolcese marked this conversation as resolved
@@ -1309,0 +1344,4 @@
type: string
description: |
InfluxDB hostname/IP to connect to.
Defaults to 127.0.0.1.
Owner

I'd suggest localhost instead of 127.0.0.1 here, even if it's functionally the same thing.

I'd suggest `localhost` instead of `127.0.0.1` here, even if it's functionally the same thing.
Author
First-time contributor

Changed to localhost

Changed to `localhost`
jolcese marked this conversation as resolved
@@ -1309,0 +1360,4 @@
type: string
description: |
The name of the organization.
If provided, do not use organization_id.
Owner

The description here seems to conflict with that of organization_id above. If both are provided, which one is used?

The description here seems to conflict with that of `organization_id` above. If both are provided, which one is used?
Author
First-time contributor

Clarified that organization_id takes precedence over organization_name

Clarified that `organization_id` takes precedence over `organization_name`
jolcese marked this conversation as resolved
@@ -1309,0 +1361,4 @@
description: |
The name of the organization.
If provided, do not use organization_id.
bucket_name:
Owner

Here's a thought: What if bucket_name was dropped in favor of the name option, such that name served the same purpose? I know that an InfluxDB bucket isn't 100% comparable to a SQL database, but maybe it's close enough? That would allow the existing borgmatic machinery and restore CLI to line up a little better with this hook. For instance, with that change, a user could borgmatic restore --data-source [bucketname] if they wanted to restore a single bucket.

That does still leave the bucket_id option untouched (and therefore not directly supported by the restore action), but maybe that's okay?

Here's a thought: What if `bucket_name` was dropped in favor of the `name` option, such that `name` served the same purpose? I know that an InfluxDB bucket isn't 100% comparable to a SQL database, but maybe it's close enough? That would allow the existing borgmatic machinery and `restore` CLI to line up a little better with this hook. For instance, with that change, a user could `borgmatic restore --data-source [bucketname]` if they wanted to restore a single bucket. That does still leave the `bucket_id` option untouched (and therefore not directly supported by the `restore` action), but maybe that's okay?
@@ -1309,0 +1370,4 @@
type: string
description: |
The ID of the bucket to backup.
If provided, do not use bucket_name.
Owner

Similar conflict here. If both are provided, which one "wins"?

Similar conflict here. If both are provided, which one "wins"?
Author
First-time contributor

Clarified that bucket_id takes precedence over bucket_name

Clarified that `bucket_id` takes precedence over `bucket_name`
jolcese marked this conversation as resolved
@@ -1309,0 +1396,4 @@
type: boolean
description: |
Enable debug logging for HTTP requests.
full:
Owner

Given how this sounds like this could cause data loss, I'd suggest calling this something more specific like full_replace or whatever.

Given how this sounds like this could cause data loss, I'd suggest calling this something more specific like `full_replace` or whatever.
Author
First-time contributor

Changed to full_replace

Changed to `full_replace`
jolcese marked this conversation as resolved
jolcese added 1 commit 2025-09-09 23:18:43 +00:00
Address PR comments
All checks were successful
build / test (pull_request) Successful in 6m51s
build / docs (pull_request) Has been skipped
ac0227d012
witten reviewed 2025-09-10 18:03:46 +00:00
witten left a comment
Owner

This is looking good! Another partial review.

This is looking good! Another partial review.
@@ -0,0 +37,4 @@
dry_run,
):
'''
Dump the given InfluxDB databases to a named pipe. The databases are supplied as a sequence of
Owner

Looks like this comment doesn't correspond to the code, as you're not dumping to a named pipe and are instead dumping to a directory?

Looks like this comment doesn't correspond to the code, as you're not dumping to a named pipe and are instead dumping to a directory?
@@ -0,0 +43,4 @@
log prefix in any log entries).
Return a sequence of subprocess.Popen instances for the dump processes ready to spew to a named
pipe. But if this is a dry run, then don't actually dump anything and return an empty sequence.
Owner

This comment about the processes returned also appears to be inaccurate now.

This comment about the processes returned also appears to be inaccurate now.
@@ -0,0 +51,4 @@
logger.info(f'Dumping InfluxDB databases{dry_run_label}')
processes = []
Owner

processes doesn't look like it's used except to return an empty list. So for clarity, you could just return an empty list literal below and remove this variable here.

`processes` doesn't look like it's used except to return an empty list. So for clarity, you could just return an empty list literal below and remove this variable here.
@@ -0,0 +71,4 @@
logger.debug(
f'Command: {command}',
)
Owner

I don't think this is necessary, because execute_command() should log the command run...?

I don't think this is necessary, because `execute_command()` should log the command run...?
@@ -0,0 +74,4 @@
)
dump.create_parent_directory_for_dump(dump_filename)
execute_command(command, shell=True) # noqa: S604
Owner

Looking at the contents of build_dump_command(), I don't immediately see anything that requires shell access. So you may be able to remove shell=True and the corresponding noqa here—but that may mean you'd also have to remove the shell quoting.

Do not feel strongly if you'd rather keep shell=True.

Looking at the contents of `build_dump_command()`, I don't immediately see anything that requires shell access. So you may be able to remove `shell=True` and the corresponding `noqa` here—but that may mean you'd also have to remove the shell quoting. Do not feel strongly if you'd rather keep `shell=True`.
@@ -0,0 +89,4 @@
def build_dump_command(database, dump_filename):
'''
Return the backup command.
Owner

In general, it's useful to describe the expected arguments and return value. (See the previous function for an example.)

In general, it's useful to describe the expected arguments and return value. (See the previous function for an example.)
@@ -0,0 +98,4 @@
# Add protocol prefix based on tls setting
protocol = 'https://' if database.get('tls', True) else 'http://'
# Format as protocol://hostname:port
host = f'{protocol}{host}:{port}'
Owner

I would suggest maybe not replacing the value of the host variable since it starts off in one format (just hostname) and then ends up in another format (a full URL). Using a new variable name would address that concern.

Do not feel strongly.

I would suggest maybe not replacing the value of the `host` variable since it starts off in one format (just hostname) and then ends up in another format (a full URL). Using a new variable name would address that concern. Do not feel strongly.
@@ -0,0 +111,4 @@
+ ('backup',)
+ (('--skip-verify',) if skip_verify else ())
+ (('--http-debug',) if http_debug else ())
+ (('--host', shlex.quote(str(host))) if host else ())
Owner

I don't think you need all these str() calls, because the schema should take care of making sure that strings are strings. The only except I can see is where the schema asks for an integer, say.

I don't think you need all these `str()` calls, because the schema should take care of making sure that strings are strings. The only except I can see is where the schema asks for an integer, say.
@@ -0,0 +162,4 @@
databases, config, borgmatic_runtime_directory, name=None
): # pragma: no cover
'''
Given a sequence of configurations dicts, a configuration dict, a prefix to log with, the
Owner

Looks like this comment doesn't quite correspond to the given arguments.

Looks like this comment doesn't quite correspond to the given arguments.
@@ -0,0 +175,4 @@
),
dump.make_data_source_dump_filename(
make_dump_path(borgmatic_source_directory), name, hostname='*'
),
Owner

You can safely remove this last one (the glob pattern made from the borgmatic source directory), because there will never be a release of borgmatic using the deprecated source directory that also created archives containing InfluxDB dumps.

You still need the borgmatic runtime directory glob though, because that supports the use of Borg 1.2. (Long story.)

You can safely remove this last one (the glob pattern made from the borgmatic source directory), because there will never be a release of borgmatic using the deprecated source directory that *also* created archives containing InfluxDB dumps. You still need the borgmatic runtime directory glob though, because that supports the use of Borg 1.2. (Long story.)
@@ -0,0 +191,4 @@
'''
Restore a database from the given extract stream. The database is supplied as a data source
configuration dict, but the given hook configuration is ignored. The given configuration dict is
used to construct the destination path, and the given log prefix is used for any log entries. If
Owner

This comment also appears a little out of date.

This comment also appears a little out of date.
@@ -0,0 +193,4 @@
configuration dict, but the given hook configuration is ignored. The given configuration dict is
used to construct the destination path, and the given log prefix is used for any log entries. If
this is a dry run, then don't actually restore anything. Trigger the given active extract
process (an instance of subprocess.Popen) to produce output to consume.
Owner

This might be inaccurate too, given that you're only doing directory format restores?

This might be inaccurate too, given that you're only doing directory format restores?
@@ -0,0 +219,4 @@
# if the restore paths don't exist in the archive.
execute_command_with_processes(
restore_command,
[extract_process] if extract_process else [],
Owner

Given that you're only doing directory format restores, I wouldn't expect extract_process to be used ever (here or two lines below).

Do not feel strongly.

Given that you're only doing directory format restores, I wouldn't expect `extract_process` to be used ever (here or two lines below). Do not feel strongly.
@@ -0,0 +237,4 @@
# Add protocol prefix based on tls setting
protocol = 'https://' if database.get('tls', True) else 'http://'
# Format as protocol://hostname:port
host = f'{protocol}{host}:{port}'
Owner

Same thought here about variable reuse as above.

Same thought here about variable reuse as above.
@@ -0,0 +252,4 @@
http_debug = database.get('http_debug')
full = database.get('full_replace')
influx_command = tuple(
shlex.quote(part) for part in shlex.split(database.get('influx_command') or 'influx')
Owner

Given that you're not using shell=True for invoking this restore command, I'm not sure this quoting will actually work. You can experiment by trying to set an influx_command that contains a space, for instance. (Example: influx_command: influx --skip-verify.)

Given that you're not using `shell=True` for invoking this restore command, I'm not sure this quoting will actually work. You can experiment by trying to set an `influx_command` that contains a space, for instance. (Example: `influx_command: influx --skip-verify`.)
Owner

Actually, I take that back. I think it'll work even without shell=True.

Actually, I take that back. I think it'll work even without `shell=True`.
witten marked this conversation as resolved
witten reviewed 2025-09-24 21:44:39 +00:00
witten left a comment
Owner

Test comments!

Test comments!
@@ -105,0 +109,4 @@
port: 8087
tls: true
skip_verify: true
tojen: mysecrettoken
Owner

Looks like this out of date now.

Looks like this out of date now.
@@ -0,0 +4,4 @@
def test_make_dump_path_creates_correct_path():
assert module.make_dump_path('/tmp') == '/tmp/influxdb_databases'
Owner

As written, this is more of an integration test than a unit test, as it integrates multiple different units. If you want to keep it, I'd recommend moving it into the appropriate location in tests/integration/...

However, given that there's really no logic to speak of in this particular unit, and you've already got a pragma: no cover on it, I think it's safe to delete this test.

As written, this is more of an integration test than a unit test, as it integrates multiple different units. If you want to keep it, I'd recommend moving it into the appropriate location in `tests/integration/...` However, given that there's really no logic to speak of in this particular unit, and you've already got a `pragma: no cover` on it, I think it's safe to delete this test.
@@ -0,0 +8,4 @@
def test_get_default_port_returns_correct_port():
assert module.get_default_port(None, None) == 8086
Owner

You also don't need this test given that there's no logic in the unit and you've got pragma: no cover on it.

Don't feel strongly.

You also don't need this test given that there's no logic in the unit and you've got `pragma: no cover` on it. Don't feel strongly.
@@ -0,0 +13,4 @@
def test_use_streaming_always_returns_false():
assert not module.use_streaming(databases=[], config={})
assert not module.use_streaming(databases=[{'name': 'bucket1'}], config={})
Owner

Similar here: There's no logic in the unit, so IMO you could just slap a pragma: no cover on it and lose the test.

Similar here: There's no logic in the unit, so IMO you could just slap a `pragma: no cover` on it and lose the test.
@@ -0,0 +46,4 @@
'--bucket',
'TestBucket',
'/tmp/dumpfile',
)
Owner

Nice test!

Nice test!
@@ -0,0 +74,4 @@
)
def test_build_dump_command_with_http_disabled():
Owner

Might be a little clearer to call this ..._with_tls_disabled().

Might be a little clearer to call this `..._with_tls_disabled()`.
@@ -0,0 +236,4 @@
'--token',
'testtoken',
'/tmp/dumpfile',
)
Owner

All of these test variants for build_dump_command() are great. The only things I see potentially missing are tests for:

  • port missing (in which case get_default_port() would probably need to be mocked out)
  • influx_command set with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test)
  • skip_verify set
  • password set
All of these test variants for `build_dump_command()` are great. The only things I see potentially missing are tests for: * `port` missing (in which case `get_default_port()` would probably need to be mocked out) * `influx_command` set with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test) * `skip_verify` set * `password` set
@@ -0,0 +276,4 @@
'mytoken',
'--org-id',
'org123',
),
Owner

Given that, in this test, you're in control of what make_dump_command() returns, it could literally return a flexmock() instance rather than an actual command. For instance, the test would pass even if the command is ('totally', 'not', 'influxdb') (assuming it's in both places).

Do not feel strongly. It's just a way to simplify the test.

Given that, in this test, you're in control of what `make_dump_command()` returns, it could literally return a `flexmock()` instance rather than an actual command. For instance, the test would pass even if the command is `('totally', 'not', 'influxdb')` (assuming it's in both places). Do not feel strongly. It's just a way to simplify the test.
@@ -0,0 +289,4 @@
dry_run=False,
)
assert len(processes) == 0 # No subprocess.Popen instances are returned.
Owner

Nice use of mocking in this test. The one thing missing to my eye is an assert on patterns, as this unit under test definitely modifies it.

It might also be nice to make this test deal with two different databases, just to ensure the looping and appending works as expected. Do not feel strongly about that.

Nice use of mocking in this test. The one thing missing to my eye is an assert on `patterns`, as this unit under test definitely modifies it. It might also be nice to make this test deal with two different databases, just to ensure the looping and appending works as expected. Do not feel strongly about that.
@@ -0,0 +309,4 @@
dry_run=True,
)
assert len(processes) == 0
Owner

You could assert on patterns here too.

You could assert on `patterns` here too.
@@ -0,0 +327,4 @@
'port': '8086', # Changed to string to avoid TypeError
'password': 'mytoken',
}
extract_process = flexmock(stdout=flexmock())
Owner

I'm pretty sure that extract_process is never set for this hook because use_streaming is always False.

I'm pretty sure that `extract_process` is never set for this hook because `use_streaming` is always `False`.
@@ -0,0 +340,4 @@
)
flexmock(module).should_receive('execute_command_with_processes').with_args(
('influx', 'restore', '--host', 'https://localhost:8086', '--token', 'mytoken'),
Owner

Similar feedback here as above. This command could totally just be a flexmock() instance. Do not feel strongly.

Similar feedback here as above. This command could totally just be a `flexmock()` instance. Do not feel strongly.
@@ -0,0 +724,4 @@
'--token',
'mytoken',
'/tmp/dumpfile',
)
Owner

Another good set of test variants! Potentially missing tests though:

  • port missing (in which case get_default_port() would probably need to be mocked out)
  • influx_command set with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test)
  • tls set
Another good set of test variants! Potentially missing tests though: * `port` missing (in which case `get_default_port()` would probably need to be mocked out) * `influx_command` set with a value containing spaces, e.g. with the use of a flag or two (could just be part of the previous test) * `tls` set
@@ -0,0 +737,4 @@
config={},
borgmatic_runtime_directory='/tmp',
dry_run=False,
)
Owner

IMO this test could be deleted given that you have a pramga: no cover and zero logic. Do not feel strongly.

IMO this test could be deleted given that you have a `pramga: no cover` and zero logic. Do not feel strongly.
All checks were successful
build / test (pull_request) Successful in 6m51s
build / docs (pull_request) Has been skipped
This pull request has changes conflicting with the target branch.
  • docs/how-to/backup-your-databases.md
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u influxdb_support:jolcese-influxdb_support
git checkout jolcese-influxdb_support
Sign in to join this conversation.
No Reviewers
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: borgmatic-collective/borgmatic#1142