Read and write data source dump metadata files within an archive. #1136

Merged
witten merged 18 commits from data-source-dump-metadata into main 2025-09-14 22:38:20 +00:00
Owner

Motivation

This came out of discussion in #1121. The summary is that trying to parse dump information out of each dump's path (as borgmatic does today) is maybe okay when you've only got hostname, port, and database name—but it gets unwieldy pretty fast when you start introducing additional data points that are needed when you go to restore.

What this PR does

  • Upon create, write a file with metadata for all database dumps. This includes things like hook name, database name, hostname, and port. It could be pretty easily expanded to additional fields like label or container name.
  • Upon restore, probe for and read the metadata file describing all the dumps in the archive. If available, use it instead of parsing dump information out of each dump's path. If not available, fall back to the old method so archives made with older versions of borgmatic are still supported.

Note that this doesn't write one metadata file per dump, as was originally suggested. It turns out that was a pain to actually extract from the archive upon restore. Instead, there's one metadata file per supported data source hook.
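The probe-and-fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not borgmatic's actual API: the function names and the exact dump-path layout are hypothetical stand-ins.

```python
import json
import os


def load_dumps_metadata(extracted_hook_directory):
    '''
    Probe for a dumps.json metadata file in the given (hypothetical) extracted
    hook directory. If found, parse it and return its dump records. If it's
    missing or unparseable, return None so the caller can fall back to parsing
    dump information out of each dump's path.
    '''
    metadata_path = os.path.join(extracted_hook_directory, 'dumps.json')

    try:
        with open(metadata_path, encoding='utf-8') as metadata_file:
            return json.load(metadata_file)['dumps']
    except (FileNotFoundError, json.JSONDecodeError, KeyError):
        return None


def parse_dump_path(dump_path):
    '''
    Fallback for archives made with older borgmatic versions: derive the same
    fields from a dump's path. The path layout here is illustrative only.
    '''
    hook_name, hostname, data_source_name = dump_path.split('/')

    return {
        'hook_name': hook_name,
        'hostname': hostname,
        'data_source_name': data_source_name,
    }
```

The key point is that the fallback keeps older archives restorable while new archives carry richer, extensible metadata.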

Example metadata contents

```bash
$ borgmatic --config test.yaml -v 2 mount --archive latest --mount-point tmp
$ cat tmp/borgmatic/postgresql_databases/dumps.json
```

```json
{"dumps": [{"hook_name": "postgresql_databases", "data_source_name": "test", "hostname": "localhost", "port": null}]}
```

Feedback requested

Is this worth doing? Is there a better way to do this? Shall I finish this PR?

Still to do

  • [x] general cleanup
  • [x] error handling (see TODOs)
  • [x] add to all other relevant database hooks
  • [x] get existing automated tests passing
  • [x] add new automated test coverage
  • [x] manual testing
    • [x] PostgreSQL create/list/restore
      • [x] basic
      • [x] fallback to path-based dump metadata
    • [x] MariaDB create/list/restore
      • [x] basic
      • [x] all
      • [x] all and format: sql
      • [x] fallback to path-based dump metadata
    • [x] MySQL create/list/restore
      • [x] basic
      • [x] all
      • [x] all and format: sql
      • [x] fallback to path-based dump metadata
    • [x] MariaDB + MySQL create/list/restore
      • [x] multiple (one MariaDB database and one MySQL database)
    • [x] MongoDB create/list/restore
      • [x] basic
      • [x] fallback to path-based dump metadata
    • [x] SQLite create/list/restore
      • [x] basic
      • [x] fallback to path-based dump metadata
witten added 1 commit 2025-08-27 23:06:07 +00:00
Read and write data source dump metadata files within an archive.
Some checks failed
build / test (pull_request) Failing after 6m50s
build / docs (pull_request) Has been skipped
030abfa13c
apollo13 reviewed 2025-08-28 06:13:34 +00:00
apollo13 left a comment
Contributor

Initial partial review.

Is this worth doing?

I am torn, but if we want to do the container PR I guess we don't have many options :)

Is there a better way to do this?

Don't think so, the single file per datasource is a nice touch.

Shall I finish this PR?

👍

@@ -36,0 +47,4 @@
with open(
os.path.join(borgmatic_runtime_directory, hook_name, 'dumps.json'), 'w'
) as metadata_file:
json.dump([dump._asdict() for dump in dumps_metadata], metadata_file)
Contributor

Since the file ends up as part of the dump, it would be great if it were stable (i.e., doesn't randomly change). As such, I'd recommend setting `sort_keys=True`.

Author
Owner

Good idea!

witten marked this conversation as resolved
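The stability point above is easy to demonstrate: with `sort_keys=True`, `json.dump`/`json.dumps` emit keys in a deterministic order regardless of how the dict was built, so the file's bytes don't churn between backups. This is an illustrative snippet, not borgmatic's code:

```python
import json

# Two dicts with identical content but different insertion order.
first = {'hostname': 'localhost', 'port': None, 'data_source_name': 'test'}
second = {'data_source_name': 'test', 'hostname': 'localhost', 'port': None}

# Without sorting, serialization follows insertion order, so the output differs.
assert json.dumps(first) != json.dumps(second)

# With sort_keys=True, both serialize to identical bytes.
assert json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
```

Stable bytes also help Borg deduplicate the metadata file across archives.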
@@ -246,6 +256,11 @@ def dump_data_sources(
),
)
if not dry_run:
Contributor

I'm not exactly sure from the web UI, but it looks like this could get pulled into the outer level and merged with the following `if not dry_run`, since it should only get written once for all databases?

Author
Owner

Good catch!

witten marked this conversation as resolved
witten added 1 commit 2025-09-03 17:21:04 +00:00
PR feedback.
Some checks failed
build / test (pull_request) Failing after 7m2s
build / docs (pull_request) Has been skipped
eb971694bb
witten added 2 commits 2025-09-03 17:21:25 +00:00
Merge branch 'main' into data-source-dump-metadata
Some checks failed
build / test (pull_request) Failing after 1m51s
build / docs (pull_request) Has been skipped
09d702ee88
witten added 1 commit 2025-09-03 21:49:17 +00:00
Error handling.
Some checks failed
build / test (pull_request) Failing after 1m50s
build / docs (pull_request) Has been skipped
b45c9628e7
witten added 1 commit 2025-09-03 21:53:05 +00:00
Add error handling docstrings.
Some checks failed
build / test (pull_request) Failing after 1m50s
build / docs (pull_request) Has been skipped
e02c4be493
witten added 1 commit 2025-09-03 23:53:40 +00:00
Fix broken fallback to dump path introspection when dumps.json is missing. Also add additional logging.
Some checks failed
build / test (pull_request) Failing after 1m49s
build / docs (pull_request) Has been skipped
fca647b3ae
witten added 1 commit 2025-09-04 01:55:08 +00:00
Add MariaDB dumps metadata.
Some checks failed
build / test (pull_request) Failing after 1m52s
build / docs (pull_request) Has been skipped
049132377c
witten added 1 commit 2025-09-04 02:22:08 +00:00
Add MySQL dump metadata.
Some checks failed
build / test (pull_request) Failing after 1m53s
build / docs (pull_request) Has been skipped
d40f728410
witten added 1 commit 2025-09-04 02:32:03 +00:00
Add MongoDB dumps metadata.
Some checks failed
build / test (pull_request) Failing after 1m52s
build / docs (pull_request) Has been skipped
03e3e0f5dd
witten added 1 commit 2025-09-04 02:35:24 +00:00
Add NEWS entry for dumps metadata.
Some checks failed
build / test (pull_request) Failing after 1m53s
build / docs (pull_request) Has been skipped
5a45a0cbc0
witten added 1 commit 2025-09-04 03:20:39 +00:00
Add SQLite dump metadata.
Some checks failed
build / test (pull_request) Failing after 1m53s
build / docs (pull_request) Has been skipped
6a1f2a4c86
witten added 1 commit 2025-09-04 17:36:59 +00:00
Fix tests for database hooks.
Some checks failed
build / test (pull_request) Failing after 1m49s
build / docs (pull_request) Has been skipped
9bbce71673
Contributor

Just a heads up, I am leaving for a sailing week, so I won't be able to give you another round of reviews for a week or two. If you want to merge before that, don't let me stop you…

Author
Owner

I appreciate the heads up. I'll probably just leave this open till you get back and have bandwidth to look at it, given that you're the primary consumer of this change. (But if it turns out you don't have time on the other end, just let me know.) Enjoy your trip!

witten added 1 commit 2025-09-05 21:22:28 +00:00
Get existing restore tests passing (and add one test for the dumps metadata present case).
Some checks failed
build / test (pull_request) Failing after 1m54s
build / docs (pull_request) Has been skipped
d7205694db
witten added 2 commits 2025-09-05 23:02:57 +00:00
Add tests for new code.
All checks were successful
build / test (pull_request) Successful in 6m45s
build / docs (pull_request) Has been skipped
627898cc23
witten changed title from WIP: Read and write data source dump metadata files within an archive. to Read and write data source dump metadata files within an archive. 2025-09-05 23:07:42 +00:00
Author
Owner

Ready for review when you're available!

witten reviewed 2025-09-06 05:27:07 +00:00
@@ -36,0 +49,4 @@
try:
with open(dumps_metadata_path, 'w', encoding='utf-8') as metadata_file:
json.dump([dump._asdict() for dump in dumps_metadata], metadata_file, sort_keys=True)
Author
Owner

manifest.json, an unrelated borgmatic metadata JSON file, has its data structured as a dict with borgmatic_version as a key, I guess so as to make future backwards compatibility easier.

Should I do that here as well?

Contributor

If you value stable data then yes. But if you change that now it would mean a diff once for every repo (assuming the data isn't sorted currently).

Author
Owner

I'm not sure I follow. Here's an example of the current file format as represented in this PR:

```json
[{"hook_name": "postgresql_databases", "data_source_name": "test", "hostname": "localhost", "port": null}]
```

And here's an example of what I'm proposing it could change to:

```json
{
    "borgmatic_version": "2.0.8.dev0",
    "dumps": [
        {"hook_name": "postgresql_databases", "data_source_name": "test", "hostname": "localhost", "port": null}
    ]
}
```

So what do you mean by a diff for every repo? Are you just saying that when there's a borgmatic version bump, each metadata file would have to change in the next archive?

Contributor

Oh sorry, I am still somewhat tired. I read your initial comment as asking whether you should apply `sort_keys` to `manifest.json` as well. I now see what you mean… I feel like adding `borgmatic_version` doesn't add much value currently; in the worst case, you could read it from the existing `manifest.json`. You can also add it in the future: the first time you change `dumps.json` in an incompatible way, you can add the version then and use the absence of the key as an "old" version indicator.

Author
Owner

Okay, makes sense. Thanks for talking me down from over-engineering. 😄 I might still do this though, with a dict + key just because it seems a little nicer:

```json
{
    "dumps": [
        {"hook_name": "postgresql_databases", "data_source_name": "test", "hostname": "localhost", "port": null}
    ]
}
```
Contributor

Yeah, that makes it certainly more self-explanatory.

witten marked this conversation as resolved
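The "absence of the key as an old-version indicator" idea from this thread could look something like the sketch below. It's purely illustrative: the bare-list layout never shipped, and the released format is the dict with a `"dumps"` key.

```python
import json


def parse_dumps_json(dumps_json):
    '''
    Parse dumps.json contents, tolerating both the earlier bare-list draft
    layout and the released {"dumps": [...]} layout. (Hypothetical helper,
    not borgmatic's actual code.)
    '''
    data = json.loads(dumps_json)

    if isinstance(data, dict):
        # A future incompatible change could add a "borgmatic_version" key
        # here and treat its absence as the "old" format indicator.
        return data['dumps']

    # Bare list: the earlier draft layout.
    return data
```

This keeps the format evolvable without committing to a version field before one is actually needed.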
witten reviewed 2025-09-06 05:30:27 +00:00
@@ -36,0 +63,4 @@
the expected keys.
'''
try:
return tuple(borgmatic.actions.restore.Dump(**dump) for dump in json.loads(dumps_json))
Author
Owner

I think borgmatic.actions.restore.Dump should probably get moved to borgmatic.hooks.data_source.dump.Dump or something since it's no longer strictly restore-related. But I didn't do that in this PR since that would make merging into your PR even more of a headache.

Contributor

I'd be fine, I'd rebase anyways.

Author
Owner

I'm leaving it as-is for now, as I'm not convinced that borgmatic.hooks.data_source.dump.Dump is a better place for it to live and I don't want to move it just for the sake of moving it. I may revisit in the future.

witten marked this conversation as resolved
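The round-trip the hunks above rely on (namedtuple to dict via `_asdict()` on write, dict back to namedtuple via `**` unpacking on read) can be sketched with a stand-in `Dump` type; the real type lives in `borgmatic.actions.restore`, and the field names below are taken from the example dumps.json shown earlier.

```python
import collections
import json

# Stand-in for borgmatic.actions.restore.Dump.
Dump = collections.namedtuple(
    'Dump', ('hook_name', 'data_source_name', 'hostname', 'port')
)


def serialize(dumps):
    # Write side: each namedtuple becomes a plain dict for JSON, with
    # sort_keys=True so the file's bytes stay stable across backups.
    return json.dumps({'dumps': [dump._asdict() for dump in dumps]}, sort_keys=True)


def deserialize(dumps_json):
    # Read side: each dict's keys must exactly match the Dump fields, or the
    # ** unpacking raises TypeError — hence the error handling in the PR.
    return tuple(Dump(**dump) for dump in json.loads(dumps_json)['dumps'])
```

Because `port` can be null in JSON, it round-trips as Python `None` without any special casing.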
witten added 1 commit 2025-09-09 16:24:22 +00:00
Merge branch 'main' into data-source-dump-metadata
All checks were successful
build / test (pull_request) Successful in 6m44s
build / docs (pull_request) Has been skipped
b340ede9ac
witten added 1 commit 2025-09-13 05:41:51 +00:00
Merge branch 'main' into data-source-dump-metadata
All checks were successful
build / test (pull_request) Successful in 7m2s
build / docs (pull_request) Has been skipped
30b8aeb764
apollo13 approved these changes 2025-09-14 13:18:17 +00:00
apollo13 left a comment
Contributor

LGTM, once you are done with all the code changes I'd rebase my branch on that to see if I run into new issues :D

witten added 1 commit 2025-09-14 22:32:44 +00:00
PR "feedback": Adding a key to the dumps metadata JSON for clarity and expandability (#1136).
All checks were successful
build / test (pull_request) Successful in 11m48s
build / docs (pull_request) Has been skipped
084fbd2f16
witten merged commit 000d633590 into main 2025-09-14 22:38:20 +00:00
witten deleted branch data-source-dump-metadata 2025-09-14 22:39:15 +00:00
Author
Owner

Released in borgmatic 2.0.8!

Reference: borgmatic-collective/borgmatic#1136