Read and write data source dump metadata files within an archive. #1136
Motivation
This came out of discussion in #1121. The summary is that trying to parse dump information out of each dump's path (as borgmatic does today) is maybe okay when you've only got hostname, port, and database name, but it gets unwieldy pretty fast when you start introducing the additional data points that are needed when you go to restore.

What this PR does

Upon create, write a file with metadata for all database dumps. This includes things like hook name, database name, hostname, and port. It could pretty easily be expanded to additional fields like label or container name.

Upon restore, probe for and read the metadata file describing all the dumps in the archive. If available, use it instead of parsing dump information out of each dump's path. If not available, fall back to the old method so archives made with older versions of borgmatic are still supported.

Note that this doesn't write one metadata file per dump, as was originally suggested. It turns out that was a pain to actually extract from the archive upon restore. Instead, there's one metadata file per supported data source hook.

Example metadata contents
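The example file contents didn't survive the page scrape. As a stand-in, here's a hedged sketch of how such a per-hook metadata file could be produced, modeled on the diffs quoted later in this review; the dumps.json filename and the Dump fields come from those diffs, while the helper name and sample values are invented for illustration:

```python
import collections
import json
import os
import tempfile

# Field names assumed from borgmatic.actions.restore.Dump as quoted in the
# review diffs below; the real namedtuple may differ.
Dump = collections.namedtuple('Dump', ('hook_name', 'data_source_name', 'hostname', 'port'))


def write_dumps_metadata(borgmatic_runtime_directory, hook_name, dumps_metadata):
    '''
    Hypothetical helper: write one dumps.json per data source hook, with
    sorted keys so the serialized bytes stay stable across runs.
    '''
    hook_directory = os.path.join(borgmatic_runtime_directory, hook_name)
    os.makedirs(hook_directory, exist_ok=True)
    path = os.path.join(hook_directory, 'dumps.json')

    with open(path, 'w', encoding='utf-8') as metadata_file:
        json.dump([dump._asdict() for dump in dumps_metadata], metadata_file, sort_keys=True)

    return path


# Illustrative usage with made-up values.
runtime_directory = tempfile.mkdtemp()
path = write_dumps_metadata(
    runtime_directory,
    'postgresql_databases',
    [Dump('postgresql_databases', 'users', 'dbhost', 5432)],
)
```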
Feedback requested
Is this worth doing? Is there a better way to do this? Shall I finish this PR?
Still to do
End-to-end tests covering the create, list, and restore actions, including restoring all databases and dumps with format: sql.

Initial partial review.
I am torn, but if we want to do the container PR I guess we don't have many options :)
Don't think so, the single file per datasource is a nice touch.
👍
@@ -36,0 +47,4 @@

with open(os.path.join(borgmatic_runtime_directory, hook_name, 'dumps.json'), 'w') as metadata_file:
    json.dump([dump._asdict() for dump in dumps_metadata], metadata_file)

Since the file ends up as part of the dump, it would be great if it were stable (i.e., doesn't randomly change). As such, I'd recommend setting sort_keys=True.

Good idea!
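To illustrate the reviewer's point: without sort_keys, json.dump emits keys in dict insertion order, so two semantically identical payloads can serialize to different bytes. A minimal demonstration:

```python
import json

# Same content, built in different key orders.
a = {'hostname': 'dbhost', 'port': 5432}
b = {'port': 5432, 'hostname': 'dbhost'}

# Insertion order leaks into the serialized output...
print(json.dumps(a) == json.dumps(b))  # False

# ...unless keys are sorted, which makes the bytes stable.
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True
```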
@@ -246,6 +256,11 @@ def dump_data_sources
        ),
    )
    if not dry_run:

Not exactly sure from the web UI, but it looks like this could get pulled into the outer level and merged with the following if not dry_run, since it should only get written once for all databases?

Good catch!
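The suggested restructuring, sketched under assumed names (the helper functions and the dump_data_sources signature here are stand-ins, not borgmatic's real API): move the metadata write out of the per-database loop so the file is written exactly once, behind the existing dry-run guard.

```python
calls = []


def dump_database(database):
    # Hypothetical stand-in for the real per-database dump step.
    calls.append(('dump', database))


def make_dump_metadata(database):
    # Hypothetical: the real code builds a Dump namedtuple per database.
    return {'database': database}


def write_dumps_metadata(dumps_metadata):
    # Hypothetical stand-in for writing dumps.json once.
    calls.append(('write_metadata', len(dumps_metadata)))


def dump_data_sources(databases, dry_run):
    dumps_metadata = []

    for database in databases:
        if not dry_run:
            dump_database(database)

        dumps_metadata.append(make_dump_metadata(database))

    # Hoisted out of the loop per the review comment: the metadata file
    # only needs to be written once for all databases.
    if not dry_run:
        write_dumps_metadata(dumps_metadata)

    return dumps_metadata


dump_data_sources(['users', 'orders'], dry_run=False)
```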
Just a heads up, I am leaving for a sailing week, so I won't be able to give you another round of reviews for a week or two. If you want to merge before that, don't let me stop you…
I appreciate the heads up. I'll probably just leave this open till you get back and have bandwidth to look at it, given that you're the primary consumer of this change. (But if it turns out you don't have time on the other end, just let me know.) Enjoy your trip!
Changed title from WIP: Read and write data source dump metadata files within an archive. to Read and write data source dump metadata files within an archive.

Ready for review when you're available!
@@ -36,0 +49,4 @@

try:
    with open(dumps_metadata_path, 'w', encoding='utf-8') as metadata_file:
        json.dump([dump._asdict() for dump in dumps_metadata], metadata_file, sort_keys=True)

manifest.json, an unrelated borgmatic metadata JSON file, has its data structured as a dict with borgmatic_version as a key, I guess so as to make future backwards compatibility easier. Should I do that here as well?
If you value stable data then yes. But if you change that now it would mean a diff once for every repo (assuming the data isn't sorted currently).
I'm not sure I follow. Here's an example of the current file format as represented in this PR:
And here's an example of what I'm proposing it could change to:
So what do you mean by a diff for every repo? Are you just saying that when there's a borgmatic version bump, each metadata file would have to change in the next archive?
Oh sorry, I am still somewhat tired. I read your initial comment as asking whether you should apply sort_keys to manifest.json as well. I now see what you mean… I feel like adding borgmatic_version doesn't add much value currently. In the worst case, you could read it from the existing manifest.json. Also, you can add it in the future: the first time you change dumps.json in an incompatible way, you can add the version then and use the absence of the key as an "old" version indicator.

Okay, makes sense. Thanks for talking me down from over-engineering. 😄 I might still do this though, with a dict + key, just because it seems a little nicer:
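The example snippets in this exchange didn't survive scraping. A hypothetical reconstruction of the two shapes being discussed (the field names are assumed from the Dump namedtuple quoted in the diffs, and the 'dumps' wrapper key name is a guess, not taken from the PR):

```python
import json

# Shape as originally written in this PR: a bare JSON list of dump dicts.
list_form = [
    {
        'data_source_name': 'users',
        'hook_name': 'postgresql_databases',
        'hostname': 'dbhost',
        'port': 5432,
    },
]

# Proposed dict-wrapped shape, leaving room for future top-level keys
# (e.g. a version marker) without breaking existing readers.
dict_form = {
    'dumps': list_form,
}

print(json.dumps(dict_form, sort_keys=True))
```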
Yeah, that makes it certainly more self-explanatory.
@@ -36,0 +63,4 @@

    the expected keys.
    '''
    try:
        return tuple(borgmatic.actions.restore.Dump(**dump) for dump in json.loads(dumps_json))

I think borgmatic.actions.restore.Dump should probably get moved to borgmatic.hooks.data_source.dump.Dump or something, since it's no longer strictly restore-related. But I didn't do that in this PR, since that would make merging into your PR even more of a headache.

I'd be fine, I'd rebase anyways.
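The restore-side counterpart, sketched under assumptions: parse dumps.json back into Dump tuples, and fall back when the metadata is absent or unusable. The Dump(**dump) reconstruction comes from the diff quoted above; the fallback path-parsing function here is a hypothetical stand-in for borgmatic's older behavior, not its real implementation.

```python
import collections
import json

# Assumed field names, as elsewhere in this thread.
Dump = collections.namedtuple('Dump', ('hook_name', 'data_source_name', 'hostname', 'port'))


def parse_dumps_metadata(dumps_json):
    '''Return a tuple of Dumps parsed from JSON text, or None if unparseable.'''
    try:
        return tuple(Dump(**dump) for dump in json.loads(dumps_json))
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON or unexpected keys mean an unusable metadata file.
        return None


def dumps_from_archive(dumps_json, dump_paths):
    '''Prefer the metadata file; fall back to parsing each dump's path.'''
    dumps = parse_dumps_metadata(dumps_json)

    if dumps is not None:
        return dumps

    # Hypothetical stand-in for the old path-parsing method, so archives
    # made with older borgmatic versions are still supported.
    return tuple(
        Dump('postgresql_databases', path.rsplit('/', 1)[-1], 'localhost', None)
        for path in dump_paths
    )
```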
I'm leaving it as-is for now, as I'm not convinced that borgmatic.hooks.data_source.dump.Dump is a better place for it to live, and I don't want to move it just for the sake of moving it. I may revisit in the future.

LGTM. Once you are done with all the code changes, I'd rebase my branch on that to see if I run into new issues :D
Released in borgmatic 2.0.8!