"Spot check" of source directories consistency check #656
What I'm trying to do and why
Today, borgmatic supports a number of consistency check types, most of them implemented by Borg. They all have various trade-offs around speed and thoroughness, but one thing none of them do is check the contents of the backed up archives against the original source files—which is arguably one important way to ensure your archives contain the files you'll ultimately want to restore in the case of catastrophe (or just an accidentally deleted file). Because if you happen to misconfigure borgmatic such that you're not backing up the files you think you're backing up, every existing consistency check will still pass with flying colors—and you won't discover this problem until you go to restore.
However, automatically and exhaustively checking an archive's contents against the contents of the source directories on disk has two main problems: it could take quite a while (potentially as long as the backup itself), and source files legitimately change on disk after they're backed up, so an exhaustive comparison would produce spurious failures.
So the proposed solution is to run a "spot check" of source directories. The code implementing such a feature would go something like this:
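The original code sketch didn't survive extraction here, but the steps described below (compare file counts, then randomly sample and compare contents) might look roughly like this in Python. This is an illustrative toy operating on in-memory dicts rather than real files and borg archives; the function and parameter names are assumptions, not borgmatic's actual implementation:

```python
import hashlib
import random


def spot_check(source_files, archive_files, sample_percentage, tolerance_percentage):
    """Probabilistically compare source files against an archive.

    source_files and archive_files map path -> contents (bytes).
    Returns True if the check passes, False otherwise.
    """
    # Step 1: every source file should at least exist in the archive.
    if set(source_files) - set(archive_files):
        return False

    # Step 2: randomly sample a percentage of the source files.
    sample_size = max(1, int(len(source_files) * sample_percentage / 100))
    sample = random.sample(sorted(source_files), sample_size)

    # Step 3: compare each sampled file's contents (via SHA-256 digests)
    # to the archive's copy of that file.
    failures = sum(
        1
        for path in sample
        if hashlib.sha256(source_files[path]).digest()
        != hashlib.sha256(archive_files[path]).digest()
    )

    # Step 4: pass only if the failure rate is within the configured tolerance.
    return (failures / sample_size) * 100 <= tolerance_percentage
```

In a real implementation, the archive side of the comparison would have to come from borg itself rather than a dict, which is exactly what the open questions below are about.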
This approach has the benefit of being fast and hopefully not yielding too many false negatives. The main downside is that it's probabilistic; it won't catch 100% of source-vs.-archive consistency problems on any given run. But that might be good enough given the value it provides over time.
Additionally, to make the trade-off between false negatives and thoroughness of the check tunable for different source data, there could be borgmatic configuration options for the check:

sample_percentage
: The percentage of total files in the source directories to randomly sample and compare to their corresponding files in the backup archive.

tolerance_percentage
: The percentage of total files in the source directories that can fail a sample comparison without failing the entire consistency check.

Example:
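A configuration example using the proposed option names might look something like the following. This is hypothetical: the option names come from the proposal above, and the check name and placement are assumptions, not necessarily what was eventually released:

```yaml
checks:
    - name: spot
      # Sample 1% of source files at random on each check run.
      sample_percentage: 1
      # Tolerate up to 0.5% of source files failing the comparison
      # before failing the whole consistency check.
      tolerance_percentage: 0.5
```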
Open questions

- Is there a way to compare an archive's file contents to the corresponding source files without fully extracting them? (borg diff won't do that.) Is it easy to do that in a performant-enough way?
- Should this read back archive data the way the existing extract check does?

Implemented in main and will be part of the next release! See the documentation for actual configuration options: https://torsion.org/borgmatic/docs/how-to/deal-with-very-large-backups/#spot-check
Released in 1.8.10!
Great news that this is progressing.
I'm currently using #760 and have also added a script to check/verify a random set of files, reading up to a set amount of data (2 GB in my usage) and a minimum number of files (I've set it to 5 for now).
I figured that if I read a large file, like a video file, it would trip the 2 GB threshold and check only one file.
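The selection logic of such a script might look roughly like this; a hypothetical sketch of the approach described above (data budget plus minimum file count), not the commenter's actual script:

```python
import random


def pick_files_to_verify(file_sizes, data_budget=2 * 1024**3, min_files=5):
    """Choose a random subset of files to verify, reading up to roughly
    data_budget bytes but always including at least min_files files
    (when that many exist)."""
    paths = list(file_sizes)
    random.shuffle(paths)

    selected = []
    total_bytes = 0
    for path in paths:
        # Stop only once we've hit BOTH the data budget and the minimum
        # file count, so one huge file can't reduce the sample to itself.
        if total_bytes >= data_budget and len(selected) >= min_files:
            break
        selected.append(path)
        total_bytes += file_sizes[path]
    return selected
```

With a 3 GB video file in the mix, that one file exhausts the budget on its own, but the minimum count still forces a few more files into the sample.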
I plan to check this new spot check feature out today.
Ah, yeah, the spot check feature doesn't currently have a threshold for file sizes, but that would be an interesting thing to add. And I'd love to hear about any feedback you have as you try out the feature!
Hi, I wanted to give it a few tries before giving feedback.
My usage for spot check is verifying immediately after backup. I ran some backups with the following settings:

```yaml
count_tolerance_percentage: 0
data_sample_percentage: 1
data_tolerance_percentage: 0
```

I have verbosity/logging set to 1 (-v 1).
For me, this made sure that all of the files were backed up (file count) and that none of the compared files differed (file compare). It's pretty much what my current verify scripts do: compare file counts and compare a random set of files (mine is set to compare 1.5 GB of files).
1. It caught some borg files that change during the backup, which caused the spot check to fail. For me, this was a success, as I hadn't seen this before (my other verify scripts caught it too). This change may have occurred after a borg update. I excluded these files and ran the backup again, and this time the spot check passed.
2. When a spot check fails, it would be helpful to see the file counts (source and archive) and which files failed the compare check. I had a small number of files fail the check during a backup; it reported 0.00%, and I couldn't tell where the problem was or how many files were affected.
3. The spot check failed one time and then retried 5 more times. Is this the same retry setting? Does it need to retry at all; wouldn't it get the same result each time? All the other checks passed, and they were rerun each time. Do I need to change settings?
4. It works on backup and check. If I run check on its own, it sometimes fails on the first try (only on one of my repositories) and then passes on the second. I couldn't find what made it fail (as in number 2).
5. I think it would be helpful to know how much data has been verified as well as the number of files, e.g. "Verified: 1% (2000 files, 1.5 GB)".
6. If the spot check fails during backup, does the backup fail and get deleted, or something else?
7. This is working really well.
Hi @witten
Just wondering, did you see my last feedback and questions above?
Would it be better to put them into a new ticket next time rather than add to this one?
Thank you for the nudge, and my apologies for the delay in getting back to this! I really appreciate all your testing and feedback on this feature; it's invaluable on something new like this.
Some initial responses:
- Maybe the failing files could be higher-verbosity (--verbosity 2) log entries? Say, one per failing file and maybe another for the counts? The existing error message is already pretty crowded.
- create and check are run independently.

Thanks for your reply. Sorry it's taken me some time; I wanted to do some more tests first and got sidetracked.
Having thought about it, I think listing the failing files could be an overload of output. For my usage, it may be a couple of files; for the usage you describe, it could be hundreds or even thousands of files. That's a lot.
I'll go through my suggested output again and see where it could fit (verbosity 1 or 2, etc.). I tend to run at verbosity 1, but I'll run at different levels and try it out.
I will add any further feedback to a new ticket.
Rob