"Spot check" of source directories consistency check #656
Reference: borgmatic-collective/borgmatic#656
What I'm trying to do and why
Today, borgmatic supports a number of consistency check types, most of them implemented by Borg. They all make various trade-offs between speed and thoroughness, but none of them checks the contents of the backed up archives against the original source files, which is arguably one important way to ensure your archives contain the files you'll ultimately want to restore in the case of catastrophe (or just an accidentally deleted file). If you happen to misconfigure borgmatic such that you're not backing up the files you think you're backing up, every existing consistency check will still pass with flying colors, and you won't discover the problem until you go to restore.
However, automatically and exhaustively checking an archive's contents against the contents of the source directories on disk has two main problems: it can be prohibitively slow for large backups, and source files that legitimately change after a backup completes would show up as spurious mismatches.
So the proposed solution is to run a "spot check" of source directories. The code implementing such a feature would go something like this:
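A rough sketch of that approach in Python (all names here are hypothetical; `hash_in_archive` stands in for however a file's archived copy would actually be read, which is one of the open questions below):

```python
import hashlib
import random

def spot_check(source_paths, sample_percentage, tolerance_percentage, hash_in_archive):
    """Probabilistically compare a random sample of source files against the
    most recent archive. hash_in_archive(path) is a hypothetical callback that
    returns the hash of the archived copy of path (or None if it's missing)."""
    sample_size = max(1, int(len(source_paths) * sample_percentage / 100))
    sample = random.sample(source_paths, min(sample_size, len(source_paths)))

    failures = 0
    for path in sample:
        with open(path, 'rb') as source_file:
            source_hash = hashlib.sha256(source_file.read()).hexdigest()
        if hash_in_archive(path) != source_hash:
            failures += 1

    # Pass only if the share of mismatched files stays within the tolerance.
    return failures / len(sample) * 100 <= tolerance_percentage
```

Because only a random subset is hashed on each run, the cost stays bounded regardless of dataset size, at the price of the probabilistic coverage discussed next.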
This approach has the benefit of being fast and hopefully not yielding too many false negatives. The main downside is it's probabilistic; it won't catch 100% of source vs. archive consistency problems on any given run. But that might be good enough given the value that it provides over time.
Additionally, to make the tradeoff between false negatives and thoroughness of the check tunable for different source data, there could be borgmatic configuration options for the check:
sample_percentage
: The percentage of total files in the source directories to randomly sample and compare to their corresponding files in the backup archive.

tolerance_percentage
: The percentage of total files in the source directories that can fail a sample comparison without failing the entire consistency check.
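For example, a sketch using the proposed option names (the values are illustrative, and the options as shipped ended up with different names, per the comments below):

```yaml
checks:
    - name: spot
      # Randomly sample 1% of source files for comparison.
      sample_percentage: 1
      # Allow up to 0.5% of sampled files to mismatch before failing the check.
      tolerance_percentage: 0.5
```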
Open questions

- Is there a way to compare a source file on disk against its counterpart inside an archive without extracting it? (borg diff won't do that.) Is it easy to do that in a performant-enough way?
- Should this check actually extract sampled files, like the existing extract check does?

Implemented in main and will be part of the next release! See the documentation for the actual configuration options: https://torsion.org/borgmatic/docs/how-to/deal-with-very-large-backups/#spot-check
Released in 1.8.10!
Great news that this is progressing.
I'm currently using #760 and have also added a script to check/verify a random set of files, reading up to a set amount of data (2GB in my usage) with a minimum number of files (I've set it to 5 for now).
I figured that if I read a large file, like a video file, it would trip the 2GB threshold and check only one file.
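The sampling strategy described here (a data budget plus a minimum file count) could be sketched like this; the function name and defaults are made up for illustration:

```python
import os
import random

def pick_files_to_verify(paths, data_budget_bytes=2 * 1024**3, min_files=5):
    """Randomly pick files to verify, stopping once the data budget is
    exhausted, but always including at least min_files files so that one
    huge file (e.g. a video) can't shrink the sample to a single file."""
    sample, total_bytes = [], 0
    for path in random.sample(paths, len(paths)):
        size = os.path.getsize(path)
        if total_bytes + size > data_budget_bytes and len(sample) >= min_files:
            break
        sample.append(path)
        total_bytes += size
    return sample
```

Note that this sketch keeps adding files past the budget until the minimum is met, which matches the "don't check only 1 file" intent above.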
I plan to check this new spot check feature out today.
Ah, yeah, the spot check feature doesn't currently have a threshold for file sizes, but that would be an interesting thing to add. And I'd love to hear about any feedback you have as you try out the feature!
Hi, I wanted to give it a few tries before giving feedback.
My usage for spot check is verifying immediately after a backup. I ran some backups with the following settings:
count_tolerance_percentage: 0
data_sample_percentage: 1
data_tolerance_percentage: 0
I have the verbose/logging set to 1 (-v 1).
For me, this made sure that all of the files are backed up (file count) and that none of the compared files differ (file compare). It's pretty much what my current verify scripts do: compare file counts and compare a random set of files (mine is set to compare 1.5GB of files).
It caught some borg files that change during the backup which caused the spot check to fail. For me, this was a success as I hadn't seen this before (my other verify scripts caught this too). This change may have occurred after an update with borg. I excluded these and ran the backup again, this time the spot check passed.
When a spot check fails, it would be helpful to know the file counts (backup and repo) and also which files failed the compare check. I had a small number of files fail the check during a backup; it reported 0.00%, and I couldn't tell where the problem was or how many files were an issue.
The spot check failed one time and then retried 5 more times. Is this the same retry setting? Does it need to retry? Wouldn't it get the same result each time? All the other checks passed, and they were rerun each time. Do I need to change settings?
It works on both backup and check. If I run check on its own, it sometimes fails on the first try (only on one of my repositories) and then passes on the second try. I couldn't find what made it fail (like in number 2).
I think it would be helpful to know how much data has been verified as well as the number of files, e.g. Verified: 1% (2000 files, 1.5GB).
If the spot check fails during a backup, does the backup fail and get deleted, or does something else happen?
This is working really well.