Just to round this off, the issue was down to the RAM I was using, even though it passed memtest. So something I learned there too.
For anyone interested I managed to make the error repeatable…
Seems like it was a pretty obscure hardware issue - only caught by the scheduled checks by borgmatic. So there's value there!
Using borg directly for the check:
Remote: Starting repository check
Remote: Data integrity error: Segment entry checksum mismatch [segment 122, offset 182376752]
Remote: Data…
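For anyone comparing runs, here is a small sketch that pulls the segment and offset numbers out of check output like the lines above. The function name `list_bad_segments` is made up for illustration, and the pattern only assumes the message format shown in this thread.

```shell
# Hypothetical helper: read check output on stdin and print
# "segment offset" for each checksum mismatch line.
list_bad_segments() {
    sed -n 's/.*Segment entry checksum mismatch \[segment \([0-9][0-9]*\), offset \([0-9][0-9]*\)].*/\1 \2/p'
}

# Usage (assumes the messages arrive on stderr, hence 2>&1):
#   borg check <repository> 2>&1 | list_bad_segments
```

If the same segment shows up on every run, that points at data on disk; if the numbers move around between runs (122 in one run, 5 in another here), something transient such as memory looks more likely.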
Side question (maybe) - when trying to run:
$ sudo borgmatic check -c /config.yaml --repository ssh://borg@remotebackup/backup/borg --verbosity 2
The command fails because…
Running it again:
summary:
config.yaml: An error occurred
remotebackup: Error running actions for repository
Data integrity error: Segment entry checksum mismatch [segment 5, offset…
Got a log for a failed backup! It seems to occur while pruning:
config.yaml: Pinging Healthchecks start
localbackup: Creating archive
Creating archive at "/backup/borg::nas-2024-10-13T11:…
#899 was mine! I've upgraded now.
My suspicion is that something is happening in the daily backups (all of which end in a logged success), which only becomes an issue during the less frequent…
So, smartctl extended tests came back OK, and I can't see anything obvious in the syslog.
I tried the following:
$ sudo borgmatic check --force -c /config.yaml --verbosity 2 --only…
FWIW we do a SMART scan periodically and the disks have been healthy for a while. I'll look into the deeper options and will check the logs.
I went ahead with the repair and all is running smoothly now. This happens more than I would like, and I suspect I only notice it when the spot checks are performed. Which leaves me with two…
FWIW the third scheduled backup did "stack up". Killing them cleaned everything up (including the processes on the server).
I then ran the scheduled backup manually (no force, no spot):
…
If it was a large file, shouldn't we expect some CPU usage?
Presumably the logging would be for /future/ checks? If so how do I safely abort this run? Just kill the borg processes?
And then…
Running create, prune, compact, check separately resulted in a failed check, but then retrying that succeeded. So this seems transient.
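Roughly what running the actions one at a time looks like, as a sketch rather than the exact commands used. The function name `run_actions_separately` and the `BORGMATIC` variable are made up for illustration; the config path and verbosity flag are the ones from the commands earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run each borgmatic action on its own and report
# which ones exit non-zero. BORGMATIC can be overridden (e.g. for testing).
BORGMATIC="${BORGMATIC:-borgmatic}"

run_actions_separately() {
    config="$1"
    failed=""
    for action in create prune compact check; do
        if "$BORGMATIC" "$action" -c "$config" --verbosity 2; then
            echo "$action: ok"
        else
            echo "$action: FAILED"
            failed="$failed $action"
        fi
    done
    # Return non-zero if any action failed.
    [ -z "$failed" ]
}

# Usage (path as used earlier in this thread; may need sudo):
#   run_actions_separately /config.yaml
```

Splitting the actions this way makes it obvious which step fails, and rerunning just the failed one is a quick test of whether the error is transient.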
If a check fails, should that take out the…