Borgmatic crashing with a "'utf-8' codec can't decode byte 0xc3 in position 4095: unexpected end of data" #1258
Reference
borgmatic-collective/borgmatic#1258
What I'm trying to do and why
Up until this morning, I had no issue with the same config.
Basically, I'm backing up a PostgreSQL server and then some of my personal files.
Steps to reproduce
Back up a PostgreSQL 17 database for Immich. It now fails.
Actual behavior
Expected behavior
No response
Other notes / implementation ideas
No response
borgmatic version
2.1.1
borgmatic installation method
Docker image
Borg version
borg 1.4.3
Python version
Python 3.14.2
Database version (if applicable)
psql (PostgreSQL) 17.7
Operating system and version
NAME="Alpine Linux" ID=alpine VERSION_ID=3.22.3 PRETTY_NAME="Alpine Linux v3.22" HOME_URL="https://alpinelinux.org/" BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
Thanks for filing this. Some thoughts:
locale command might show you.
source_directories or patterns to greatly limit the source paths (e.g. just backup /boot only or whatever), does the problem still occur?
My guess is what's happening is that when borgmatic is doing a Borg dry run to figure out what files it plans to backup, it's encountering a filename that's not in an encoding it expects.
Same errors here since Borgmatic 2.1.2. They don't appear on every run, and not for every one of my databases (only 2 of them):
My config:
I'm using:
Borgmatic 2.1.1
Borg 1.4.3
Python 3.13.5
Debian 13 - using en_US.UTF-8
Docker version 29.2.0, build 0b9d198
My database in my compose file, if you need it
@maxhamon Thanks for weighing in and including this information about your setup. So it sounds like this started happening in 2.1.2 and didn't occur in 2.1.1? If so, that definitely helps narrow it down. And your system locale is UTF-8 rather than something more exotic, so the locale probably isn't causing this.
What about these other two possibilities?
source_directories to include just a subset of your files (e.g. just backup /srv/redacted/one_small_sub_directory only instead of all of /srv/redacted), does the problem still occur?
Thank you!
Hi, I'm also getting the error in 2.1.1 but not in 2.1.0: once in a configuration that dumps a PostgreSQL database and once in one that dumps an SQLite database. Not all of my PostgreSQL or SQLite databases are affected, though. I'm using the ghcr.io Docker images and have pinned them to 2.1.0 for now.
Hi @slarti, that's helpful to know this started for you in 2.1.1. A few questions:
Checking file paths Borg plans to include, the logged Borg dry run command, and all the various remove_data_source_dumps logs?
source_directories or patterns to greatly limit the source paths (e.g. just backup one small directory), does the problem still occur?
I have a theory about what could be going wrong here: Starting in borgmatic 2.1.1, borgmatic reads output from executed commands (like Borg, PostgreSQL, and SQLite) in up to 4096-byte chunks. It Unicode-decodes that data (as UTF-8) and breaks it into lines for consumption elsewhere in borgmatic. This change was made in #1242. The problem, I'm guessing, is that if a multi-byte Unicode character happens to straddle that 4096-byte boundary, borgmatic will attempt to decode just the first byte of that character, and fail with the error you're seeing. Note the position 4095 (4096 - 1) mentioned in the error.
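The failure mode in this theory can be reproduced outside borgmatic. The following is a minimal sketch, assuming the 4096-byte chunk size described above; the helper name decode_in_chunks and the sample payload are hypothetical and are not borgmatic's actual code:

```python
CHUNK_SIZE = 4096  # chunk size taken from the description above


def decode_in_chunks(data, chunk_size=CHUNK_SIZE):
    """Naively decode each fixed-size chunk on its own (hypothetical helper)."""
    return ''.join(
        data[offset:offset + chunk_size].decode('utf-8')
        for offset in range(0, len(data), chunk_size)
    )


# 4095 ASCII bytes followed by "é" (0xc3 0xa9 in UTF-8), so the two-byte
# character straddles the 4096-byte boundary.
payload = b'a' * 4095 + 'é'.encode('utf-8') + b'\n'

try:
    decode_in_chunks(payload)
    error_message = None
except UnicodeDecodeError as error:
    # The first chunk ends with a lone 0xc3 lead byte at index 4095.
    error_message = str(error)

print(error_message)
```

With this payload, the first chunk ends in the lone lead byte 0xc3 at index 4095, so the message printed should match the one in the issue title.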
In terms of a fix, the right thing to do is probably to hold off any unicode decoding until a full line is received. That way, we won't be at risk of accidentally splitting a multi-byte unicode character and trying to decode just one part.
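The idea of deferring decoding until a full line is available could look roughly like the sketch below. This is only an illustration under assumptions: read_lines is a hypothetical function, the stream is assumed to be a binary file-like object, and none of this is the code that ended up in borgmatic's execute.py.

```python
import io


def read_lines(stream, chunk_size=4096):
    """Yield decoded lines, decoding only once a full line has been buffered.

    Hypothetical sketch; not the code from borgmatic's execute.py.
    """
    buffer = b''

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break

        buffer += chunk
        # Split on the newline byte. 0x0a never occurs inside a multi-byte
        # UTF-8 sequence, so every complete line is safe to decode. The last
        # piece may be an incomplete line, so it stays in the buffer as bytes.
        *complete_lines, buffer = buffer.split(b'\n')

        for line in complete_lines:
            yield line.decode('utf-8')

    # Anything left at end-of-stream is a final line without a newline.
    if buffer:
        yield buffer.decode('utf-8')


# Same shape of input as the failing case: "é" straddles the chunk boundary.
payload = b'a' * 4095 + 'é'.encode('utf-8') + b'\nsecond line\n'
lines = list(read_lines(io.BytesIO(payload)))
print(len(lines), lines[1])
```

Another option would be the standard library's incremental decoder (codecs.getincrementaldecoder('utf-8')), which holds on to an incomplete byte sequence between calls instead of raising.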
Note that this is just a theory at this point, and I don't have a local repro. Getting answers to some of the questions above would help me pin it down.
EDIT: I've managed to write a failing unit test that produces the same error y'all are getting. Now the trick will just be fixing the code to make that test pass.
Okay, I believe I have a fix for this in main, and it will be part of the next release. If anyone would like to test it, you can download execute.py with the fix and use it to replace your local copy (located at borgmatic/execute.py in your borgmatic source directory, wherever that is on your system) and report back! Otherwise, you can wait for the next release. Thanks.
I haven’t had any errors since patching execute.py a few hours ago, so it looks like this fixes the issue for me.
Awesome, I appreciate you testing it out and reporting back! Glad to hear it's working for you.
Thanks for your effort. I ran into the same problem and patching execute.py with your suggested version also worked for me :-)
Released in borgmatic 2.1.2!