Two users report issues (here, here) with pre-commit.d/30store-metadata and commit.d/20store-metadata, respectively:
In a German locale, etckeeper commit 'daily autocommit' creates messages like this:
grep: (Standardeingabe): Übereinstimmungen in Binärdatei
This is the German wording of grep's "binary file matches" warning for standard input, and it comes from the grep in line 25. It can be fixed either by setting LC_CTYPE=C (maybe generally, in that file) or by adding -a to the grep call.
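The two options could be sketched roughly as follows. This is only an illustration: the function names, the pattern, and the sample input are hypothetical stand-ins, not the actual grep call from line 25 of the hook.

```shell
#!/bin/sh
# Hypothetical stand-in for the file list that the hook pipes into grep:
# one plain name and one name containing the lone byte 0xf6 (octal 366).
list_names() {
    printf './passwd\n./\366\n'
}

# Option 1: run grep in the C locale, where every byte is a valid character.
# Note that LC_ALL, if set, takes precedence over LC_CTYPE.
filter_with_c_locale() {
    list_names | LC_CTYPE=C grep '^\./'
}

# Option 2: tell grep to treat its input as text. The -a option is not
# specified by POSIX, which is the portability concern with this variant.
filter_with_text_flag() {
    list_names | grep -a '^\./'
}
```

With either function, both names should pass through the filter as ordinary lines instead of being replaced by the binary-file message.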
Using grep -a would be fine, except that people do use etckeeper on non-GNU systems, and I worry about breaking portability.
I guess the idea with setting LC_CTYPE=C is not to force this error message into English, which would not be useful, but that whatever the locale is set to is causing grep to interpret the input as binary.
Except... the C locale seems equally likely to cause that as whatever locale they are using? Indeed, the one user who mentioned their locale was using de_DE.utf8, so why would setting C help at all?
grep is complaining about its stdin, which comes from running a find in /etc, so there must be some particular file whose name causes grep to behave this way. Apparently one that, in a Unicode locale, grep thinks is not Unicode, but binary.
(Also, there was [[!commit 0dd5ff64bf4dba9a2e54c7f29c96998af5dcebce]] which also involved setting LANG=C when using grep, in similarly hard to understand circumstances.)
Exactly. Because, according to grep's man page: "In the C or POSIX locale, all characters are encoded as a single byte and every byte is a valid character."
While I don't know which files caused the issue for the reporters of the two bugs, I was able to reproduce the issue by running touch $'\xf6' in /etc, creating a file named 'ö' in a latin-1 locale, but with an invalid name if interpreted as utf-8. (This can also be seen by running find /etc | iconv -f utf8 - -o /dev/null, which yields iconv: illegal input sequence at position XY.)
Thanks, I understand.
I've applied the LC_CTYPE fix.
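For anyone who wants to check whether their own tree contains such a name, the reproduction described above can be sketched like this. It is a minimal sketch under some assumptions: it works in a scratch directory created with mktemp rather than in /etc, the function names are made up for illustration, and it expects a filesystem that accepts arbitrary bytes in file names.

```shell
#!/bin/sh
# Create a scratch directory holding one file whose name is the single byte
# 0xf6: 'ö' in Latin-1, but not a valid UTF-8 sequence on its own.
# Prints the path of the scratch directory.
make_scratch() {
    dir=$(mktemp -d) || return 1
    touch "$dir/$(printf '\366')" || return 1
    printf '%s\n' "$dir"
}

# Succeeds if every name under the given directory decodes as UTF-8.
# Fails otherwise; iconv reports the illegal input sequence on stderr.
names_are_utf8() {
    find "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null
}
```

Running names_are_utf8 /etc on an affected system should fail in the same way as the iconv command quoted in the thread.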