Ignore illegal byte sequences

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the unix category.

Last Updated: 2024-04-23

I used this command to generate a random 32-character alphanumeric string, but it failed with the error shown below it:

tr -dc "[:alnum:]" </dev/urandom | head -c 32

tr: Illegal byte sequence

The error happened because /dev/urandom produces arbitrary bytes, and some of those byte sequences are not valid UTF-8. tr was decoding its input as UTF-8 because of the locale environment variables set in my shell:

LC_ALL=en_US.UTF-8
LC_CTYPE=UTF-8

What we want, just for the duration of the command, is to tell tr not to decode the input as multibyte characters, i.e. to treat it as a plain stream of bytes. Normally this is done by setting LC_CTYPE=C. However, another variable is relevant here: LC_ALL. It takes precedence over LC_CTYPE, and because my terminal profile sets LC_ALL (to a UTF-8 locale), setting LC_CTYPE alone had no effect. I had to override LC_ALL instead.
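To check the precedence described above, the locale command can be asked which LC_CTYPE is actually in effect. This is only a sketch: the locale name en_US.UTF-8 is taken from my environment, and the exact output format may differ between systems.

```shell
# Case 1: LC_ALL is set (as in my terminal profile), so LC_CTYPE=C is ignored
# and the effective LC_CTYPE should still be the UTF-8 locale.
LC_ALL=en_US.UTF-8 LC_CTYPE=C locale | grep '^LC_CTYPE='

# Case 2: overriding LC_ALL itself takes effect,
# so the effective LC_CTYPE should now be C.
LC_ALL=C locale | grep '^LC_CTYPE='
```

The first line printed should still report the UTF-8 locale, and the second should report C.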

The working command:

LC_ALL=C tr -dc "[:alnum:]" </dev/urandom | head -c 32
fQLKSWK4kPUmEAzPnHMI0JdUm0sXsR6w
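If the command is needed often, it could be wrapped in a small shell function. This is a minimal sketch: the name random_alnum and the optional length argument are my own inventions for illustration, not part of any standard tool.

```shell
# Hypothetical helper: print N random alphanumeric characters (default 32).
random_alnum() {
  # LC_ALL=C applies only to this tr invocation and makes it treat
  # its input as plain bytes instead of UTF-8.
  LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | head -c "${1:-32}"
  # head -c emits no trailing newline, so add one.
  echo
}

random_alnum      # 32 characters
random_alnum 16   # 16 characters
```

Because the override is written as a prefix to tr, the locale of the surrounding shell stays untouched.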

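Briefly, it works like this: tr -d deletes the characters in the given set, and -c takes the complement of that set, so -dc "[:alnum:]" deletes everything except letters and digits. head -c 32 then keeps the first 32 bytes of what remains.

For example, compare -dc with -d alone: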

LC_ALL=C tr -dc "[:alnum:]" </dev/urandom | head -c 32
fQLKSWK4kPUmEAzPnHMI0JdUm0sXsR6w
LC_ALL=C tr -d "[:alnum:]" </dev/urandom | head -c 32
�"���%�����
�������Ŕ�˒�%
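The difference between -dc and -d is easier to see on a fixed input than on random bytes. The sample string here is made up purely for illustration.

```shell
# -dc: delete everything that is NOT alphanumeric (keep letters and digits).
printf 'a1-b2_c3!' | LC_ALL=C tr -dc '[:alnum:]'; echo
# prints: a1b2c3

# -d: delete the alphanumeric characters themselves (keep the rest).
printf 'a1-b2_c3!' | LC_ALL=C tr -d '[:alnum:]'; echo
# prints: -_!
```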