Transliteration behavior (tr program in shell)

Time:01-19

I understand the basic behavior of tr but am confused when sets and sequences become involved. Some examples:

$ tr 'a' '[0]'
aaaaaa
[[[[[[
$ tr 'a-z' '[0-9]'
abcdef
[01234
$ tr 'a-z' '[*]'
abcde
[*]]]
$ tr 'a-z' '[\n]'
abcdef
[
]]]]
$ tr 'a-z' '[\n*]'
abcdef






$ tr 'a-z' '\n'
abcdef





It seems to me that in some cases (like with the [\n*]) it correctly interprets the set, but in other cases, it seems to stall at the closing bracket and continuously output that. I had thought that sets [] mean "any one of the enclosed characters", and that * would not be treated as a special char within the set. It seems like this holds true in some cases but not others. What is going on here?

Also, it seems to me that tr 'a-z' '\n' performs the same action as tr 'a-z' '[\n*]'. Is there some nuance I'm missing?

Answer:

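In tr, [ only has special meaning when the set uses one of the forms [CHAR*], [CHAR*REPEAT], [:POSIXCLASS:], or [=CHAR=]. Anywhere else, [ and ] are ordinary characters; tr has no regex-style "any one of the enclosed characters" bracket sets.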
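A quick way to see which forms are recognized is to feed tr a fixed input with printf. This is a minimal sketch, assuming GNU coreutils tr; the variable names r1, r2, and r3 are only for illustration.

```shell
# [CHAR*REPEAT]: three copies of "x", followed by a literal "y".
r1=$(printf 'abcd' | tr 'abcd' '[x*3]y')
printf '%s\n' "$r1"    # xxxy

# [:POSIXCLASS:]: character classes are expanded.
r2=$(printf 'hello' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$r2"    # HELLO

# None of the special forms: "[", "0" and "]" are three ordinary characters.
r3=$(printf 'abc' | tr 'abc' '[0]')
printf '%s\n' "$r3"    # [0]
```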

Of your examples, only tr 'a-z' '[\n*]' uses one of these forms: [CHAR*] in SET2 means "repeat CHAR until SET2 is as long as SET1". In all the other examples the brackets are taken literally.

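So tr 'a-f' '[*]' is identical to writing tr 'abcdef' '[*]' and will translate a→[, b→*, c→], d→], e→], f→]. The repeated ] is not tr stalling: when SET2 is shorter than SET1, GNU tr pads SET2 by repeating its last character, which here happens to be ].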
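The mapping above can be checked without typing input interactively. This is a small sketch, again assuming GNU tr (which pads a short SET2 with its last character); the variable names are illustrative.

```shell
# "[*]" is not a repeat form, so SET2 is the three characters "[", "*", "]",
# padded with "]" to match the six characters of SET1.
ranged=$(printf 'abcdef' | tr 'a-f' '[*]')
spelled=$(printf 'abcdef' | tr 'abcdef' '[*]')

printf '%s\n' "$ranged"     # [*]]]]
printf '%s\n' "$spelled"    # [*]]]]
```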

tr 'a-f' '[\n*]', however, is identical to writing tr 'abcdef' '\n\n\n\n\n\n' and will replace every character with a newline. When SET2 is a single character, tr already repeats it to the length of SET1 by default, so you could simply have written tr 'a-f' '\n', as you found out with your last example.
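To confirm that the three spellings really behave the same, the sketch below pipes each result through a second tr that turns newlines into a visible N (command substitution would otherwise strip trailing newlines). It assumes GNU tr; the variable names are illustrative.

```shell
# Each pipeline maps a-f to newlines, then makes the newlines visible as "N".
star=$(printf 'abcdef' | tr 'a-f' '[\n*]' | tr '\n' 'N')
plain=$(printf 'abcdef' | tr 'a-f' '\n' | tr '\n' 'N')
counted=$(printf 'abcdef' | tr 'a-f' '[\n*6]' | tr '\n' 'N')

printf '%s\n' "$star"       # NNNNNN
printf '%s\n' "$plain"      # NNNNNN
printf '%s\n' "$counted"    # NNNNNN
```

The [\n*] form becomes useful when the repeated character is not the last one in SET2, for example '[x*3]y', where implicit padding alone could not express the set.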
