I understand the basic behavior of tr but am confused when sets and sequences become involved. Some examples:
$ tr 'a' '[0]'
aaaaaa
[[[[[[
$ tr 'a-z' '[0-9]'
abcdef
[01234
$ tr 'a-z' '[*]'
abcde
[*]]]
$ tr 'a-z' '[\n]'
abcdef
[
]]]]
$ tr 'a-z' '[\n*]'
abcdef
$ tr 'a-z' '\n'
abcdef
It seems to me that in some cases (like with [\n*]) it correctly interprets the set, but in other cases it seems to stall at the closing bracket and keep outputting it. I had thought that [] meant "any one of the enclosed characters", and that * would not be treated as a special character within the set. That seems to hold in some cases but not others. What is going on here?
Also, it seems to me that tr 'a-z' '\n' performs the same action as tr 'a-z' '[\n*]'. Is there some nuance I'm missing?
Answer:
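Brackets only have special meaning if you are providing the set in the form [CHAR*], [CHAR*REPEAT], [:POSIXCLASS:], or [=CHAR=]. tr has no regex-style [...] meaning "any one of the enclosed characters"; anywhere else, [ and ] are ordinary characters.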
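A small sketch of three of these forms, assuming GNU tr (coreutils) and using printf to feed input without a trailing newline; the variable names are only illustrative. The [CHAR*REPEAT] line also relies on GNU tr padding a short SET2 with its last character.

```shell
#!/bin/sh
# [CHAR*]: CHAR is repeated until SET2 is as long as SET1 (allowed in SET2 only)
fill=$(printf 'abcdef' | tr 'a-f' '[x*]')
# [CHAR*REPEAT]: exactly REPEAT copies of CHAR; the trailing y fills the
# remaining positions because GNU tr pads a short SET2 with its last character
count=$(printf 'abcdef' | tr 'a-f' '[x*3]y')
# [:CLASS:]: a named character class
upper=$(printf 'abcdef' | tr '[:lower:]' '[:upper:]')
echo "$fill"   # xxxxxx
echo "$count"  # xxxyyy
echo "$upper"  # ABCDEF
```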
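From your examples, only tr 'a-z' '[\n*]' uses one of these constructs ("in SET2, copies of CHAR until length of SET1"). In all the other examples the brackets are interpreted literally, although ranges and escapes between them (such as 0-9 or \n) are still expanded.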
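To see that only the brackets are literal while the range between them is still expanded, here is a sketch assuming GNU tr; it reuses the '[0-9]' example from the question with the shorter SET1 a-f. SET2 becomes the 12 characters "[0123456789]", and only the first six are used because SET1 has six.

```shell
#!/bin/sh
# '[' and ']' are plain characters here; only 0-9 between them is a range
digits=$(printf 'abcdef' | tr 'a-f' '[0-9]')
# The same SET2 with the range written out by hand
spelled=$(printf 'abcdef' | tr 'a-f' '[0123456789]')
echo "$digits"   # [01234
echo "$spelled"  # [01234
```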
So tr 'a-f' '[*]' is identical to writing tr 'abcdef' '[*]' and will translate a→[, b→*, c→], d→], e→], f→].
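A quick check of that equivalence, assuming GNU tr; the third command spells out by hand the padding that GNU tr applies to the three-character SET2 (repeating its last character, ]).

```shell
#!/bin/sh
# '[*]' does not match the [CHAR*] form (that needs a character followed by
# '*', as in '[x*]'), so SET2 is the three literal characters '[', '*', ']'
range=$(printf 'abcdef' | tr 'a-f' '[*]')
listed=$(printf 'abcdef' | tr 'abcdef' '[*]')
# Explicit padding: ']' repeated until SET2 has six characters
padded=$(printf 'abcdef' | tr 'abcdef' '[*]]]]')
echo "$range"   # [*]]]]
echo "$listed"  # [*]]]]
echo "$padded"  # [*]]]]
```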
tr 'a-f' '[\n*]', however, is identical to writing tr 'abcdef' '\n\n\n\n\n\n' and will replace every character with a newline. Because GNU tr pads a SET2 that is shorter than SET1 by repeating its last character, a single-character SET2 already behaves this way, so you could simply have written tr 'a-f' '\n', as you found out with your last example.
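A sketch comparing the three spellings, assuming GNU tr; the newlines are mapped to N afterwards only so the results are visible and survive command substitution, which strips trailing newlines.

```shell
#!/bin/sh
# [\n*] fills SET2 with newlines up to the length of SET1
repeat=$(printf 'abcdef' | tr 'a-f' '[\n*]' | tr '\n' 'N')
# Six explicit newlines, one per character of SET1
explicit=$(printf 'abcdef' | tr 'abcdef' '\n\n\n\n\n\n' | tr '\n' 'N')
# A single newline, padded by GNU tr to the length of SET1
single=$(printf 'abcdef' | tr 'a-f' '\n' | tr '\n' 'N')
echo "$repeat"    # NNNNNN
echo "$explicit"  # NNNNNN
echo "$single"    # NNNNNN
```

The difference matters when SET2 has more than one character: with '[x*]y' only the [x*] part stretches, whereas a plain 'xy' only pads with its last character, y.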