Why do sed a and sed s commands behave differently with respect to escape characters under single qu

Time:07-11

I know there are differences between single quotes and double quotes in a sed expression, but I didn't know there are differences between sed a and sed s expressions.

For sed s expressions, \t is correctly translated to a tab in both single and double quotes. \\t does the same in double quotes.

# '\t' works for single quotes
$ echo -e "abc\n123" | sed 's|abc|&\n\tdef|'
abc
    def
123

# '\\t' fails for single quotes
$ echo -e "abc\n123" | sed 's|abc|&\\n\\tdef|'
abc\n\tdef
123

# '\t' works for double quotes
$ echo -e "abc\n123" | sed "s|abc|&\n\tdef|"
abc
    def
123

# '\\t' also works for double quotes
$ echo -e "abc\n123" | sed "s|abc|&\\n\\tdef|"
abc
    def
123

However, in sed a expressions, I have to use \\t in a single-quoted expression and \\\t in a double-quoted one.

# '\t' fails for single quotes
$ echo -e "abc\n123" | sed '/abc/a\tdef'
abc
tdef
123

# '\\t' works for single quotes
$ echo -e "abc\n123" | sed '/abc/a\\tdef'
abc
    def
123

# '\t' fails for double quotes
$ echo -e "abc\n123" | sed "/abc/a\tdef"
abc
tdef
123

# '\\t' fails for double quotes
$ echo -e "abc\n123" | sed "/abc/a\\tdef"
abc
tdef
123

# '\\\t' works for double quotes
$ echo -e "abc\n123" | sed "/abc/a\\\tdef"
abc
    def
123

Because of this, I had to rewrite my sed a expressions as sed s ones to get consistent output. Everything works, but I'd like an explanation.

The commands above are executed on Ubuntu 20.04.

Answer:

sed has no idea which quotes you use. The shell parses and removes the quotes. Inside single quotes, text is preserved completely verbatim; inside double quotes, the shell performs variable substitution, command substitution, and backslash processing. The rules are simple, but sometimes surprising: in brief, a backslash quotes the next character as literal, and so, a pair of backslashes gets translated to a single backslash. However, backslashes in front of characters which do not need escaping are preserved. So for example, \t is equivalent to \\t inside double quotes.
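
To see what sed actually receives, it can help to print an argument after the shell has finished with the quotes. The following is a minimal sketch; show_arg is a hypothetical helper introduced here for illustration and is not part of the question or of any tool.

```shell
# show_arg is a hypothetical helper: it prints its first argument exactly
# as the shell passed it, without interpreting any backslashes itself.
show_arg() { printf '%s\n' "$1"; }

single=$(show_arg '/abc/a\\tdef')    # single quotes: passed through verbatim
double=$(show_arg "/abc/a\\tdef")    # double quotes: \\ collapses to \
triple=$(show_arg "/abc/a\\\tdef")   # \\ collapses to \, then \t is kept as is

printf '%s\n' "$single" "$double" "$triple"
```

Here single and triple both hold /abc/a\\tdef, while double holds /abc/a\tdef. That matches the two forms that worked in the question: they are the ones that hand sed two backslashes before the t.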

sed for its part performs another round of backslash processing. In some contexts, some versions of sed understand \t to represent a literal tab character, but generally not in the text after the a, c, or i commands.

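The real question here is probably about the syntax of the a command. This differs between sed versions, but out of the box on Ubuntu, it simply outputs the literal text after the command. A backslash in this context is not printed; it escapes the next character so that it is taken literally and, unlike the shell, sed then removes the backslash. That is why a\tdef prints tdef. With a\\tdef, the first backslash is consumed in the same way, and what remains, \tdef, comes out with a tab in the outputs above. Inside double quotes the shell has already reduced each \\ pair to a single backslash, which is why a third backslash is needed there.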

In Bash, you can use $'...' "C-style" strings, which let you write a literal tab symbolically. However, you need to add literal backslashes in a couple of places: sed doesn't accept an unescaped literal newline in the replacement part of the s command, and the tab after a needs a backslash so that it isn't skipped as insignificant whitespace.

printf '%s\n' abc 123 ghi |
sed -e $'s/abc/&\\\n\tdef/' \
    -e $'/ghi/a\\\tjkl'

To reiterate, inside $'...' a \t gets replaced by the shell with a literal tab character, \n with a newline, etc, before the string gets passed on to the command (sed in this case).
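
Where $'...' is not available, for instance in a script run by a plain POSIX sh, one possible equivalent is to let printf produce the tab and to use the multi-line a\ form. This is a sketch under the assumption that the sed in use accepts the POSIX a\ syntax; the tab variable and the add_lines function are names made up for this example.

```shell
# Assumption: a POSIX-style sh and a sed that accepts "a\" followed by a newline.
tab=$(printf '\t')   # holds one literal tab character

# add_lines is a hypothetical wrapper around the two sed commands.
add_lines() {
  # Inside double quotes, \\ becomes a single backslash for sed.
  # First script:  s/abc/&\<newline><tab>def/
  # Second script: /ghi/a\<newline>\<tab>jkl
  sed -e "s/abc/&\\
${tab}def/" -e "/ghi/a\\
\\${tab}jkl"
}

result=$(printf '%s\n' abc 123 ghi | add_lines)
printf '%s\n' "$result"
```

The backslash in front of the tab in the second script serves the same purpose as in the Bash version: it keeps sed from discarding the tab as leading whitespace.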

In the grand scheme of things, you may be better off using a less haphazard tool than sed for this. Awk is much easier to read, write, and debug.
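
For comparison, the append from the question could be written in Awk roughly as follows. This is only a sketch: the function name append_after_abc is invented here, and it relies on Awk translating \t inside a string literal, which POSIX Awk does.

```shell
# append_after_abc is a hypothetical name used only for this example.
append_after_abc() {
  # Print every input line; after a line matching abc, also print a tab plus "def".
  awk '{ print } /abc/ { print "\tdef" }'
}

result=$(printf '%s\n' abc 123 | append_after_abc)
printf '%s\n' "$result"
```

Because the whole Awk program sits in single quotes, the shell passes it through untouched and only Awk interprets the \t, so there is a single layer of escaping to reason about.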
