Why doesn't sed replace NUL characters (\x0)?

Time:12-19

If I want to replace several consecutive lines in a file or on STDIN, and I don't know their line numbers, I can turn the whole stream into a single line, for example with tr, like this:

$ printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#bbb\x0ccc\x0ddd#string2\x0string3\x0string4#g' | tr '\0' '\n'

aaa
bbb
ccc
ddd

The output I want in this case is:

aaa
string2
string3
string4

Note that this is a test example; in the real case I do not know the numbers of the lines in which to make the substitution. I only know the lines that need to be replaced and the lines that should replace them.

As far as I can see, sed can replace NUL characters, for example:

$ printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#\x0#\n#g'
aaa
bbb
ccc
ddd

Why doesn't it happen in the first case?

One could try a regular expression, (.*), in place of \x0, but with different input data the substitution goes wrong, as in the example below:

$ printf "%s\n" aaa bbb ccc ddd bbb ddd | tr '\n' '\0' | sed -e 's#bbb\(.*\)ccc\(.*\)ddd#string2\1string3\2string4#g' | tr '\0' '\n'

aaa
string2
string3
ddd
bbb
string4

Can you please tell me how to correctly replace multiple lines? Thank you for your help!

CodePudding user response:

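The problem seems to be that the \x escape consumes more than just the single 0. In \x0c, both 0 and c are valid hexadecimal digits, so sed reads \x0ccc as \x0c (a form feed) followed by cc, and \x0ddd as \x0d (a carriage return) followed by dd. Neither sequence occurs in the input, so nothing is replaced.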

Hex escapes work differently depending on the language. For example, in C they are greedy and consume every valid hex digit that follows. A saner \x escape for non-wide strings would consume at most two digits (enough to fill an 8-bit byte). Sed's version seems to work like that, which is also why \x0 followed by a non-hex character such as # or s still means NUL.
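A quick way to check how the escape is parsed is to dump the bytes sed actually emits. The sketch below assumes GNU sed (the \x escape is a GNU extension) and uses od only to make the bytes visible; the variable names are illustrative.

```shell
# Assumes GNU sed; other sed implementations may not support \x at all.

# \x0c: both "0" and "c" are hex digits, so this is the single byte 0x0c.
greedy=$(printf 'X' | sed 's/X/\x0c/' | od -An -tx1 | tr -d ' \n')

# \x00c: the escape stops after two digits, so this is NUL followed by "c" (0x63).
padded=$(printf 'X' | sed 's/X/\x00c/' | od -An -tx1 | tr -d ' \n')

echo "greedy=$greedy padded=$padded"
```

With GNU sed this should print greedy=0c padded=0063, which matches the explanation above: a one-digit escape is only safe when the next character is not a hex digit.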

Experimentally, replacing \x0 with \x00 works:

printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#bbb\x00ccc\x00ddd#string2\x00string3\x00string4#g' | tr '\0' '\n'
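As a small sketch of the same fix applied to the input that broke the (.*) attempt in the question, the pipeline can be wrapped in a function. It assumes GNU sed, and replace_block is a hypothetical name chosen for this example.

```shell
# Assumes GNU sed; replace_block is a hypothetical helper name.
replace_block() {
  # Join lines with NUL, replace the exact three-line block, split again.
  tr '\n' '\0' |
    sed -e 's#bbb\x00ccc\x00ddd#string2\x00string3\x00string4#g' |
    tr '\0' '\n'
}

result=$(printf '%s\n' aaa bbb ccc ddd bbb ddd | replace_block)
printf '%s\n' "$result"
```

Because the pattern names the NUL separators explicitly with two-digit escapes, only the contiguous bbb, ccc, ddd block is replaced; the trailing bbb and ddd lines are left as they are.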