Why doesn't sed replace NUL characters (\x0)?

If I want to replace several consecutive lines in a file or on STDIN, and I don't know their line numbers, I can turn the whole stream into a single line, for example with tr, like this:

$ printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#bbb\x0ccc\x0ddd#string2\x0string3\x0string4#g' | tr '\0' '\n'

aaa
bbb
ccc
ddd

That prints the input unchanged. The output I want in this case is:

aaa
string2
string3
string4

Note that this is a test example. In the real case I do not know the numbers of the lines in which to make the substitution. I only know the lines that need to be replaced and the lines to replace them with.

As far as I can see, sed can replace NUL characters, for example:

$ printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#\x0#\n#g'
aaa
bbb
ccc
ddd

Why doesn't the substitution happen in the first case?

I could use a regular expression such as \(.*\) instead of \x0, but with different input data it makes the wrong substitution, as in the example below:

$ printf "%s\n" aaa bbb ccc ddd bbb ddd | tr '\n' '\0' | sed -e 's#bbb\(.*\)ccc\(.*\)ddd#string2\1string3\2string4#g' | tr '\0' '\n'

aaa
string2
string3
ddd
bbb
string4

Can you please tell me how to correctly replace multiple lines? Thank you for your help!

CodePudding user response：

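The problem seems to be that the \x escape consumes more than just the one zero. In \x0c, both 0 and c are valid hexadecimal digits, so sed reads \x0c as a single character (0x0C, form feed) followed by cc, and likewise \x0d as 0x0D (carriage return) followed by dd. Neither of those characters occurs in the input, so the pattern never matches.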

How hex escapes work depends on the language. In C, for example, they are greedy and consume every valid hex digit that follows. A saner \x escape for non-wide strings would consume at most two digits (enough to fill an 8-bit byte). Sed's version seems to work like that.
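
A quick way to check the two-digit reading is to feed sed an input that really does contain a form feed. This is only a sketch and assumes GNU sed, since \x escapes are a GNU extension; the MATCH marker and the out variable are made up for the demonstration.

```shell
# Input is "bbb", a form feed (octal 014 = hex 0C), then "cc".
# If \x0c meant NUL followed by "c", this pattern could not match.
out=$(printf 'bbb\014cc\n' | sed -e 's#bbb\x0ccc#MATCH#')
printf '%s\n' "$out"
```

If sed takes two hex digits after \x, this should print MATCH, because the pattern is then bbb, form feed, cc.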

Experimentally, replacing \x0 with \x00 works:

printf "%s\n" aaa bbb ccc ddd | tr '\n' '\0' | sed -e 's#bbb\x00ccc\x00ddd#string2\x00string3\x00string4#g' | tr '\0' '\n'
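
As a further check, the same two-digit form can be run against the six-line input from the question, where the \(.*\) attempt went wrong. Again this is a sketch that assumes GNU sed; the out variable is only there to capture the result.

```shell
# Same pipeline as above, fed the input that broke the \(.*\) version.
out=$(printf '%s\n' aaa bbb ccc ddd bbb ddd | tr '\n' '\0' \
  | sed -e 's#bbb\x00ccc\x00ddd#string2\x00string3\x00string4#g' \
  | tr '\0' '\n')
printf '%s\n' "$out"
```

Because \x00 matches exactly one NUL, only the bbb, ccc, ddd run should be replaced, and the trailing bbb and ddd lines should come through untouched.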