I am getting a special character, '„', in our source files. We are using the following sed command to replace the '„' character with '&', but the operation isn't successful.
cat File.txt | sed 's/\x2\xE1/&/g' > File_New.txt
The code point for „ is U+201E (hex 201E).
CodePudding user response:
I think perl is far superior to sed at working with Unicode text (assuming here that your file is encoded using UTF-8):
$ cat input.txt
foo „ bar
$ perl -CSD -pe 's/\N{U+201E}/&/g' input.txt
foo & bar
(-CSD tells perl that standard input/output/error and all opened files are using UTF-8.)
But (with the appropriate locale) you can use sed and a shell like bash that implements ANSI-C quoting to generate the character:
$ sed 's/'$'\u201E''/\&/g' input.txt
foo & bar
Or just include the character itself (its UTF-8 bytes) directly instead of using an escape sequence; that will typically work too:
$ sed 's/„/\&/g' input.txt
foo & bar
Some versions of sed, like the GNU one, support \xHH to represent a byte with the given hexadecimal value, but the Unicode code point U+201E is not encoded as the bytes 02 E1 (or 20 1E) in UTF-8; instead it's the three-byte sequence E2 80 9E:
$ sed 's/\xE2\x80\x9E/\&/g' input.txt
foo & bar
All the sed examples escape the & in the replacement because without the backslash before it, & is replaced by the matched text, leaving you right back where you started from.
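To make the effect of the backslash concrete, here is a minimal sketch that runs the same substitution with and without escaping the &. The sample line and the variable names are made up for illustration, and it assumes the script itself is saved as UTF-8.

```shell
# Hypothetical sample line containing U+201E.
line='foo „ bar'

# Unescaped &: it stands for the matched text, so „ is replaced by „
# and the line comes out unchanged.
unescaped=$(printf '%s\n' "$line" | sed 's/„/&/g')

# Escaped \&: a literal ampersand is inserted.
escaped=$(printf '%s\n' "$line" | sed 's/„/\&/g')

printf '%s\n' "$unescaped" "$escaped"
```

The first line printed should still contain „, while the second should contain & in its place.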
CodePudding user response:
With a recent GNU sed:
$ printf '\u201E\n' | hd
00000000 e2 80 9e 0a |....|
00000004
$ printf 'a\u201Eb\n' | LC_ALL=C sed 's/\xe2\x80\x9e/\&/g'
a&b
Documentation: info sed -n 'Locale Considerations'
Since & is a metacharacter in the replacement section of sed's s command, it must be escaped.
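Applied to the file names from the question, the fix might look like the sketch below. It assumes GNU sed (for the \xHH escapes) and a UTF-8 encoded input; the temporary directory and the sample content are hypothetical, created only so the example is self-contained.

```shell
# Work in a scratch directory so no real files are touched (assumption for the demo).
workdir=$(mktemp -d)

# Sample File.txt: "a", U+201E as its UTF-8 bytes (octal 342 200 236), "b".
printf 'a\342\200\236b\n' > "$workdir/File.txt"

# LC_ALL=C makes sed treat the input as raw bytes; \xHH is a GNU sed extension.
# No need for cat: sed reads the file directly.
LC_ALL=C sed 's/\xE2\x80\x9E/\&/g' "$workdir/File.txt" > "$workdir/File_New.txt"

cat "$workdir/File_New.txt"
```

If the input is not UTF-8 (for example Windows-1252, where „ is a single byte), the byte sequence in the pattern would need to change accordingly.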