I would like to grep for an exact match of "er", but grep -w also matches "er" inside words that contain non-ASCII letters such as "ß". The command below matches the "er" in "großer" and "weißer" in addition to the standalone word. The expected behavior is that grep matches only the standalone "er" in the string below, with no partial matches.
echo "großer, Teller, der, er, weißer" | grep -w "er"
I also tried exporting LC_ALL=C, but this did not solve the problem.
CodePudding user response:
If you have GNU grep, you can use:
grep -oP "(*UCP)\ber\b"
grep -P "(*UCP)\ber\b"
The (*UCP) PCRE verb makes \b, the word boundary pattern, fully Unicode-aware.
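To try this on the sample string from the question, here is a minimal sketch. The helper name match_unicode_word is made up for illustration, and the sketch assumes GNU grep built with PCRE support and an installed C.UTF-8 locale (use any other UTF-8 locale your system provides). The word is interpolated into the pattern as-is, so it must not contain regex metacharacters.

```shell
# Hypothetical helper (not part of grep): prints each standalone occurrence
# of the word given as $1 in the text read from stdin.
# Assumptions: GNU grep with -P (PCRE) support and a C.UTF-8 locale.
match_unicode_word() {
    # (*UCP) has to sit at the very start of the pattern.
    # A UTF-8 locale is needed so that grep decodes "ß" as one character.
    LC_ALL=C.UTF-8 grep -oP "(*UCP)\\b$1\\b"
}

printf '%s\n' "großer, Teller, der, er, weißer" | match_unicode_word er
```

On a system that meets these assumptions, this should print a single er, because "ß" now counts as a word character and there is no word boundary in front of the "er" in "großer" or "weißer".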
With pcregrep, you can also use this approach, but you need to specify the -u option:
pcregrep -ou '(*UCP)\ber\b'
pcregrep -u '(*UCP)\ber\b'
-u, --utf-8 use UTF-8 mode
The -o option extracts only the matches instead of printing the whole line where a match was found.
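If neither grep -P nor pcregrep is available, a different technique can work for input shaped like the sample: split the text into one token per line and match whole lines. This is not the approach from the answer above, only a rough sketch that assumes words are separated by nothing but commas, spaces, or newlines. The helper name match_token is made up for illustration.

```shell
# Hypothetical helper: prints every token that is exactly equal to $1.
# Assumption: tokens are separated only by commas, spaces, or newlines.
match_token() {
    # Turn each comma or space into a newline and squeeze repeated newlines,
    # then keep only lines that equal the fixed string $1 (-F fixed, -x whole line).
    tr -s ', ' '\n\n' | grep -Fx -- "$1"
}

printf '%s\n' "großer, Teller, der, er, weißer" | match_token er
```

Because only ASCII separators are examined and the comparison is on whole lines, this does not depend on how the locale classifies "ß". It prints er once for the sample string.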