I am trying to delete/replace whole words in a string.
I would like to do so case-insensitively, and it should also work for special characters such as ".", ",", "\" or "/".
To do so, I use the following code:
String result = Pattern.compile(stringToReplace, Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
.replaceAll("");
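For reference, here is a minimal runnable sketch of the approach above. The class and method names are hypothetical. It also shows the limitation behind the question: with Pattern.LITERAL the term is matched anywhere, including inside longer words.

```java
import java.util.regex.Pattern;

public class LiteralReplaceDemo {

    // Removes every case-insensitive literal occurrence of stringToReplace,
    // even when it is only part of a longer word.
    static String removeLiteral(String inputString, String stringToReplace) {
        return Pattern.compile(stringToReplace,
                        Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(inputString)
                .replaceAll("");
    }

    public static void main(String[] args) {
        // Special characters are literal: "a.b" does not match "axb".
        System.out.println(removeLiteral("A.B axb", "a.b"));         // " axb"
        // But matches inside longer words are removed too ("cat" in "concatenate").
        System.out.println(removeLiteral("Cat concatenate", "cat")); // " conenate"
    }
}
```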
This way, it works for special characters and it is case-insensitive.
I know that I can enable whole word matching by using "\b".
I could do the following:
String result = Pattern.compile("\\b" + stringToReplace + "\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
.replaceAll("");
This way it would match only whole words, but it breaks for special characters. The "\b" cannot be combined with Pattern.LITERAL, because that flag would treat "\b" as literal text too, so I would have to drop Pattern.LITERAL, which is not what I want.
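Escaping the term with Pattern.quote instead of Pattern.LITERAL keeps "\\b" working as a regex token, but a minimal sketch shows that plain "\b" still fails for terms that start or end with a non-word char. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PlainBoundaryDemo {

    // Whole-word attempt with \b; the term is escaped with Pattern.quote
    // because Pattern.LITERAL would also turn "\\b" into literal text.
    static String removeWithPlainBoundary(String inputString, String stringToReplace) {
        return Pattern.compile("\\b" + Pattern.quote(stringToReplace) + "\\b",
                        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(inputString)
                .replaceAll("");
    }

    public static void main(String[] args) {
        // Works for a term that starts and ends with word chars.
        System.out.println(removeWithPlainBoundary("Cat concatenate", "cat"));  // " concatenate"
        // Fails for "c++": there is no \b between "+" and the following space.
        System.out.println(removeWithPlainBoundary("I like C++ a lot", "c++")); // unchanged
    }
}
```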
How can I combine Pattern.LITERAL with whole word matching?
Answer:
You must remember that the \b word boundary pattern is context-dependent: it matches between the start/end of the string and a word char, or between a word char and a non-word char.
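A small sketch makes this concrete by listing every index where "\b" succeeds in a short string. The class and method names are hypothetical, and the input is plain ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {

    // Returns every index at which the zero-width \b assertion succeeds.
    static List<Integer> boundaryPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // \b needs a word char on exactly one side, so there is no
        // boundary between "+" and "+" or between "+" and " ".
        System.out.println(boundaryPositions("c++ x")); // [0, 1, 4, 5]
    }
}
```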
You need to use
String result = Pattern.compile("(?!\\B\\w)" + Pattern.quote(stringToReplace) + "(?<!\\w\\B)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
.replaceAll("");
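Wrapped in a helper, the same pattern can be tried on a few inputs. This is a sketch: the class and method names are hypothetical, and the examples use ASCII text only.

```java
import java.util.regex.Pattern;

public class WholeWordReplace {

    // Removes stringToReplace as a whole word, case-insensitively.
    // Pattern.quote escapes special chars; the adaptive boundaries only demand
    // a word boundary on a side where the term starts/ends with a word char.
    static String removeWholeWord(String inputString, String stringToReplace) {
        return Pattern.compile("(?!\\B\\w)" + Pattern.quote(stringToReplace) + "(?<!\\w\\B)",
                        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(inputString)
                .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeWholeWord("Cat concatenate", "cat"));  // " concatenate"
        System.out.println(removeWholeWord("I like C++ a lot", "c++")); // "I like  a lot"
        System.out.println(removeWholeWord("see a.b or axb", "a.b"));   // "see  or axb"
    }
}
```

Note that only the matched term is removed; surrounding spaces stay, so the result can contain double spaces.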
There are two main changes:
- The stringToReplace needs to be escaped with Pattern.quote to make sure all special characters are treated literally.
- Adaptive word boundaries make sure a word boundary is only required when necessary, i.e. only on a side where the search term itself starts or ends with a word char.
(?!\B\w) is a left-hand adaptive word boundary and (?<!\w\B) is a right-hand adaptive word boundary. Actually, both can be used interchangeably due to the nature of zero-width assertions and the word boundary pattern: each one fails only at a position between two word chars. Still, this notation is the clearest from a logical point of view.
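The interchangeability can be checked with a short sketch that lists every index where each assertion succeeds on its own. The class and method names are hypothetical, and the input sticks to ASCII so the result does not depend on how a given JDK classifies non-ASCII word chars.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdaptiveBoundaryCheck {

    static final String LEFT = "(?!\\B\\w)";   // left-hand adaptive boundary
    static final String RIGHT = "(?<!\\w\\B)"; // right-hand adaptive boundary

    // Returns every index at which the zero-width regex succeeds.
    static List<Integer> positions(String regex, String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "ab c++";
        // Both fail only at index 1, the single spot between two word chars.
        System.out.println(positions(LEFT, s));  // [0, 2, 3, 4, 5, 6]
        System.out.println(positions(RIGHT, s)); // [0, 2, 3, 4, 5, 6]
    }
}
```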