Java Regex, CASE_INSENSITIVE, LITERAL plus whole word

Time:01-20

I am trying to delete or replace whole words in a string.

The match should be case-insensitive, and it should also work for special characters such as ., \ or /.

To do so, I use the following code:

String result = Pattern.compile(stringToReplace, Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
                    .replaceAll("");

Written this way, it works for special characters and is case-insensitive.

I know that I can enable whole word matching by using "\b".

I could do the following:

String result = Pattern.compile("\\b" + stringToReplace + "\\b",  Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
                    .replaceAll("");

This way it would match only whole words, but special characters in stringToReplace would be interpreted as regex syntax. The "\b" anchors cannot be combined with Pattern.LITERAL, so I would have to drop that flag, which is not what I want.

How can I combine Pattern.LITERAL with whole word matching?

CodePudding user response:

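Keep in mind that the \b word boundary is context-dependent: it matches between the start or end of the string and a word character, or between a word character and a non-word character. If stringToReplace begins or ends with a non-word character, \b on that side therefore requires a word character next to it, which is the opposite of a whole-word check.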
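To illustrate that context dependence, here is a minimal sketch. The class and method names (WordBoundaryDemo, foundWithPlainBoundaries) and the sample strings are made up for this illustration; only the java.util.regex calls come from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WordBoundaryDemo {

    // Wraps the quoted (literal) term in plain \b anchors and reports
    // whether it is found anywhere in the input.
    static boolean foundWithPlainBoundaries(String term, String input) {
        String regex = "\\b" + Pattern.quote(term) + "\\b";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input)
                .find();
    }

    public static void main(String[] args) {
        // Term starts and ends with word characters: \b behaves as expected.
        System.out.println(foundWithPlainBoundaries("cat", "a cat sat"));        // true

        // Term ends with '+': there is no boundary between '+' and ' ',
        // so the standalone occurrence is NOT found.
        System.out.println(foundWithPlainBoundaries("c++", "I like c++ a lot")); // false

        // There IS a boundary between '+' and 'x', so the term is found
        // inside a longer token, which is the opposite of whole-word matching.
        System.out.println(foundWithPlainBoundaries("c++", "c++x"));             // true
    }
}
```

So quoting alone fixes the special characters, but plain \b still gives the wrong answer whenever the term starts or ends with a non-word character.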

You need to use

String result = Pattern.compile("(?!\\B\\w)" + Pattern.quote(stringToReplace) + "(?<!\\w\\B)",  Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(inputString)
                    .replaceAll("");

There are two main changes:

  • stringToReplace is passed through Pattern.quote so that all special characters are escaped and matched literally. This replaces the Pattern.LITERAL flag.
  • Adaptive word boundaries require a word boundary only when it is needed, i.e. when the neighbouring characters are word characters. (?!\B\w) is the left-hand adaptive word boundary and (?<!\w\B) is the right-hand one. Both appear to be interchangeable because they are zero-width assertions built on the word boundary pattern, but this notation is the clearest from a logical point of view.
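The two changes above can be put together in a small runnable sketch. The class name AdaptiveWordBoundary, the helper removeWholeWord and the sample inputs are assumptions made for illustration; the regex itself is the one from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class around the pattern from the answer.
class AdaptiveWordBoundary {

    // Removes every whole-word, case-insensitive, literal occurrence of
    // stringToReplace from inputString.
    static String removeWholeWord(String inputString, String stringToReplace) {
        String regex = "(?!\\B\\w)"                   // left-hand adaptive word boundary
                + Pattern.quote(stringToReplace)       // match the term literally
                + "(?<!\\w\\B)";                       // right-hand adaptive word boundary
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(inputString)
                .replaceAll("");
    }

    public static void main(String[] args) {
        // "cat" inside "concat" is kept; standalone "Cat" and "CAT" are removed.
        System.out.println("[" + removeWholeWord("Cat concat CAT.", "cat") + "]");  // [ concat .]

        // The '.' is literal and a boundary is required next to 'a' and 'b'.
        System.out.println("[" + removeWholeWord("a.b a.bc xa.b", "a.b") + "]");    // [ a.bc xa.b]

        // A term ending in non-word characters needs no boundary on that side.
        System.out.println("[" + removeWholeWord("C++ and c++", "c++") + "]");      // [ and ]
    }
}
```

Note that replacing with "" leaves the surrounding whitespace in place, as in the original code. If stringToReplace ends with a non-word character, no boundary is required on that side, so "c++" would also be removed from "c++x".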