I am trying to write a regular expression that matches strings containing a certain word followed by a period, for example apple. or grape. I got it to work without the period, but I am not sure how to get it to work when the word ends with a period.
What I tried:
(?i)\b(Apple|Grape)\b (Working correctly without the period)
(?i)\b(Apple\.|Grape\.)\b (Returns no matches)
Sample strings that should work:
1 apple.
1 Apple.
apple. 2
grape. 1
test grape.
grape. test
this is a Apple. test
Sample strings that should not work:
1apple.
1Apple.
apple.2
grape.1
testgrape.
grape.test
longwordApple.test
this is a Apple.test
CodePudding user response:
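The trailing \b in your second attempt can only succeed when a word character follows the period: \b needs a word character on one side and a non-word character on the other, and the period itself is a non-word character. You could assert a whitespace boundary on the right instead (prefix the pattern with (?i) for case-insensitive matching, as in your attempts):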
\b(Apple|Grape)\.(?!\S)
Explanation
\b
A word boundary to prevent a partial word match on the left
(Apple|Grape)
Capture either Apple or Grape
\.
Match a dot
(?!\S)
Assert a whitespace boundary to the right
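In Java, with the backslashes doubled. This variant replaces \b with a whitespace boundary (?<!\S) on the left as well, which is stricter: the word must be preceded by whitespace or the start of the string.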
String regex = "(?<!\\S)(Apple|Grape)\\.(?!\\S)";
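To check the pattern against the sample strings from the question, a minimal sketch could look like the following. The class name FruitWordMatcher and the method containsWordWithDot are made up for illustration, and the (?i) flag is added on the assumption that matching should be case-insensitive, as in the question's own attempts.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class FruitWordMatcher {
    // (?i)            case-insensitive matching (assumed from the question)
    // (?<!\S)         no non-whitespace character directly to the left
    // (apple|grape)\. one of the words followed by a literal dot
    // (?!\S)          no non-whitespace character directly to the right
    private static final Pattern WORD_WITH_DOT =
            Pattern.compile("(?i)(?<!\\S)(apple|grape)\\.(?!\\S)");

    // Returns true if the input contains "apple." or "grape." as a
    // whitespace-delimited token, in any letter case.
    static boolean containsWordWithDot(String input) {
        return WORD_WITH_DOT.matcher(input).find();
    }
}
```

Calling FruitWordMatcher.containsWordWithDot("this is a Apple. test") should return true, while "this is a Apple.test" should return false. find() is used rather than matches() because the word may appear anywhere in the string.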
CodePudding user response:
Are you sure you need a regex? A plain tokenizing loop along the following lines would also work:
String s = "";
// Save the result of split() in a variable if you need the tokens again.
for (String token : s.split(" ")) {
    // For many target words, loop over a word array instead (one comparison per token per word).
    if (token.equalsIgnoreCase("apple.") || token.equalsIgnoreCase("grape.")) {
        // whatever you want to do
    }
}
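If the list of target words grows, one possible variation on the loop above is to keep the words in a set, so each token needs a single lookup instead of one comparison per word. This is only a sketch under some assumptions: the names TokenScanner and containsTarget are hypothetical, Set.of requires Java 9 or later, and tokens are lower-cased on the assumption that matching should ignore case.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class illustrating the tokenizing approach.
final class TokenScanner {
    // Target tokens, stored in lower case so lookups can ignore case.
    private static final Set<String> TARGETS = Set.of("apple.", "grape.");

    // Returns true if any whitespace-separated token equals one of the
    // target words, ignoring letter case.
    static boolean containsTarget(String input) {
        for (String token : input.trim().split("\\s+")) {
            if (TARGETS.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

Splitting on \\s+ rather than a single space also tolerates tabs and repeated spaces between tokens.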