Regular expression matching strings that contain a specific word followed by a period

Time:05-20

I am trying to write a regular expression that matches strings containing a certain word followed by a period, for example apple. or grape. I got it working without the period, but I am not sure how to make it work when the word ends with a period.

What I tried:

(?i)\b(Apple|Grape)\b (Working correctly without the period)
(?i)\b(Apple\.|Grape\.)\b (Returns no matches)

Sample strings that should work:

1 apple.
1 Apple.
apple. 2
grape. 1
test grape.
grape. test
this is a Apple. test

Sample strings that should not work:

1apple.
1Apple.
apple.2
grape.1
testgrape.
grape.test
longwordApple.test
this is a Apple.test

CodePudding user response:

You could write the pattern as follows, keeping the (?i) flag from your attempts so that the match stays case-insensitive:

(?i)\b(Apple|Grape)\.(?!\S)

Explanation

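  • \b A word boundary to prevent a partial word match on the left
  • (Apple|Grape) Capture either Apple or Grape in group 1
  • \. Match a literal dot
  • (?!\S) Assert a whitespace boundary to the right: the dot must be followed by whitespace or the end of the string. Your \.\b attempt fails here because a \b directly after a dot only matches when a word character follows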

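In Java, with the backslashes doubled and (?i) added for case-insensitive matching. This variant uses (?<!\S) instead of \b, so the word must also be preceded by whitespace or the start of the string:

String regex = "(?i)(?<!\\S)(Apple|Grape)\\.(?!\\S)";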
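
As a rough sketch of how that Java pattern could be checked against a few of the sample strings from the question: the class and method names below are made up for illustration, and the (?i) flag is an assumption carried over from the question's own attempts so that both "apple." and "Apple." match.

```java
import java.util.regex.Pattern;

class WordWithDot {
    // Pattern from the answer above, plus (?i) for case-insensitive matching (assumption).
    private static final Pattern PATTERN =
            Pattern.compile("(?i)(?<!\\S)(Apple|Grape)\\.(?!\\S)");

    // Hypothetical helper: true if the input contains "apple." or "grape."
    // as a whitespace-delimited token.
    static boolean containsWordWithDot(String input) {
        // find() searches anywhere in the input, unlike matches(),
        // which would require the whole string to match.
        return PATTERN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"1 apple.", "this is a Apple. test", "1apple.", "grape.test"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsWordWithDot(s));
        }
    }
}
```

With this sketch, the first two samples are reported as matches and the last two are not, which lines up with the "should work" and "should not work" lists in the question.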

CodePudding user response:

Are you sure you need a regex? A plain tokenizing loop along the following lines would also work:

String s = "this is a Apple. test";
// Split once on runs of whitespace and keep the result in a variable.
String[] tokens = s.trim().split("\\s+");
for (String token : tokens) {
    // Alternatively, take an array of target words and check each token against all of them.
    if (token.equalsIgnoreCase("apple.") || token.equalsIgnoreCase("grape.")) {
        // whatever you want to do
    }
}
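
If the list of target words grows, one possible variation on the loop above is to keep them in a set rather than comparing each token against every word in turn. This is only a sketch: the class name, method name, and the contents of the set are hypothetical, and the tokens are lower-cased so that the lookup ignores case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TokenLookup {
    // Target tokens stored in lower case (illustrative values, not a fixed list).
    private static final Set<String> TARGETS =
            new HashSet<>(Arrays.asList("apple.", "grape."));

    // Hypothetical helper: true if any whitespace-delimited token is a target.
    static boolean containsTarget(String input) {
        // Splitting on \s+ treats tabs and repeated spaces as separators too.
        for (String token : input.trim().split("\\s+")) {
            if (TARGETS.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsTarget("this is a Apple. test")); // true
        System.out.println(containsTarget("this is a Apple.test"));  // false
    }
}
```

Each token is looked up once in the set, so the cost per token does not depend on how many target words there are.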