I'm trying to analyze search queries of a particular pattern.
The pattern is: How many/much _____ is/are _____.
The blanks are unknown to me, but I want to extract any statement that follows the pattern above. My challenge is using lookarounds to capture the text between many/much and is/are (excluding both), and the text that comes after is/are (excluding is/are itself).
Here's my regex so far:
(([hH]ow many?)|([hH]ow much?))|(?<=is)|(are)|(i|s|n|a|o|f){1,2}|((\")|(\“)|(\/)|(\'))
CodePudding user response:
If you use this regex with the i flag (for case-insensitive matching)
^how\s+(?:much|many)\s+(.*?)\s(?:is|are)\s+(.*?)[.?]?$
Then it'll match these strings
How much bla is blabla.
How many bla are blablabla?
The text that fills each blank will be in capture groups 1 and 2.
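As a quick way to try this out, here is a minimal Python sketch of the anchored pattern with the case-insensitive flag set. The function name extract_blanks is just an illustrative helper, not something from the answer, and the `+` quantifiers after `\s` are assumed to be what the pattern originally contained.

```python
import re

# Anchored pattern from the answer; re.IGNORECASE plays the role of the i flag.
PATTERN = re.compile(
    r"^how\s+(?:much|many)\s+(.*?)\s(?:is|are)\s+(.*?)[.?]?$",
    re.IGNORECASE,
)

def extract_blanks(query):
    """Return (first_blank, second_blank), or None if the query does not fit.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.match(query)
    return match.groups() if match else None

print(extract_blanks("How much bla is blabla."))      # ('bla', 'blabla')
print(extract_blanks("How many bla are blablabla?"))  # ('bla', 'blablabla')
print(extract_blanks("What time is it?"))             # None
```

Because both groups are lazy and the pattern is anchored at both ends, the first blank stops at the first standalone is/are, and a trailing period or question mark is left out of the second blank.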
CodePudding user response:
Try this:
/(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s)(.+)(?=\sis|\sare)|(?<=is\s|are\s)(.+)/g
Review it at regex101
Lookarounds are placed behind and/or ahead of your capture group:
1st Capture Group
(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s) /* "(H|h)ow"\space"many"\space OR
"(H|h)ow"\space"much"\space
must be before capture group */
(.+) /* capture group one or more of anything */
(?=\sis|\sare) /* \space"is" OR \space"are" must be after capture group */
| // OR
2nd Capture Group
(?<=is\s|are\s) /* "is"\space OR "are"\space must be before capture group */
(.+) /* capture group one or more of anything */
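If you want to try the lookaround version outside a JavaScript-style engine, here is a rough Python sketch. One adaptation is needed, and it is mine rather than part of the answer: Python's re module only accepts fixed-width lookbehinds, so (?<=is\s|are\s), whose alternatives are 3 and 4 characters wide, is split into two separate lookbehinds. The helper name extract_with_lookarounds is hypothetical.

```python
import re

# Lookaround pattern from the answer, adapted for Python's re module:
# the second lookbehind is split because its alternatives differ in width.
LOOKAROUND = re.compile(
    r"(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s)(.+)(?=\sis|\sare)"
    r"|(?:(?<=is\s)|(?<=are\s))(.+)"
)

def extract_with_lookarounds(query):
    """Collect whichever capture group matched on each successive match.

    Hypothetical helper for illustration only.
    """
    parts = []
    for match in LOOKAROUND.finditer(query):
        # Only one of the two groups participates in any given match.
        parts.append(match.group(1) or match.group(2))
    return parts

print(extract_with_lookarounds("How many apples are in the basket"))
# ['apples', 'in the basket']
```

Note that this version yields the two blanks as two separate matches rather than as two groups of one match, and since nothing strips it, trailing punctuation such as a question mark stays in the second blank.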