I'm trying to build a regex pattern that can capture the following examples:
pattern1 = '.She is greatThis is annoyingWhy u do this'
pattern2 = '.Weirdly specificThis sentence is longer than the other oneSee this is great'
example = 'He went such dare good mr fact. The small own seven saved man age no offer. Suspicion did mrs nor furniture smallness. Scale whole downs often leave not eat. An expression reasonably cultivated indulgence mr he surrounded instrument. Gentleman eat and consisted are pronounce distrusts.This is where the fun startsSummer is really bothersome this yearShe is out of ideas'
example_pattern_goal = 'This is where the fun startsSummer is really bothersome this yearShe is out of ideas'
Essentially, it's always a dot followed by sentences of varying length that contain no digits. I only want to capture these specific sentences, so I tried to match a dot immediately followed by a word that starts with an uppercase letter, then further words, two of which have an uppercase letter inside them.
So far, I've only come up with the following regex that doesn't quite work:
'.\b[A-Z]\w+[\s\w]+\b\w+[A-Z]\w+\b[\s\w]+\b\w+[A-Z]\w+\b[\s\w]+'
CodePudding user response:
You can use
\.([A-Z][a-z]*(?:\s+[A-Za-z]+)*\s+[a-zA-Z]+[A-Z][a-z]+(?:\s+[A-Za-z]+)*)
Details:
\. - a dot
[A-Z][a-z]* - an ASCII word starting with an uppercase letter
(?:\s+[A-Za-z]+)* - zero or more sequences of one or more whitespaces and then an ASCII word
\s+ - one or more whitespaces
[a-zA-Z]+[A-Z][a-z]+ - an ASCII word with an uppercase letter inside it
(?:\s+[A-Za-z]+)* - zero or more sequences of one or more whitespaces and then an ASCII word.
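To check the pattern against the strings from the question, here is a minimal Python sketch. It assumes the quantifiers in the pattern are `+` where shown, and the helper name `extract_run` is made up for illustration. The text you want ends up in capture group 1, so the leading dot is not part of the result.

```python
import re

# Pattern from the answer, assuming `+` quantifiers where shown.
PATTERN = re.compile(
    r"\.([A-Z][a-z]*(?:\s+[A-Za-z]+)*\s+[a-zA-Z]+[A-Z][a-z]+(?:\s+[A-Za-z]+)*)"
)


def extract_run(text):
    """Return the text captured after the dot, or None if nothing matches.

    Hypothetical helper, not part of any library.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


pattern1 = ".She is greatThis is annoyingWhy u do this"
example = (
    "He went such dare good mr fact. The small own seven saved man age no offer. "
    "Suspicion did mrs nor furniture smallness. Scale whole downs often leave not eat. "
    "An expression reasonably cultivated indulgence mr he surrounded instrument. "
    "Gentleman eat and consisted are pronounce distrusts."
    "This is where the fun startsSummer is really bothersome this yearShe is out of ideas"
)
example_pattern_goal = (
    "This is where the fun startsSummer is really bothersome this yearShe is out of ideas"
)

print(extract_run(pattern1))  # She is greatThis is annoyingWhy u do this
print(extract_run(example) == example_pattern_goal)  # True
```

The earlier dots in `example` are each followed by a space, so `\.` directly followed by `[A-Z]` cannot match there and the search moves on to the final dot. Note that the pattern only requires one word with an uppercase letter inside it. Because the optional groups are greedy, any further such words are still included in the match.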