I'm trying to find all words that begin with an uppercase letter, unless they are at the start of a sentence.
For example,
It was in late July that that he found out. He had seen Tim
would return:
July, Tim
So far I've got
(?!<*[\\s])([A-Z][A-Za-z]+)
but I get "He" and "It" included.
CodePudding user response:
You can use a negative lookbehind like
(?<![.?!]\s|^)[A-Z][A-Za-z]+
Note this will match words of two or more ASCII letters. If one-letter words are to be found, too, replace the + at the end with a * quantifier.
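To see what the quantifier changes in isolation, here is a small Python sketch that leaves the lookbehind out and runs only the letter part of the pattern. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text containing the one-letter capitalized word "I".
text = "then I met Tim"

# With +, the capital must be followed by at least one more letter.
with_plus = re.findall(r"[A-Z][A-Za-z]+", text)

# With *, a lone capital letter is also accepted.
with_star = re.findall(r"[A-Z][A-Za-z]*", text)

print(with_plus)  # ['Tim']
print(with_star)  # ['I', 'Tim']
```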
If you want to match whole words only, add word boundaries:
\b(?<![.?!]\s|^)[A-Z][A-Za-z]*\b
The (?<![.?!]\s|^) part is a negative lookbehind. It fails the match at any location that is immediately preceded by a ., ? or ! followed by a whitespace character, or that is at the start of the string.
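As a rough check against the sentence from the question, here is a minimal Python sketch. Two things in it are assumptions rather than part of the answer: the function name find_mid_sentence_capitals is hypothetical, and because Python's re module rejects a lookbehind whose alternatives differ in width, the single lookbehind is split into two consecutive ones that express the same condition. Engines that accept the original form (Java and JavaScript, for example) can use it unchanged.

```python
import re

# Python's re raises "look-behind requires fixed-width pattern" for
# (?<![.?!]\s|^), so the condition is written as two separate lookbehinds:
#   (?<![.?!]\s)  not preceded by ., ? or ! plus one whitespace character
#   (?<!^)        not at the start of the string
MID_SENTENCE_CAPITAL = re.compile(r"\b(?<![.?!]\s)(?<!^)[A-Z][A-Za-z]*\b")


def find_mid_sentence_capitals(text):
    """Return capitalized words that do not start a sentence or the string."""
    return MID_SENTENCE_CAPITAL.findall(text)


sample = "It was in late July that that he found out. He had seen Tim"
result = find_mid_sentence_capitals(sample)
print(result)  # ['July', 'Tim']
```

The lookbehind only inspects the two characters before the word, so a sentence that follows its punctuation with two spaces or a closing quote would still have its first word reported.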