Regex - Count number of uppercase letters in word

I would like to find all words in a text that have more than one uppercase letter. So far, I am only checking whether the last character is uppercase:

\b.*[A-Z]\b

but it would be more precise to require that either the last letter is uppercase or the word contains at least two uppercase letters in total.
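A quick check with Python's re module (the language the answers use; the sample sentence is made up for illustration) shows why the original pattern is imprecise: . also matches spaces, so a single match can swallow several words.

```python
import re

text = "A VeRy LoNG SenTence Here"  # hypothetical sample input

# The pattern from the question: anything, then an uppercase letter, then a word boundary.
# The greedy .* also matches spaces, so the match starts at the first word boundary and
# extends to the last uppercase letter that is followed by a word boundary.
matches = re.findall(r"\b.*[A-Z]\b", text)
print(matches)  # ['A VeRy LoNG']
```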

Answer:

You can use

re.findall(r'\b(?:[a-z]*[A-Z]){2}[a-zA-Z]*\b', text)

Details:

\b - a word boundary
(?:[a-z]*[A-Z]){2} - two sequences of zero or more lowercase letters followed by an uppercase letter
[a-zA-Z]* - zero or more ASCII letters
\b - a word boundary
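The same pattern can be written with re.VERBOSE so that each part from the list above sits next to its own comment. This is only a restatement for readability; the test words are hypothetical.

```python
import re

# Same pattern as above. re.VERBOSE ignores whitespace outside character classes
# and lets '#' start a comment, so each piece can be annotated in place.
TWO_UPPER = re.compile(
    r"""
    \b                  # a word boundary
    (?:[a-z]*[A-Z]){2}  # twice: zero or more lowercase letters, then one uppercase letter
    [a-zA-Z]*           # zero or more ASCII letters
    \b                  # a word boundary
    """,
    re.VERBOSE,
)

# 'abC' and 'Abc' contain only one uppercase letter each, so they are skipped.
print(TWO_UPPER.findall("abC aBcD ABC Abc"))  # ['aBcD', 'ABC']
```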

See the Python demo:

import re
text = "A VeRy LoNG SenTence Here"
print(re.findall(r'\b(?:[a-z]*[A-Z]){2}[a-zA-Z]*\b', text))
# => ['VeRy', 'LoNG', 'SenTence']

A fully Unicode-aware regex is possible with the third-party regex library from PyPI (install it in your terminal/console with pip install regex):

import regex
text = "Да, ЭтО ОченЬ ДЛинное предложение."  # "Yes, this is a very long sentence." with mixed capitals
print(regex.findall(r'\b(?:\p{Ll}*\p{Lu}){2}\p{L}*\b', text))
# => ['ЭтО', 'ОченЬ', 'ДЛинное']

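If installing regex is not an option, a standard-library alternative (a different technique from the pattern above, sketched under the assumption that a run of \w characters is an acceptable definition of a word) is to extract words with re and count uppercase characters with str.isupper(), which is Unicode-aware. The helper name words_with_upper is hypothetical.

```python
import re


def words_with_upper(text: str, minimum: int = 2) -> list[str]:
    """Return the words in `text` that contain at least `minimum` uppercase characters.

    Hypothetical helper: a "word" is a run of regex word characters, which in
    Python 3 str patterns includes Unicode letters, digits and the underscore.
    """
    return [
        word
        for word in re.findall(r"\w+", text)
        if sum(ch.isupper() for ch in word) >= minimum
    ]


print(words_with_upper("Да, ЭтО ОченЬ ДЛинное предложение."))
# ['ЭтО', 'ОченЬ', 'ДЛинное']
```

Unlike the regex-only versions, this approach also accepts words containing digits or underscores, since those are part of \w.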

Answer:

\b(\w*[A-Z]\w*[A-Z]\w*|.*[A-Z])\b

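Explanation: this matches either of two alternatives:

\w*[A-Z]\w*[A-Z]\w* - zero or more word characters (\w), an uppercase letter, zero or more word characters, a second uppercase letter, then zero or more word characters; that is, a word containing at least two uppercase letters
.*[A-Z] - your original pattern, reused: any run of text ending in an uppercase letter that is followed by a word boundary. Because . also matches spaces, this branch is not limited to a single word.

In many regex engines (and in Python with the re.ASCII flag), \w is shorthand for [A-Za-z0-9_]; in Python 3 str patterns it also matches Unicode letters and digits by default.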
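A quick Python check of this pattern on a made-up sentence shows both branches at work, alongside a variant (an editorial sketch, not part of the original answer) that replaces .* with \w* so each match stays inside one word:

```python
import re

text = "A VeRy LoNG SenTence Here"  # hypothetical sample input

# Pattern from the answer: a word with two uppercase letters, OR anything ending in uppercase.
original = re.findall(r"\b(\w*[A-Z]\w*[A-Z]\w*|.*[A-Z])\b", text)

# Variant: the second branch uses \w* instead of .*, so it cannot cross spaces and
# only matches a single word whose last character is uppercase.
per_word = re.findall(r"\b(\w*[A-Z]\w*[A-Z]\w*|\w*[A-Z])\b", text)

print(original)  # ['A VeRy LoNG', 'SenTence']
print(per_word)  # ['A', 'VeRy', 'LoNG', 'SenTence']
```

Because the whole alternation sits inside one capturing group, re.findall returns that group, which here is the same as the full match.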