I am trying to extract every run of consecutive capitalized words in a given string and write each run with no spaces in between.
E.g. The University Of Sydney => TheUniversityOfSydney, Regular Expression => RegularExpression, and This Is A Simple Variable => ThisIsASimpleVariable.
I started with this code, but it returns a list of separate words:
import re
string = "I write a syntax of Regular Expression"
result = re.findall(r"\b[A-Z][a-z]*\b", string)
print(result)
I expect to get RegularExpression here.
CodePudding user response:
You need to use
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = ["".join(x.split()) for x in re.findall(rx, text)]
print(result) # => ['RegularExpression']
The regex is explained in "How can I use Regex to abbreviate words that all start with a capital letter". In this case, the regex is used in re.findall to extract matches, and "".join(x.split()) is a post-processing step that removes all whitespace from the matched text.
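As a rough illustration of how the pieces fit together, here is a sketch that spells out the pattern \b[A-Z]\w*(?:\s+[A-Z]\w*)+ with re.VERBOSE and runs it on the examples from the question. The names CAPITALIZED_RUN and squash_capitalized_runs, and the sample sentences around the capitalized phrases, are made up for this example.

```python
import re

# The pattern, written with re.VERBOSE so each part can carry a comment.
# CAPITALIZED_RUN is a hypothetical name chosen for this sketch.
CAPITALIZED_RUN = re.compile(r"""
    \b[A-Z]\w*          # a word starting with an uppercase ASCII letter
    (?:                 # followed by
        \s+[A-Z]\w*     # whitespace and another capitalized word
    )+                  # one or more times, so lone capitalized words are skipped
""", re.VERBOSE)

def squash_capitalized_runs(text):
    """Return each run of two or more capitalized words with whitespace removed."""
    return ["".join(match.split()) for match in CAPITALIZED_RUN.findall(text)]

samples = [
    "I study at The University Of Sydney",
    "I write a syntax of Regular Expression",
    "x holds This Is A Simple Variable",
]
for sample in samples:
    print(squash_capitalized_runs(sample))
```

Note that the leading "I" in the first two samples is not returned: the group (?:...)+ requires at least one more capitalized word right after it, and "study" / "write" are lowercase.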
If you only expect a single match in each string, use re.search:
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = re.search(rx, text)
if result:
    print("".join(result.group().split()))  # => RegularExpression
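If it helps, the re.search variant can be wrapped in a small helper that makes the no-match case explicit. This is only a sketch; first_capitalized_run is a hypothetical name, and returning None when nothing matches is an assumption about what the caller wants.

```python
import re

RX = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"

def first_capitalized_run(text):
    """Return the first run of capitalized words with whitespace removed, or None."""
    match = re.search(RX, text)
    if match is None:
        # re.search returns None when there is no match, hence the guard.
        return None
    return "".join(match.group().split())

print(first_capitalized_run("I write a syntax of Regular Expression"))
print(first_capitalized_run("nothing capitalized here"))
```

If the string contains several runs, only the first one is returned; re.findall is the better fit when all of them are needed.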