How can I use Regex to extract all words that are written in camel case

Time:03-08

I am trying to extract every run of consecutive capitalized words from a given string and write each run with no spaces in between.

E.g. The University Of Sydney => TheUniversityOfSydney, Regular Expression => RegularExpression, and This Is A Simple Variable => ThisIsASimpleVariable.

I started with this code, but it returns a list of individual words:

import re
string = "I write a syntax of Regular Expression"
result = re.findall(r"\b[A-Z][a-z]*\b", string)
print(result)

I expect to get RegularExpression here.

CodePudding user response:

You need to use

import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = ["".join(x.split()) for x in re.findall(rx, text)]
print(result) # => ['RegularExpression']

The regex is explained in "How can I use Regex to abbreviate words that all start with a capital letter".

In this case, the regex is used in re.findall to extract the matches, and "".join(x.split()) is a post-processing step that removes all whitespace from each matched text.
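As a rough illustration of how the two steps fit together, here is the same pattern spelled out with re.VERBOSE and wrapped in a small function. The names RUN_OF_CAPITALIZED_WORDS and join_capitalized_runs are made up for this sketch, and the second input sentence is an invented example built from the question's sample phrase.

```python
import re

# Hypothetical name; the pattern itself is the one used in this answer,
# written in verbose mode so that each part can carry a comment.
RUN_OF_CAPITALIZED_WORDS = re.compile(r"""
    \b[A-Z]\w*        # a word that starts with an uppercase ASCII letter
    (?:               # followed by one or more repetitions of:
        \s+[A-Z]\w*   #   whitespace, then another capitalized word
    )+
""", re.VERBOSE)

def join_capitalized_runs(text):
    """Return each run of two or more capitalized words with whitespace removed."""
    # str.split() with no arguments splits on any whitespace, so joining the
    # pieces with "" drops every space, tab, or newline inside the match.
    return ["".join(run.split()) for run in RUN_OF_CAPITALIZED_WORDS.findall(text)]

print(join_capitalized_runs("I write a syntax of Regular Expression"))
# => ['RegularExpression']
print(join_capitalized_runs("I study at The University Of Sydney"))
# => ['TheUniversityOfSydney']
```

Note that a lone capitalized word such as "I" is skipped, because the trailing group must match at least once.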

If you expect only a single match in each string, use re.search:

import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = re.search(rx, text)
if result:
    print("".join(result.group().split()))  # => RegularExpression
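
To make the difference between the two calls concrete, here is a small sketch on a made-up input that contains two runs of capitalized words. re.search stops at the first run, while re.findall collects both. The variable names first_run and all_runs are only illustrative.

```python
import re

rx = re.compile(r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+")

# Invented input with two separate runs of capitalized words.
text = "Regular Expression and The University Of Sydney"

# re.search returns only the first match (or None if there is none).
match = rx.search(text)
first_run = "".join(match.group().split()) if match else None

# re.findall returns every non-overlapping match, left to right.
all_runs = ["".join(run.split()) for run in rx.findall(text)]

print(first_run)  # => RegularExpression
print(all_runs)   # => ['RegularExpression', 'TheUniversityOfSydney']
```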