regex split on uppercase, but ignore titlecase

Time:12-14

How can I split "This Is ABC Title" into "This Is", "ABC", "Title" in Python? If I use [A-Z] as the regular expression, it gets split into "This", "Is", "ABC", "Title". I do not want to split on whitespace.

CodePudding user response:

You can use

re.split(r'\s*\b([A-Z]+)\b\s*', text)

Details:

  • \s* - zero or more whitespace characters
  • \b - word boundary
  • ([A-Z]+) - Capturing group 1: one or more ASCII uppercase letters
  • \b - word boundary
  • \s* - zero or more whitespace characters

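Note the use of a capturing group: it makes re.split include the captured substring in the result instead of discarding it as a delimiter. Because the group sits between two word boundaries, only whole words written entirely in uppercase match, so title-case words such as "This" are left alone.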
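To illustrate the effect of the capturing group, here is a minimal sketch that runs the same pattern with a non-capturing group, (?:...), for comparison. The variable names are only illustrative.

```python
import re

text = "This Is ABC Title"

# Capturing group: the all-uppercase word is kept in the result list.
with_group = re.split(r'\s*\b([A-Z]+)\b\s*', text)

# Non-capturing group: the all-uppercase word acts as a pure delimiter
# and is dropped from the result.
without_group = re.split(r'\s*\b(?:[A-Z]+)\b\s*', text)

print(with_group)     # ['This Is', 'ABC', 'Title']
print(without_group)  # ['This Is', 'Title']
```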

See the Python demo:

import re
text = "This Is ABC Title"
print(re.split(r'\s*\b([A-Z]+)\b\s*', text))
# => ['This Is', 'ABC', 'Title']
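Two edge cases are worth keeping in mind. When the uppercase word is at the very start or end of the string, re.split returns an empty string on that side. One-letter words such as "A" or "I" also count as all-uppercase. The sketch below is one possible way to handle both, not part of the original answer: the helper name split_on_uppercase_words and its min_len parameter are hypothetical.

```python
import re

def split_on_uppercase_words(text, min_len=1):
    """Split text around whole words made only of ASCII uppercase letters.

    min_len is a hypothetical knob: set it to 2 to stop one-letter
    words such as "A" or "I" from being treated as separators.
    """
    pattern = r'\s*\b([A-Z]{%d,})\b\s*' % min_len
    # Drop the empty strings re.split produces when a match touches
    # the start or end of the input.
    return [part for part in re.split(pattern, text) if part]

print(split_on_uppercase_words("ABC This Is Title"))
# ['ABC', 'This Is Title']
print(split_on_uppercase_words("This Is A Title XYZ"))
# ['This Is', 'A', 'Title', 'XYZ']
print(split_on_uppercase_words("This Is A Title XYZ", min_len=2))
# ['This Is A Title', 'XYZ']
```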