I'm working on a script that splits a text at a specific pattern: a comma, a space, and a capital letter. I compiled the regex patt = re.compile(r'\b, [A-Z]') and split the text with it, but the capital letter is removed from the result, which is not what I expect.
Example:
import re
target_string = 'Prueba1, palabra 1, Palabra 2, palabra 3, palabra 4, Palabra5 frase1'
patt = re.compile(r'\b, [A-Z]')
print(patt.split(target_string))
Result: ['Prueba1, palabra 1', 'alabra 2, palabra 3, palabra 4', 'alabra5 frase1']
Expected result: ['Prueba1, palabra 1', 'Palabra 2, palabra 3, palabra 4', 'Palabra5 frase1']
How can I fix the pattern so that the capital letter is kept?
CodePudding user response:
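You need to make the capital letter a "lookahead": a condition that must hold at that position but is not included in the matched text. Because split() removes only what the pattern actually matches, the letter then stays in the result.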
patt = re.compile(r'\b, (?=[A-Z])')
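As a quick check, here is a minimal sketch that runs both patterns on the string from the question. The variable names consuming, lookahead, and parts are only illustrative.

```python
import re

target_string = 'Prueba1, palabra 1, Palabra 2, palabra 3, palabra 4, Palabra5 frase1'

# Original pattern: [A-Z] is part of the match, so split() removes it.
consuming = re.compile(r'\b, [A-Z]')
print(consuming.split(target_string))

# Lookahead pattern: (?=[A-Z]) only checks that a capital letter follows
# the ", " without consuming it, so the letter stays in the next piece.
lookahead = re.compile(r'\b, (?=[A-Z])')
parts = lookahead.split(target_string)
print(parts)
```

The second print should give the expected list from the question, with 'Palabra 2' and 'Palabra5' intact.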