Issue splitting a text with a pattern

Time:05-12

I'm working on a script that splits a text at a specific pattern: a comma, a space, and a capital letter. For that I compiled the regex patt = re.compile(r'\b, [A-Z]') and split the text on it. The split works, but the capital letter is dropped from each piece, which is not the result I'm looking for.

Example:

import re

target_string = 'Prueba1, palabra 1, Palabra 2, palabra 3, palabra 4, Palabra5 frase1'

patt = re.compile(r'\b, [A-Z]')
print(patt.split(target_string))
Result: ['Prueba1, palabra 1', 'alabra 2, palabra 3, palabra 4', 'alabra5 frase1']
Expected result: ['Prueba1, palabra 1', 'Palabra 2, palabra 3, palabra 4', 'Palabra5 frase1']

I hope you can help me to fix my script.

CodePudding user response:

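The capital letter needs to be a "lookahead": a condition that must match at that position but is not consumed as part of the match. Because split() removes only the text the pattern consumes, the capital letter then stays at the start of the next piece.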

patt = re.compile(r'\b, (?=[A-Z])')
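
As a quick check, here is a minimal sketch that runs both patterns on the string from the question. The variable names `consuming` and `lookahead` are only illustrative and do not come from the original script.

```python
import re

target_string = 'Prueba1, palabra 1, Palabra 2, palabra 3, palabra 4, Palabra5 frase1'

# Original pattern: the capital letter is part of the match,
# so split() removes it along with the comma and the space.
consuming = re.compile(r'\b, [A-Z]')

# Lookahead pattern: the capital letter must follow the comma and space,
# but only the comma and the space are consumed by the match.
lookahead = re.compile(r'\b, (?=[A-Z])')

print(consuming.split(target_string))
# ['Prueba1, palabra 1', 'alabra 2, palabra 3, palabra 4', 'alabra5 frase1']

print(lookahead.split(target_string))
# ['Prueba1, palabra 1', 'Palabra 2, palabra 3, palabra 4', 'Palabra5 frase1']
```

Note that [A-Z] only covers unaccented ASCII capitals, so a piece starting with a letter such as 'Á' or 'Ñ' would not trigger a split with either pattern.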