Python regex : extract multiple pattern in series containing decimal

Time:07-29

Say I have strings like these:

1. ACTNOWQUICK3 1234.56 1234.98 HYE912630964589376 PLUTO THEATRE OTHER WUN Cool Beans KIng
2. Cash WithdrawalATM 50.00 ABC 1111.22 23523455A
3. ACTNOWQUICK 76.53 653.24 HYE91234234589376 WiN OTHR JOHNKLING

I need to split each string so that I get everything before the first numeric value, the two numeric values themselves, and everything after the second numeric value. Note that it is guaranteed that each string contains exactly two numeric (int or decimal) values, each with a space before and after it.

This is what I have tried, but it is not giving me the expected output:

pattern = '(.*)([0-9]*[,.][0-9]*).*([0-9]*[,.][0-9]*)(.*)'

What I expected:

1. "ACTNOWQUICK3", 1234.56, 1234.98, "HYE912630964589376 PLUTO THEATRE OTHER WUN Cool Beans KIng"
2. "Cash WithdrawalATM", 50.00, 1111.22, "23523455A"
3. "ACTNOWQUICK", 76.53, 653.24, "HYE91234234589376 WiN OTHR JOHNKLING"

CodePudding user response:

You're using greedy quantifiers. As Michael recommends, make the first two .* lazy by adding a ? after each of them. Also add a literal space after the first group and before the last group, so the captured text does not include the separating spaces.

pattern = '(.*?) ([0-9]+[,.][0-9]+).*?([0-9]+[,.][0-9]+) (.*)'

This works because a lazy quantifier matches as few characters as possible, so the first group stops right before the first number instead of consuming most of the string.
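
A minimal sketch of this approach using Python's re module, run against two of the sample strings from the question. It assumes the digit classes are meant to use + (one or more digits), and that a comma in a number is a decimal separator. The helper name split_record is hypothetical, not from the question. Note that the pattern only matches numbers with a decimal part, as in the samples; plain integers would need the fractional part made optional.

```python
import re

# Lazy prefix, two decimal numbers, greedy remainder.
# Assumption: '+' quantifiers on the digit classes (at least one digit each side).
PATTERN = re.compile(r"(.*?) ([0-9]+[,.][0-9]+).*?([0-9]+[,.][0-9]+) (.*)")


def split_record(text):
    """Return (prefix, first_number, second_number, remainder), or None if no match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    prefix, first, second, rest = match.groups()
    # Assumption: a comma is a decimal separator, so normalise it before float().
    return prefix, float(first.replace(",", ".")), float(second.replace(",", ".")), rest


samples = [
    "Cash WithdrawalATM 50.00 ABC 1111.22 23523455A",
    "ACTNOWQUICK 76.53 653.24 HYE91234234589376 WiN OTHR JOHNKLING",
]

for sample in samples:
    print(split_record(sample))
```

Text between the two numbers (such as ABC in the first sample) is skipped by the uncaptured .*? and does not appear in the result, which matches the expected output in the question.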

Test here: https://regex101.com/r/PVR6bd/1
