Pandas with regex

Time:04-21

I have a column in my dataframe that contains a mix of values. I need to keep only the values that match my condition.

For example:

df

col1
Tesla
Audi
BMW-N2204281200PE
SUPRA2204241300.75CE
TATA230612133.50PE

I need to keep only values like the last 3 rows. Each one is a string that starts with letters, may contain symbols (-, &, $) followed by more letters, then has a 6-digit value, then a price such as 1300 or 1300.75, and ends with PE or CE.

How can I do this using pandas? Also, how can I split each matching value into its parts, like ['BMW-N', '220428', '1200PE'] and ['SUPRA', '220424', '1300.75CE']?

CodePudding user response:

You can use the following regex:

df['col1'].str.extract(r'([a-zA-Z-&$]+)(\d{6})(\d+(?:\.\d+)?[PC]E)')

output:

       0       1          2
0    NaN     NaN        NaN
1    NaN     NaN        NaN
2  BMW-N  220428     1200PE
3  SUPRA  220424  1300.75CE
4   TATA  230612   133.50PE
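
To also cover the filtering and list-splitting parts of the question, here is a minimal sketch built on the same idea. It assumes the whole value must match, so the pattern is anchored with ^ and $. The hyphen is moved to the end of the character class to keep it unambiguous. The column names symbol, date and price are illustrative only and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [
        "Tesla",
        "Audi",
        "BMW-N2204281200PE",
        "SUPRA2204241300.75CE",
        "TATA230612133.50PE",
    ]
})

# Same three groups as the regex above, anchored so the entire value must match.
pattern = r"^([a-zA-Z&$-]+)(\d{6})(\d+(?:\.\d+)?[PC]E)$"

# Non-matching rows come back as all-NaN, so dropna() acts as the filter.
parts = df["col1"].str.extract(pattern).dropna()
parts.columns = ["symbol", "date", "price"]  # hypothetical names, for readability

# Keep only the matching rows of the original dataframe.
filtered = df.loc[parts.index]

# One [symbol, date, price] list per matching row.
split_lists = parts.values.tolist()

print(filtered)
print(split_lists)
```

Here filtered keeps the original index (rows 2, 3 and 4), and split_lists holds one three-element list per matching row. If only a boolean mask is needed, df['col1'].str.contains(pattern) with the same anchored pattern should work as well, though pandas may warn that the pattern contains capture groups.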

