Using conditional formatting in pandas using lambda

Time:07-11

I have a column in my pandas dataframe which goes like this:

Size 
19M
14M
160k
..
..

I now need to extract the numeric part from each cell, and if the previous value contains an "M", multiply the numeric part by 1000. In Excel, I've used this formula:

=IF(RIGHT(A2,1)="M",LEFT(A2,LEN(A2)-1)*1000,LEFT(A2,LEN(A2)-1))

How do I replicate this in pandas?

TIA!

CodePudding user response:

One way could be: build a mask of the rows whose previous row contains an "M", then change the values conditionally with np.where. The fillna(False) is for the first row: there is no previous row to compare against, so shift produces NaN there.

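import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": {0: "19M", 
                            1: "14M", 
                            2: "160k", 
                            3: "5M", 
                            4: "17k", 
                            5: "7M"}})

# True where the previous row contains "M"; the first row's NaN becomes False
mask = df['Size'].shift().str.contains('M').fillna(False)
print(mask)

# Drop the trailing unit letter, then multiply by 1000 where the mask is True
df['res'] = np.where(mask, df['Size'].str[:-1].astype(int)*1000, df['Size'].str[:-1].astype(int))
print(df)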

#mask
0    False
1     True
2     True
3    False
4     True
5    False
Name: Size, dtype: bool

#df
   Size     res
0   19M      19
1   14M   14000
2  160k  160000
3    5M       5
4   17k   17000
5    7M       7
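
Note that the Excel formula in the question tests the cell's own last character rather than the previous row. If that is the behaviour wanted, a minimal sketch along the same lines could look like this. The names number and is_mega are only illustrative, and it assumes every value is an integer followed by a single unit letter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": ["19M", "14M", "160k", "5M", "17k", "7M"]})

# Numeric part: everything except the trailing unit letter
number = df["Size"].str[:-1].astype(int)

# True where the cell itself ends with "M", like RIGHT(A2,1)="M" in Excel
is_mega = df["Size"].str.endswith("M")

# Multiply by 1000 only for the "M" rows, leave the others unchanged
df["res"] = np.where(is_mega, number * 1000, number)
print(df)
```

The only difference from the code above is that the mask is built from the column itself instead of from shift().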

CodePudding user response:

The solution below uses a lag (shift) and a regex:

import pandas as pd
import re

df = pd.DataFrame({'col1':['19M','14M','160K','17M','100K']})

df['col1_lag'] = df['col1'].shift(1)

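# Strip the non-digits from col1; multiply by 1000 (the factor from the question) when the previous row ends with 'M'
df['res'] = df[['col1','col1_lag']].apply(lambda x: int(re.sub(r"[^0-9]", '', x['col1'])) * 1000 if str(x['col1_lag']).endswith('M') else int(re.sub(r"[^0-9]", '', x['col1'])), axis=1)
print(df)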
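
The same previous-row logic can probably also be written without a row-wise apply. A rough sketch, assuming each value contains a single run of digits and using the factor of 1000 stated in the question (digits and prev_is_m are illustrative names, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["19M", "14M", "160K", "17M", "100K"]})

# Pull the first run of digits out of each value and convert to int
digits = df["col1"].str.extract(r"(\d+)", expand=False).astype(int)

# True where the previous row ends with "M"; na=False covers the first row,
# which has no previous row
prev_is_m = df["col1"].shift(1).str.endswith("M", na=False)

# Keep digits where the previous row is not "M", otherwise use digits * 1000
df["res"] = digits.where(~prev_is_m, digits * 1000)
print(df)
```

Passing na=False to str.endswith plays the role that fillna(False) plays in the first answer.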

