I have a column in my pandas dataframe which goes like this:
Size
19M
14M
160k
..
..
I now need to extract the numeric part from each cell, and if the previous value contains an "M", the numeric part needs to be multiplied by 1000. In Excel, I've used this formula:
=IF(RIGHT(A2,1)="M",LEFT(A2,LEN(A2)-1)*1000,LEFT(A2,LEN(A2)-1))
How do I replicate this in pandas?
TIA!
CodePudding user response:
One way could be: mask the rows where the previous row contains an M, then change the values conditionally with np.where. The fillna(False) is for the first row: since there is no previous row to compare against, shift will result in NaN.
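import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": ["19M", "14M", "160k", "5M", "17k", "7M"]})
# True where the previous row contains "M"; the first row has no previous row
mask = df['Size'].shift().str.contains('M').fillna(False).astype(bool)
print(mask)
# Strip the unit suffix, then multiply by 1000 where the mask is True
number = df['Size'].str[:-1].astype(int)
df['res'] = np.where(mask, number * 1000, number)
print(df)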
#mask
0    False
1     True
2     True
3    False
4     True
5    False
Name: Size, dtype: bool
#df
   Size     res
0   19M      19
1   14M   14000
2  160k  160000
3    5M       5
4   17k   17000
5    7M       7
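Note that the Excel formula in the question checks the last character of the same cell (RIGHT(A2,1)="M"), not the previous row. If that is the intended behaviour, a minimal sketch could look like the following. The column name res_same_row is only an illustrative choice, and the code assumes every value ends in a single-character unit suffix:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": ["19M", "14M", "160k", "5M", "17k", "7M"]})

# Numeric part: everything except the last character (the unit suffix).
number = df["Size"].str[:-1].astype(int)

# Same-row check, mirroring RIGHT(A2,1)="M" in the Excel formula.
is_mega = df["Size"].str.endswith("M")

# res_same_row is a hypothetical column name used for this sketch.
df["res_same_row"] = np.where(is_mega, number * 1000, number)
print(df)
```

Here the multiplier follows each row's own suffix, so 19M becomes 19000 while 160k stays 160.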
CodePudding user response:
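The solution below uses a lag (shift) and a regex:
import re
import pandas as pd
df = pd.DataFrame({'col1': ['19M', '14M', '160K', '17M', '100K']})
df['col1_lag'] = df['col1'].shift(1)  # previous row's value; NaN for the first row
# Keep only the digits; multiply by 1000 when the previous row ends with 'M'
df[['col1', 'col1_lag']].apply(
    lambda x: int(re.sub("[^0-9]", '', x['col1'])) * (1000 if str(x['col1_lag']).endswith('M') else 1),
    axis=1)
Output:
0        19
1     14000
2    160000
3        17
4    100000
dtype: int64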
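The same lag-and-regex idea can probably be written without a row-wise apply, which tends to be slow on large frames. This is only a sketch: the column name res is an assumption, and it relies on every cell containing at least one run of digits:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["19M", "14M", "160K", "17M", "100K"]})

# Pull out the first run of digits with a regex capture group.
number = df["col1"].str.extract(r"(\d+)", expand=False).astype(int)

# True where the previous row ends with "M". The first row has no
# previous row, so na=False treats the missing value as "no match".
prev_is_mega = df["col1"].shift(1).str.endswith("M", na=False)

# "res" is a hypothetical column name used for this sketch.
df["res"] = np.where(prev_is_mega, number * 1000, number)
print(df)
```

Passing na=False to str.endswith plays the same role as fillna(False) in the first answer.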