I need to find a way of replacing a substring in a pandas dataframe at row level based on condition. (df = pd.DataFrame({'Name':['Meat 1.7 Kg','Chicken 1.9 Kg','Ground Beef 1.0 Kg','Turkey 1.2 kg','Wagyu 400 g'],'Weight':[10,8,2,6,4],'Mult':[4.0,5.2,5.6,5.9,4.9]})
)
Name Weight Mult
0 Meat 1.7 Kg 10 4.0
1 Chicken 1.9 Kg 8 5.2
2 Ground Beef 1.0 Kg 2 5.6
3 Turkey 1.2 kg 6 5.9
4 Wagyu 400 g 4 4.9
I need to replace the 'Name'
column number with the 'Mult'
column number. The problem is that I need to directly do so when it has 'Kg' or 'kg' at the end, but when it has a 'g' at the end I need to multiply by 1000 and then replace it.
Desired output:
Name Weight Mult
0 Meat 4.0 Kg 10 4.0
1 Chicken 5.2 Kg 8 5.2
2 Ground Beef 5.6 Kg 2 5.6
3 Turkey 5.9 kg 6 5.9
4 Wagyu 4900 g 4 4.9
Maybe it's better to just transform the 'Mult'
value prior replacement and then use the same condition to all rows.
Currently I'm trying this: [re.sub(r'(\d \.\d )\s [Kk]g$|(\d \s )g$', b, a) for a, b in zip(df['refid'], df['Mult'].astype(str))]
, but I don't know how to introduce the condition.
Thanks.
CodePudding user response:
You can use
df['id'] = df.apply(lambda x: re.sub(r'(?i)(\d (?:\.\d )?)(\s*(k)?g)$', lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) z.group(2), x['id']), axis=1)
Output of df
:
id Weight Mult
0 Meat 4.0 Kg 10 4.0
1 Chicken 5.2 Kg 8 5.2
2 Grounded Beef 5.6 Kg 2 5.6
3 Turkey 5.9 kg 6 5.9
4 Wagyu 4900 g 4 4.9
Details:
(?i)
- case insensitive matching on(\d (?:\.\d )?)
- Group 1: one or more digits and then an optional sequence of a.
and one or more digits(\s*(k)?g)
- Group 2: zero or more whitespaces, an optional Group 3 capturing ak
and then ag
$
- end of string.
The lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) z.group(2)
replacements means the match is replaced with the value from the Mult column as is if Group 3 matched the Group 2 value, or, if Group 3 did not match, the number is multiplied by 1000, turned to integer, converted to a string and returned with Group 2 value right after.