Home > Software engineering >  Replace pandas dataframe substrings using regex based on condition at row level
Replace pandas dataframe substrings using regex based on condition at row level

Time:02-27

I need to find a way of replacing a substring in a pandas dataframe at row level based on condition. (df = pd.DataFrame({'Name':['Meat 1.7 Kg','Chicken 1.9 Kg','Ground Beef 1.0 Kg','Turkey 1.2 kg','Wagyu 400 g'],'Weight':[10,8,2,6,4],'Mult':[4.0,5.2,5.6,5.9,4.9]}))

    Name               Weight Mult
0   Meat 1.7 Kg        10     4.0
1   Chicken 1.9 Kg      8     5.2
2   Ground Beef 1.0 Kg  2     5.6
3   Turkey 1.2 kg       6     5.9
4   Wagyu 400 g         4     4.9

I need to replace the 'Name' column number with the 'Mult' column number. The problem is that I need to directly do so when it has 'Kg' or 'kg' at the end, but when it has a 'g' at the end I need to multiply by 1000 and then replace it.

Desired output:

    Name               Weight Mult
0   Meat 4.0 Kg        10     4.0
1   Chicken 5.2 Kg      8     5.2
2   Ground Beef 5.6 Kg  2     5.6
3   Turkey 5.9 kg       6     5.9
4   Wagyu 4900 g        4     4.9

Maybe it's better to just transform the 'Mult' value prior replacement and then use the same condition to all rows.

Currently I'm trying this: [re.sub(r'(\d \.\d )\s [Kk]g$|(\d \s )g$', b, a) for a, b in zip(df['refid'], df['Mult'].astype(str))], but I don't know how to introduce the condition.

Thanks.

CodePudding user response:

You can use

df['id'] = df.apply(lambda x: re.sub(r'(?i)(\d (?:\.\d )?)(\s*(k)?g)$', lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000)))   z.group(2), x['id']), axis=1)

Output of df:

                     id  Weight  Mult
0           Meat 4.0 Kg      10   4.0
1        Chicken 5.2 Kg       8   5.2
2  Grounded Beef 5.6 Kg       2   5.6
3         Turkey 5.9 kg       6   5.9
4          Wagyu 4900 g       4   4.9

Details:

  • (?i) - case insensitive matching on
  • (\d (?:\.\d )?) - Group 1: one or more digits and then an optional sequence of a . and one or more digits
  • (\s*(k)?g) - Group 2: zero or more whitespaces, an optional Group 3 capturing a k and then a g
  • $ - end of string.

The lambda z: (str(x['Mult']) if z.group(3) else str(int(x['Mult']*1000))) z.group(2) replacements means the match is replaced with the value from the Mult column as is if Group 3 matched the Group 2 value, or, if Group 3 did not match, the number is multiplied by 1000, turned to integer, converted to a string and returned with Group 2 value right after.

  • Related