Home > other >  Subtract values in a column in blocks
Subtract values in a column in blocks

Time:05-11

Suppose there is the following dataframe:

import pandas as pd
    
df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'], 'Value': [1, 2, 3, 4, 5, 6]})

I would like to subtract the values from group B and C with those of group A and make a new column with the difference. That is, I would like to do something like this:

df[df['Group'] == 'B']['Value'].reset_index() - df[df['Group'] == 'A']['Value'].reset_index() 
df[df['Group'] == 'C']['Value'].reset_index() - df[df['Group'] == 'A']['Value'].reset_index() 

and place the result in a new column. Is there a way of doing it without a for loop?

CodePudding user response:

Assuming you want to subtract the first A to the first B/C, second A to second B/C, etc. the easiest might be to reshape:

df2 = (df
 .assign(cnt=df.groupby('Group').cumcount())
 .pivot('cnt', 'Group', 'Value')
)
# Group  A  B  C
# cnt           
# 0      1  3  5
# 1      2  4  6

df['new_col'] = df2.sub(df2['A'], axis=0).melt()['value']

variant:

df['new_col'] = (df
 .assign(cnt=df.groupby('Group').cumcount())
 .groupby('cnt', group_keys=False)
 .apply(lambda d: d['Value'].sub(d.loc[d['Group'].eq('A'), 'Value'].iloc[0]))
)

output:

  Group  Value  new_col
0     A      1        0
1     A      2        0
2     B      3        2
3     B      4        2
4     C      5        4
5     C      6        4
  • Related