Suppose there is the following dataframe:
import pandas as pd
df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C'], 'Value': [1, 2, 3, 4, 5, 6]})
I would like to subtract the values from group B and C with those of group A and make a new column with the difference. That is, I would like to do something like this:
df[df['Group'] == 'B']['Value'].reset_index() - df[df['Group'] == 'A']['Value'].reset_index()
df[df['Group'] == 'C']['Value'].reset_index() - df[df['Group'] == 'A']['Value'].reset_index()
and place the result in a new column. Is there a way of doing it without a for loop?
CodePudding user response:
Assuming you want to subtract the first A to the first B/C, second A to second B/C, etc. the easiest might be to reshape:
df2 = (df
.assign(cnt=df.groupby('Group').cumcount())
.pivot('cnt', 'Group', 'Value')
)
# Group A B C
# cnt
# 0 1 3 5
# 1 2 4 6
df['new_col'] = df2.sub(df2['A'], axis=0).melt()['value']
variant:
df['new_col'] = (df
.assign(cnt=df.groupby('Group').cumcount())
.groupby('cnt', group_keys=False)
.apply(lambda d: d['Value'].sub(d.loc[d['Group'].eq('A'), 'Value'].iloc[0]))
)
output:
Group Value new_col
0 A 1 0
1 A 2 0
2 B 3 2
3 B 4 2
4 C 5 4
5 C 6 4