I have a pandas dataframe and i need to subtract rows if they have the same group.
Input Dataframe:
A | B | Value |
---|---|---|
A1 | B1 | 10.0 |
A1 | B1 | 5.0 |
A1 | B2 | 5.0 |
A2 | B1 | 3.0 |
A2 | B1 | 5.0 |
A2 | B2 | 1.0 |
Expected Dataframe:
A | B | Value |
---|---|---|
A1 | B1 | 5.0 |
A1 | B2 | 5.0 |
A2 | B1 | -2.0 |
A2 | B2 | 1.0 |
Logic: For example the first and second rows of the dataframe are in group A1 and B1 so the value must be 10.0 - 5.0 = 5.0. 4º and 5º rows have the same group as well so the value must be 3.0 - 5.0 = -2.0.
Only subtract rows if they have same A and B value.
Thank you!
CodePudding user response:
Try:
subtract = lambda x: x.iloc[0] - (x.iloc[1] if len(x) == 2 else 0)
out = df.groupby(['A', 'B'])['Value'].apply(subtract).reset_index()
print(out)
# Output:
A B Value
0 A1 B1 5.0
1 A1 B2 5.0
2 A2 B1 -2.0
3 A2 B2 1.0
CodePudding user response:
You can prepare duplicated rows to be substracted and then sum after grouping. This works for more than one duplicated row in the correct order, too.
import pandas as pd
df = pd.read_html('https://stackoverflow.com/q/70438208/14277722')[0]
df.loc[df.duplicated(subset=['A','B']), 'Value'] *=-1
df.groupby(['A','B'], as_index=False).sum()
Output
A B Value
0 A1 B1 5.0
1 A1 B2 5.0
2 A2 B1 -2.0
3 A2 B2 1.0
CodePudding user response:
Let us pass the condition within groupby
apply
out = df.groupby(['A','B'])['Value'].apply(lambda x : x.iloc[0]-x.iloc[-1] if len(x)>1 else x.iloc[0]).reset_index(name = 'Value')
Out[18]:
A B Value
0 A1 B1 5.0
1 A1 B2 5.0
2 A2 B1 -2.0
3 A2 B2 1.0