I have the dataframe below:
import pandas as pd
d = {'id': [1,1,2,2,3,3,3,4,4],
'amount':[2,3,5,-5,2,3,-5,3,4]}
d = pd.DataFrame(d)
d
id amount
0 1 2
1 1 3
2 2 5
3 2 -5
4 2 4
5 2 6
6 3 2
7 3 3
8 3 -5
9 4 3
10 4 4
11 4 100
12 4 -100
So what i want to do, is match the positive with the negative values per id and i do this:
d[d.groupby(['id'])['amount'].transform('sum').eq(0)
which gives me this output:
id amount
2 2 5
3 2 -5
4 3 2
5 3 3
6 3 -5
So as you can see my code does not only matches pairs of rows as i want. For instance, for id 3, it matches three rows on the condition i've set. How can i get it to only match pairs of rows and have the desired output as the one below:
id amount
2 2 5
3 2 -5
11 4 100
12 4 -100
CodePudding user response:
One way would be to check if element's negative exists in its own group:
def refunded(data):
return data[data["amount"].isin(-data["amount"])]
new_df = df.groupby("id").apply(refunded).reset_index(0, drop=True)
print(new_df)
Output:
id amount
2 2 5
3 2 -5
11 4 100
12 4 -100