I have this code:
test = {"number": ['1333','1444','1555','1666','1777', '1444', '1999', '2000', '2000'],
"order_amount": ['1000.00','-500.00','100.00','200.00','-200.00','-500.00','150.00','-100.00','-20.00'],
"number_of_refund": ['','1333','', '', '1666', '1333', '','1999','1999']
}
df = pd.DataFrame(test) Which returns the following dataframe:
number order_amount number_of_refund
0 1333 1000.00
1 1444 -500.00 1333
2 1555 100.00
3 1666 200.00
4 1777 -200.00 1666
5 1444 -500.00 1333
6 1999 150.00
7 2000 -100.00 1999
8 2000 -20.00 1999
From which I need to remove the orders that have "number" value inside "number_of_refund" column if the amount of "number_of_refund" records (can be multiple) are the negative amount of the "number" that is specified in "number_of_refund". So the outcome should be:
number order_amount number_of_refund
2 1555 100.00
6 1999 150.00
7 2000 -100.00 1999
8 2000 -20.00 1999
CodePudding user response:
You can do the same as in my previous answer, but use an intermediate dataframe with aggregated refunds:
# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')
# aggregate values
df2 = df.groupby(['number', 'number_of_refund'], as_index=False).sum()
# is the row a refund?
m1 = df2['number_of_refund'].ne('')
# get mapping of refunds
s = df2[m1].set_index('number_of_refund')['order_amount']
# get reimbursements and find which ones will equal the original value
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])
# keep rows that do not match any m2 or m3 mask
df = df[~(m2|m3)]
output:
number order_amount number_of_refund
2 1555 100.0
6 1999 150.0
7 2000 -100.0 1999
8 2000 -20.0 1999