I have this code:
test = {"number": ['1555','1666','1777', '1888'],
"order_amount": ['100.00','200.00','-200.00', '300.00'],
"number_of_refund": ['','','1666', '']
}
df = pd.DataFrame(test)
Which returns the following dataframe:
number order_amount number_of_refund
0 1555 100.00
1 1666 200.00
2 1777 -200.00 1666
3 1888 300.00
What would be the best solution to remove the row if the order number is refunded? I would want to remove the order row and the refund row.
Logic if df['number'].value is in df['number_of_refund'] and the amount of df['number'].value is the opposite of the df['number_of_refund'] rows.
So the result in this case should be:
number order_amount number_of_refund
0 1555 100.00
1 1888 300.00
CodePudding user response:
You can use boolean indexing with two masks:
# is the row a refund?
m1 = df['number_of_refund'].ne('')
# is the number order matching a refund?
m2 = df['number'].isin(df.loc[m1, 'number_of_refund'])
# keep rows that do not match any of the above
out = df[~(m1|m2)]
output:
number order_amount number_of_refund
0 1555 100.0
3 1888 300.0
partial refunds:
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')
# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# update amounts
df['order_amount'] = df['number'].map(s).fillna(0)
# find null values
m2 = df['order_amount'].eq(0)
# keep rows that do not match any mask
df = df[~(m1|m2)]
output:
number order_amount number_of_refund
0 1555 100.0
1 1666 100.0
3 1888 300.0
input for partial refund:
test = {"number": ['1555','1666','1777', '1888'],
"order_amount": ['100.00','200.00','-100.00', '300.00'],
"number_of_refund": ['','','1666', '']
}
df = pd.DataFrame(test)