Home > Net >  Remove rows from dataframe if A column value is in C column
Remove rows from dataframe if A column value is in C column

Time:09-08

I have this code:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-200.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)

Which returns the following dataframe:

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 

What would be the best solution to remove the row if the order number is refunded? I would want to remove the order row and the refund row.

Logic if df['number'].value is in df['number_of_refund'] and the amount of df['number'].value is the opposite of the df['number_of_refund'] rows.

So the result in this case should be:

number order_amount number_of_refund
0   1555       100.00                 
1   1888       300.00                 

CodePudding user response:

You can use boolean indexing with two masks:

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# is the number order matching a refund?
m2 = df['number'].isin(df.loc[m1, 'number_of_refund'])

# keep rows that do not match any of the above
out = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
3   1888         300.0                 

partial refunds:

df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# update amounts
df['order_amount']  = df['number'].map(s).fillna(0)
# find null values
m2 = df['order_amount'].eq(0)


# keep rows that do not match any mask
df = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
1   1666         100.0                 
3   1888         300.0                 

input for partial refund:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-100.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)
  • Related