Home > Software engineering >  Group and remove rows if amount of grouped rows is equal to zero
Group and remove rows if amount of grouped rows is equal to zero

Time:09-15

I have this code:

test = {"number": ['1333','1444','1555','1666','1777', '1444', '1999', '2000', '2000'],
        "order_amount": ['1000.00','-500.00','100.00','200.00','-200.00','-500.00','150.00','-100.00','-20.00'],
        "number_of_refund": ['','1333','', '', '1666', '1333', '','1999','1999']
        }

df = pd.DataFrame(test) Which returns the following dataframe:

  number order_amount number_of_refund
0   1333      1000.00                 
1   1444      -500.00             1333
2   1555       100.00                 
3   1666       200.00                 
4   1777      -200.00             1666
5   1444      -500.00             1333
6   1999       150.00                 
7   2000      -100.00             1999
8   2000       -20.00             1999

From which I need to remove the orders that have "number" value inside "number_of_refund" column if the amount of "number_of_refund" records (can be multiple) are the negative amount of the "number" that is specified in "number_of_refund". So the outcome should be:

  number order_amount number_of_refund
2   1555       100.00                 
6   1999       150.00                 
7   2000      -100.00             1999
8   2000       -20.00             1999

CodePudding user response:

You can do the same as in my previous answer, but use an intermediate dataframe with aggregated refunds:

# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# aggregate values
df2 = df.groupby(['number', 'number_of_refund'], as_index=False).sum()

# is the row a refund?
m1 = df2['number_of_refund'].ne('')
# get mapping of refunds
s = df2[m1].set_index('number_of_refund')['order_amount']

# get reimbursements and find which ones will equal the original value
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])

# keep rows that do not match any m2 or m3 mask
df = df[~(m2|m3)]

output:

  number  order_amount number_of_refund
2   1555         100.0                 
6   1999         150.0                 
7   2000        -100.0             1999
8   2000         -20.0             1999
  • Related