Home > front end >  How to drop rows based on condition?
How to drop rows based on condition?

Time:09-17

I want to find:

  1. data not found in provider, but found in accounting,
  2. data found in accounting, but not in provider
  3. data found in both, accounting and provider, but the amounts do not match.

To accomplish this I have to use three provided tables Accounting, Helper, Provider First I join Accounting with Helper on invoice. Then I join it with Provider on toID. After merge inner join I have a table which in this case creates two unnecessary records.

Accounting table looks like:

doc_nr amount invoice
a1 -9.17 i1
a2 9.17 i1

Helper table:

toID invoice
toid1 i1

Provider table:

toID tID DATE tAmount
toid1 t1 2021-01-18 9.17
toid1 t2 2021-01-19 -9.17

After joining three tables using merge inner join I got a table:

doc_nr amount invoice toID tID tAmount DATE
a1 -9.17 i1 toid1 t1 9.17 2021-01-18
a1 -9.17 i1 toid1 t2 -9.17 2021-01-19
a2 9.17 i1 toid1 t1 9.17 2021-01-18
a2 9.17 i1 toid1 t2 -9.17 2021-01-19

Expected result should be only 2 records instead of 4:

doc_nr amount invoice toID tID tAmount DATE
a1 -9.17 i1 toid1 t2 -9.17 2021-01-19
a2 9.17 i1 toid1 t1 9.17 2021-01-18

Can't think of a way how to filter the table to get expected results.

Tried dropping duplicates based on doc_nr, but results gives:

doc_nr amount invoice toID tID tAmount DATE
a1 -9.17 i1 toid1 t2 -9.17 2021-01-19
a2 9.17 i1 toid1 t2 -9.17 2021-01-18

It should have different tID; doc_nr and amount match tAmount. After join it seems it duplicates doc_nr creating it with other`s doc_nr and tID.

Any ideas on how to clean these duplicating rows and reach the expected result?

CodePudding user response:

Try this:

>>> df.assign(x=df['amount'].eq(df['tAmount']).cumsum()).replace(0, np.nan).dropna().drop_duplicates(subset=['doc_nr', 'x']).drop('x', axis=1)
  doc_nr amount invoice   toID tID tAmount        DATE
1     a1  -9.17      i1  toid1  t2   -9.17  2021-01-19
2     a2   9.17      i1  toid1  t1    9.17  2021-01-18
>>> 

CodePudding user response:

As you mentioned, you want amount to match tAmount:

It should have different tID; doc_nr and amount match tAmount.

then, you should include also these 2 fields in the second merge statement, as follows:

(Accounting.merge(Helper)
           .merge(Provider, left_on=['toID', 'amount'], right_on=['toID', 'tAmount'])
)

use the left_on= and right_on= parameters to include the related 2 merge fields.

Result:

  doc_nr  amount invoice   toID tID        DATE  tAmount
0     a1   -9.17      i1  toid1  t2  2021-01-19    -9.17
1     a2    9.17      i1  toid1  t1  2021-01-18     9.17
  • Related