Home > Software engineering >  Drop DataFrame Row Based on Existing Condition
Drop DataFrame Row Based on Existing Condition

Time:03-23

Given the following pandas df -

Holding Account Account Type Column A Column B
Rupert 06 (23938996) Holding Account 1825973 1702598
Rupert 07 (23938996) Holding Account 1697870 1825973
- - - -
Caroline 06 (0131465) Holding Account 11112222 5435450
Caroline 07 (0131465) Holding Account 7896545 11112222

I had been trying to find a way to do the following -

  • Step 1 - For the entire df, search for instances of Column B values appearing in Column A
    (example: Column B of Caroline 07 == Column A of Caroline 06)
  • Step 2 - Rows that meet the above criteria should have there Column B value changed to the Column B value of the row who they matches
    (example: Caroline 7 Column B value will change from 11112222 to 5435450

This means the pandas df will now look as follows -

Holding Account Account Type Column A Column B
Rupert 06 (23938996) Holding Account 1825973 1702598
Rupert 07 (23938996) Holding Account 1697870 1702598
- - - -
Caroline 06 (0131465) Holding Account 11112222 5435450
Caroline 07 (0131465) Holding Account 7896545 5435450

Code to achieve this: the following code achieves Steps 1 & 2 -

import numpy as np
df['Column B'] = np.where(df['Column B'].isin(df['Column A'].values),df['Column B'].shift(),df['Column B'])

Where I need help: I would like to expand the code, to include the following:

  • Step 3 - Delete the row who's Column A value matched the other row's Column B, once Step 2 is complete
    (example: Rupert 06 (23938996) and Caroline 06 (0131465) would be deleted, as they received Column B values from Rupert 07 (23938996) and Caroline 07 (0131465).
Holding Account Account Type Column A Column B
Rupert 07 (23938996) Holding Account 1697870 1702598
Caroline 07 (0131465) Holding Account 7896545 5435450

Does anyone know how I can expand the code appropriately?

CodePudding user response:

Instead of using np.where, just compute some masks:

rows_to_remove = df['Column A'].isin(df['Column B'])
df.loc[df['Column B'].isin(df['Column A'].values), 'Column B'] = df.loc[rows_to_remove, 'Column B'].to_numpy()
df = df[~rows_to_remove]

Output:

>>> df
         Holding Account     Account Type  Column A  Column B
1   Rupert 07 (23938996)  Holding Account   1697870   1702598
3  Caroline 07 (0131465)  Holding Account   7896545   5435450
  • Related