Drop DataFrame Row Based on Existing Condition-CodePudding

Given the following pandas df -

Holding Account	Account Type	Column A	Column B
Rupert 06 (23938996)	Holding Account	1825973	1702598
Rupert 07 (23938996)	Holding Account	1697870	1825973
-	-	-	-
Caroline 06 (0131465)	Holding Account	11112222	5435450
Caroline 07 (0131465)	Holding Account	7896545	11112222

I had been trying to find a way to do the following -

Step 1 - For the entire df, search for instances of Column B values appearing in Column A
(example: Column B of Caroline 07 == Column A of Caroline 06)
Step 2 - Rows that meet the above criteria should have there Column B value changed to the Column B value of the row who they matches
(example: Caroline 7 Column B value will change from 11112222 to 5435450

This means the pandas df will now look as follows -

Holding Account	Account Type	Column A	Column B
Rupert 06 (23938996)	Holding Account	1825973	1702598
Rupert 07 (23938996)	Holding Account	1697870	1702598
-	-	-	-
Caroline 06 (0131465)	Holding Account	11112222	5435450
Caroline 07 (0131465)	Holding Account	7896545	5435450

Code to achieve this: the following code achieves Steps 1 & 2 -

import numpy as np
df['Column B'] = np.where(df['Column B'].isin(df['Column A'].values),df['Column B'].shift(),df['Column B'])

Where I need help: I would like to expand the code, to include the following:

Step 3 - Delete the row who's Column A value matched the other row's Column B, once Step 2 is complete
(example: Rupert 06 (23938996) and Caroline 06 (0131465) would be deleted, as they received Column B values from Rupert 07 (23938996) and Caroline 07 (0131465).

Holding Account	Account Type	Column A	Column B
Rupert 07 (23938996)	Holding Account	1697870	1702598
Caroline 07 (0131465)	Holding Account	7896545	5435450

Does anyone know how I can expand the code appropriately?

CodePudding user response：

Instead of using np.where, just compute some masks:

rows_to_remove = df['Column A'].isin(df['Column B'])
df.loc[df['Column B'].isin(df['Column A'].values), 'Column B'] = df.loc[rows_to_remove, 'Column B'].to_numpy()
df = df[~rows_to_remove]

Output:

>>> df
         Holding Account     Account Type  Column A  Column B
1   Rupert 07 (23938996)  Holding Account   1697870   1702598
3  Caroline 07 (0131465)  Holding Account   7896545   5435450