Given the following pandas df
-
Holding Account | Account Type | Column A | Column B |
---|---|---|---|
Rupert 06 (23938996) | Holding Account | 1825973 | 1702598 |
Rupert 07 (23938996) | Holding Account | 1697870 | 1825973 |
- | - | - | - |
Caroline 06 (0131465) | Holding Account | 11112222 | 5435450 |
Caroline 07 (0131465) | Holding Account | 7896545 | 11112222 |
I had been trying to find a way to do the following -
- Step 1 - For the entire
df
, search for instances of Column B values appearing in Column A
(example: Column B ofCaroline 07
== Column A ofCaroline 06
) - Step 2 - Rows that meet the above criteria should have there Column B value changed to the Column B value of the row who they matches
(example:Caroline 7
Column B value will change from11112222
to5435450
This means the pandas df
will now look as follows -
Holding Account | Account Type | Column A | Column B |
---|---|---|---|
Rupert 06 (23938996) | Holding Account | 1825973 | 1702598 |
Rupert 07 (23938996) | Holding Account | 1697870 | 1702598 |
- | - | - | - |
Caroline 06 (0131465) | Holding Account | 11112222 | 5435450 |
Caroline 07 (0131465) | Holding Account | 7896545 | 5435450 |
Code to achieve this: the following code achieves Steps 1 & 2 -
import numpy as np
df['Column B'] = np.where(df['Column B'].isin(df['Column A'].values),df['Column B'].shift(),df['Column B'])
Where I need help: I would like to expand the code, to include the following:
- Step 3 - Delete the row who's Column A value matched the other row's Column B, once Step 2 is complete
(example:Rupert 06 (23938996)
andCaroline 06 (0131465)
would be deleted, as they received Column B values fromRupert 07 (23938996)
andCaroline 07 (0131465)
.
Holding Account | Account Type | Column A | Column B |
---|---|---|---|
Rupert 07 (23938996) | Holding Account | 1697870 | 1702598 |
Caroline 07 (0131465) | Holding Account | 7896545 | 5435450 |
Does anyone know how I can expand the code appropriately?
CodePudding user response:
Instead of using np.where
, just compute some masks:
rows_to_remove = df['Column A'].isin(df['Column B'])
df.loc[df['Column B'].isin(df['Column A'].values), 'Column B'] = df.loc[rows_to_remove, 'Column B'].to_numpy()
df = df[~rows_to_remove]
Output:
>>> df
Holding Account Account Type Column A Column B
1 Rupert 07 (23938996) Holding Account 1697870 1702598
3 Caroline 07 (0131465) Holding Account 7896545 5435450