In a data frame, remove all repeated values from col2 and col3 by matching against col1 (row position does not matter)

The data frame has some repeated values in specific columns, and I want to remove all of them.

For example,

If a value in col3 also appears anywhere in col2 or col1, remove it from col3. Likewise, if a value in col2 also appears anywhere in col1, remove it from col2.

col1 col2 col3 col4 col5

1    12        1    6    
2    1    9    1    0
3         1    2
4    9    11   3    10 
5    6         4
6    1    3    5

Expected output

col1 col2 col3 col4 col5

1    12        1    6    
2              1    0
3              2
4    9    11   3    10 
5              4
6              5

col4 and col5 should remain unchanged.

CodePudding user response：

col1= [1,2,3,4,5,6]
col2= [12,1,None,9,6,1]
col3= [None,9,1,11,None,3]
col4= [1,1,2,3,4,5]
col5= [6,0,None,10,None,None]

import pandas as pd
# make df from col1 to col5
df = pd.DataFrame({'col1':col1, 'col2':col2, 'col3':col3, 'col4':col4, 'col5':col5})

# only col1..col3 take part; col4 and col5 must stay unchanged
cols = ['col1', 'col2', 'col3']
for i, col in enumerate(cols):
    # blank out values that also appear in any earlier column
    for prev_col in cols[:i]:
        df[col] = df[col].where(~df[col].isin(df[prev_col]))
# OUTPUT
#    col1  col2  col3  col4  col5
# 0     1  12.0   NaN     1   6.0
# 1     2   NaN   NaN     1   0.0
# 2     3   NaN   NaN     2   NaN
# 3     4   9.0  11.0     3  10.0
# 4     5   NaN   NaN     4   NaN
# 5     6   NaN   NaN     5   NaN
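
To see what the inner line of the loop does in isolation, here is a small sketch on two standalone Series. The names `left`, `right`, `seen` and `cleaned` are made up for illustration; the values are col1 and col2 from the question.

```python
import pandas as pd

left = pd.Series([1, 2, 3, 4, 5, 6])        # plays the role of col1
right = pd.Series([12, 1, None, 9, 6, 1])   # plays the role of col2

# True wherever a value of `right` occurs anywhere in `left`,
# regardless of which row it sits in
seen = right.isin(left)

# where() keeps the entries whose condition is True and
# replaces the others with NaN
cleaned = right.where(~seen)

print(seen.tolist())
print(cleaned.tolist())
```

Because `isin` ignores the index, the match is made on values only, which is what "row position does not matter" asks for.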

CodePudding user response：

Using zip and the reversed column list in a loop, so that each column is compared with the one immediately to its left:

cols = list(df)[::-1]

for c, prev_c in zip(cols, cols[1:]):
    df.loc[df[c].isin(df[prev_c]), c] = float('nan')

output:

   col1  col2  col3  col4  col5
0     1  12.0   NaN   NaN   6.0
1     2   NaN   NaN   NaN   0.0
2     3   NaN   NaN   2.0   NaN
3     4   9.0  11.0   NaN  10.0
4     5   NaN   NaN   4.0   NaN
5     6   NaN   3.0   5.0   NaN
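
The loop above checks each column only against its direct left neighbour and also runs over col4 and col5, so 3 survives in col3 and col4 loses values. Below is a minimal sketch of a variant that should give the table requested in the question, assuming only col1 to col3 are to be cleaned. The `cols` list and the use of `to_numpy().ravel()` are my own choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [12, 1, None, 9, 6, 1],
    'col3': [None, 9, 1, 11, None, 3],
    'col4': [1, 1, 2, 3, 4, 5],
    'col5': [6, 0, None, 10, None, None],
})

# columns to clean, in left-to-right order; col4 and col5 are skipped
cols = ['col1', 'col2', 'col3']

# walk right to left so each column is compared with the
# still-unmodified columns on its left
for i in range(len(cols) - 1, 0, -1):
    # all values found in any column to the left, flattened to 1-D
    left_values = df[cols[:i]].to_numpy().ravel()
    df.loc[df[cols[i]].isin(left_values), cols[i]] = float('nan')

print(df)
```

With the sample data, col2 keeps only 12 and 9, col3 keeps only 11, and col4 and col5 are not modified.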