My data frame has repeated values across specific columns, and I want to remove all of the repeats.
For example:
If a value in col3 appears anywhere in col2 or col1, it should be removed from col3. Likewise, if a value in col2 appears anywhere in col1, it should be removed from col2.
col1  col2  col3  col4  col5
1     12          1     6
2     1     9     1     0
3           1     2
4     9     11    3     10
5     6           4
6     1     3     5
Expected output:
col1  col2  col3  col4  col5
1     12          1     6
2                 1     0
3                 2
4     9     11    3     10
5                 4
6                 5
col4 and col5 remain the same.
CodePudding user response:
import pandas as pd

col1 = [1, 2, 3, 4, 5, 6]
col2 = [12, 1, None, 9, 6, 1]
col3 = [None, 9, 1, 11, None, 3]
col4 = [1, 1, 2, 3, 4, 5]
col5 = [6, 0, None, 10, None, None]

# make df from col1 to col5
df = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4, 'col5': col5})

# only col1..col3 are deduplicated; col4 and col5 are left untouched
cols = ['col1', 'col2', 'col3']

# iterate over the columns to deduplicate
for i, col in enumerate(cols):
    # replace values that also appear in any previous column with NaN
    for prev_col in cols[:i]:
        df[col] = df[col].where(~df[col].isin(df[prev_col]))

# OUTPUT
#    col1  col2  col3  col4  col5
# 0     1  12.0   NaN     1   6.0
# 1     2   NaN   NaN     1   0.0
# 2     3   NaN   NaN     2   NaN
# 3     4   9.0  11.0     3  10.0
# 4     5   NaN   NaN     4   NaN
# 5     6   NaN   NaN     5   NaN
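The same idea can be wrapped in a small helper so that the list of columns to deduplicate is explicit. This is only a sketch: the function name `drop_values_seen_earlier` and its `cols` parameter are made up for illustration, and it assumes values are compared column-wise (anywhere in an earlier column), as in the question's example. Instead of pairwise `isin` calls, it keeps a running set of the values seen so far.

```python
import pandas as pd


def drop_values_seen_earlier(df, cols):
    """Hypothetical helper (not part of pandas): for each column in cols,
    replace values that occur anywhere in an earlier column of cols with NaN."""
    out = df.copy()
    seen = set()
    for col in cols:
        # compare against the original values of all earlier columns
        out[col] = df[col].where(~df[col].isin(seen))
        # remember this column's original non-missing values
        seen.update(df[col].dropna())
    return out


df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [12, 1, None, 9, 6, 1],
    'col3': [None, 9, 1, 11, None, 3],
    'col4': [1, 1, 2, 3, 4, 5],
    'col5': [6, 0, None, 10, None, None],
})

# col4 and col5 are not listed, so they are returned unchanged
result = drop_values_seen_earlier(df, ['col1', 'col2', 'col3'])
print(result)
```

Because the helper works on a copy and reads from the original columns, the order in which columns are processed does not affect the result.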
CodePudding user response:
Using zip and the reversed column list in a loop. Each column is compared only with the column immediately to its left, and all five columns are processed:
cols = list(df)[::-1]
for c, prev_c in zip(cols, cols[1:]):
    df.loc[df[c].isin(df[prev_c]), c] = float('nan')
Output:
   col1  col2  col3  col4  col5
0     1  12.0   NaN   NaN   6.0
1     2   NaN   NaN   NaN   0.0
2     3   NaN   NaN   2.0   NaN
3     4   9.0  11.0   NaN  10.0
4     5   NaN   NaN   4.0   NaN
5     6   NaN   3.0   5.0   NaN
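To reproduce the expected output from the question, each of col1 to col3 would have to be checked against every column to its left, while col4 and col5 are skipped. One possible sketch keeps the reversed-list idea but swaps zip for `itertools.combinations`; that swap is a substitution, not part of the snippet above, and the column names are taken from the question's example.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': [12, 1, None, 9, 6, 1],
    'col3': [None, 9, 1, 11, None, 3],
    'col4': [1, 1, 2, 3, 4, 5],
    'col5': [6, 0, None, 10, None, None],
})

# reversed list of the columns to deduplicate; col4 and col5 are left out
cols = ['col1', 'col2', 'col3'][::-1]

# combinations() pairs each column with every column to its left:
# (col3, col2), (col3, col1), (col2, col1)
for c, prev_c in combinations(cols, 2):
    df.loc[df[c].isin(df[prev_c]), c] = float('nan')

print(df)
```

Working from right to left means col3 is compared with col2 before col2 itself is modified. With the example data this leaves 12 and 9 in col2 and only 11 in col3.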