In a data frame, match two columns: if any value from the second column is present in the first column, remove that value from the second column.
col1 col2
1
2 1
3 9
4
5 1
6 2
Output
col1 col2
1
2
3 9
4
5
6
Here, 1 and 2 from col2 are also present in col1, so these repeated values should be removed from col2.
CodePudding user response:
Using Series.mask to match values and replace them, we can do something along the lines of:
df['col2'] = df['col2'].mask(pd.to_numeric(df['col2']).isin(df['col1']), "")
col1 col2
0 1
1 2
2 3 9.0
3 4
4 5
5 6
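For reference, here is a minimal self-contained sketch of the same idea. It assumes the blank cells in col2 are stored as NaN (the question does not say how blanks are represented), and it leaves the replaced values as NaN rather than empty strings so the column stays numeric.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells in col2 are assumed to be NaN.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6],
    "col2": [np.nan, 1, 9, np.nan, 1, 2],
})

# True where a col2 value also occurs anywhere in col1 (NaN never matches).
duplicated = df["col2"].isin(df["col1"])

# mask() replaces the flagged entries; with no replacement given they become NaN.
df["col2"] = df["col2"].mask(duplicated)

print(df)
```

Passing "" as the second argument to mask, as in the answer above, should give blank-looking cells instead, at the cost of turning col2 into an object-dtype column.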
CodePudding user response:
import pandas as pd
col1 = [1, 2, 3, 4, 5, 6]
col2 = [0, 0, 9.0, 0, 0, 0]
df = pd.DataFrame({'col1': col1, 'col2': col2})
# iterate over columns from left to right
for col in df.columns:
    # remove values that are already present in any previous column
    for prev_col in df.columns[:df.columns.get_loc(col)]:
        df[col] = df[col].where(~df[col].isin(df[prev_col]), None)
# OUTPUT
# col1 col2
# 0 1 0.0
# 1 2 0.0
# 2 3 9.0
# 3 4 0.0
# 4 5 0.0
# 5 6 0.0
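As a hedged illustration, the same loop can be wrapped in a helper and run on the question's data. The function name blank_values_seen_earlier is hypothetical, and the blank cells in col2 are assumed to be NaN. Unlike the loop above, this sketch works on a copy and relies on where() filling with NaN by default.

```python
import numpy as np
import pandas as pd


def blank_values_seen_earlier(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where each column loses values found in any column to its left."""
    result = frame.copy()
    for position, col in enumerate(result.columns):
        for prev_col in result.columns[:position]:
            # Keep entries NOT present in the earlier column; the rest become NaN.
            result[col] = result[col].where(~result[col].isin(result[prev_col]))
    return result


# Sample data from the question; blank cells in col2 are assumed to be NaN.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6],
    "col2": [np.nan, 1, 9, np.nan, 1, 2],
})

cleaned = blank_values_seen_earlier(df)
print(cleaned)
```

With this input only the 9 in col2 survives, since 1 and 2 both occur in col1. The original df is left untouched.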