How do I drop values of a specific condition and replace those?-CodePudding

My dataframe looks like this:

ID	first	second	fourth	fifth
1	one	one	Two	Three
2	one	Two	Two	Three
3	one	Three	Three	Three
4	one	one	one	one
5	one	one	two	one

Code:

df = {'ID': [1, 2, 3, 4, 5],
        'first': ['one', 'one', 'one', 'one', 'one']
'second': ['one', 'two', 'three','one','one']
'fourth': ['two', 'two', 'three','one','two']
'fifth': ['three','three','three','one', 'one']
        }

I want to drop/delete those values in one row that appear in the next column (right) as well. So there are a lot of duplicates but if there is another value between one same value like in "ID" 5, then just the value of the second column should be deleted, so that the df looks like this in the end:

ID	first	second	fourth	fifth
1	one	Two	Three	NaN
2	one	Two	Three	NaN
3	one	Three	NaN	NaN
4	one	NaN	NaN	NaN
5	one	two	one	NaN

CodePudding user response：

You can use drop_duplicates per row and reindex:

out = (df
 .set_index('ID')
 .apply(lambda s: (s2:=s.drop_duplicates())
                  .set_axis(s.index[:len(s2)]),
        axis=1)
 .reset_index().reindex(df.columns, axis=1)
)

output:

   ID first second fourth  fifth
0   1   one    Two  Three    NaN
1   2   one    Two  Three    NaN
2   3   one  Three    NaN    NaN
3   4   one    NaN    NaN    NaN
4   5   one    two    NaN    NaN

CodePudding user response：

You can just do shift then use the NaN replace the same

out = df.where(lambda x : df.ne(df.shift(1,axis=1))).transform(lambda x: sorted(x, key=pd.isnull),1)
Out[73]: 
  ID first second fourth fifth
0  1   one    Two  Three   NaN
1  2   one    Two  Three   NaN
2  3   one  Three    NaN   NaN
3  4   one    NaN    NaN   NaN
4  5   one    two    one   NaN