My dataframe looks like this:
ID | first | second | fourth | fifth |
---|---|---|---|---|
1 | one | one | Two | Three |
2 | one | Two | Two | Three |
3 | one | Three | Three | Three |
4 | one | one | one | one |
5 | one | one | two | one |
Code:
df = {'ID': [1, 2, 3, 4, 5],
'first': ['one', 'one', 'one', 'one', 'one']
'second': ['one', 'two', 'three','one','one']
'fourth': ['two', 'two', 'three','one','two']
'fifth': ['three','three','three','one', 'one']
}
I want to drop/delete those values in one row that appear in the next column (right) as well. So there are a lot of duplicates but if there is another value between one same value like in "ID" 5, then just the value of the second column should be deleted, so that the df looks like this in the end:
ID | first | second | fourth | fifth |
---|---|---|---|---|
1 | one | Two | Three | NaN |
2 | one | Two | Three | NaN |
3 | one | Three | NaN | NaN |
4 | one | NaN | NaN | NaN |
5 | one | two | one | NaN |
CodePudding user response:
You can use drop_duplicates
per row and reindex:
out = (df
.set_index('ID')
.apply(lambda s: (s2:=s.drop_duplicates())
.set_axis(s.index[:len(s2)]),
axis=1)
.reset_index().reindex(df.columns, axis=1)
)
output:
ID first second fourth fifth
0 1 one Two Three NaN
1 2 one Two Three NaN
2 3 one Three NaN NaN
3 4 one NaN NaN NaN
4 5 one two NaN NaN
CodePudding user response:
You can just do shift
then use the NaN replace the same
out = df.where(lambda x : df.ne(df.shift(1,axis=1))).transform(lambda x: sorted(x, key=pd.isnull),1)
Out[73]:
ID first second fourth fifth
0 1 one Two Three NaN
1 2 one Two Three NaN
2 3 one Three NaN NaN
3 4 one NaN NaN NaN
4 5 one two one NaN