as you can see below some rows are the same except the value in column NUM1 and value in column NUM2 are switched. I need only one of them.
Dropping rows based on last three columns is not the right answer, because the table can contain other records with same dates and used column value, but with different NUM1 and NUM2.
NUM1 NUM2 DATE1 DATE2 USED
0 4 7 10. 9. 2002 16. 11. 2002 5,3
1 7 4 10. 9. 2002 16. 11. 2002 5,3
2 9 10 10. 9. 2002 16. 11. 2002 3,1
3 10 9 10. 9. 2002 16. 11. 2002 3,1
New table should look like this:
NUM1 NUM2 DATE1 DATE2 USED
0 4 7 10. 9. 2002 16. 11. 2002 5,3
1 9 10 10. 9. 2002 16. 11. 2002 3,1
Thanks for help!
CodePudding user response:
Try with frozenset
out = df[~df.apply(frozenset,1).duplicated()]
Out[906]:
NUM1 NUM2 DATE1 DATE2 USED
0 4 7 10.9.2002 16.11.2002 5,3
2 9 10 10.9.2002 16.11.2002 3,1
CodePudding user response:
Maybe you can take a copy of the columns and sort them, and then check for duplicates:
x = df[['NUM1', 'NUM2']].copy()
x[:] = np.sort(x, axis=1)
filtered_df = df[~x.duplicated()]
Output:
>>> filtered_df
NUM1 NUM2 DATE1 DATE2 USED
0 4 7 10.9.2002 16.11.2002 5,3
2 9 10 10.9.2002 16.11.2002 3,1