Home > OS >  Pandas - drop row if another row has the same values but the columns are switched
Pandas - drop row if another row has the same values but the columns are switched

Time:03-21

as you can see below some rows are the same except the value in column NUM1 and value in column NUM2 are switched. I need only one of them.

Dropping rows based on last three columns is not the right answer, because the table can contain other records with same dates and used column value, but with different NUM1 and NUM2.

    NUM1    NUM2    DATE1         DATE2          USED
0   4       7       10. 9. 2002   16. 11. 2002   5,3
1   7       4       10. 9. 2002   16. 11. 2002   5,3
2   9       10      10. 9. 2002   16. 11. 2002   3,1
3   10      9       10. 9. 2002   16. 11. 2002   3,1

New table should look like this:

    NUM1    NUM2    DATE1         DATE2          USED
0   4       7       10. 9. 2002   16. 11. 2002   5,3
1   9       10      10. 9. 2002   16. 11. 2002   3,1

Thanks for help!

CodePudding user response:

Try with frozenset

out = df[~df.apply(frozenset,1).duplicated()]
Out[906]: 
   NUM1  NUM2      DATE1       DATE2 USED
0     4     7  10.9.2002  16.11.2002  5,3
2     9    10  10.9.2002  16.11.2002  3,1

CodePudding user response:

Maybe you can take a copy of the columns and sort them, and then check for duplicates:

x = df[['NUM1', 'NUM2']].copy()
x[:] = np.sort(x, axis=1)
filtered_df = df[~x.duplicated()]

Output:

>>> filtered_df
   NUM1  NUM2      DATE1       DATE2 USED
0     4     7  10.9.2002  16.11.2002  5,3
2     9    10  10.9.2002  16.11.2002  3,1
  • Related