I have a DataFrame like this:
Col1 | Col2 | Score |
---|---|---|
A | B | 0.6 |
A | B | 0.6 |
B | A | 0.6 |
A | C | 0.8 |
C | A | 0.8 |
D | E | 0.9 |
I want to remove all duplicates, even when the values are swapped between Col1 and Col2, and count how many times each pair occurs.
I know exact duplicates can be removed with
df.drop_duplicates()
For the cases where the values are swapped but the pair is effectively the same, I currently create a temporary column holding [Col1, Col2], sort that list so the order no longer matters, and then drop duplicates on it with the method above.
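In case it helps, a minimal sketch of that workaround might look like this (the helper column name temp is my own placeholder; this removes the swapped duplicates but does not yet count them):

import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'A', 'C', 'D'],
                   'Col2': ['B', 'B', 'A', 'C', 'A', 'E'],
                   'Score': [0.6, 0.6, 0.6, 0.8, 0.8, 0.9]})

# temporary order-independent key: sort each (Col1, Col2) pair
df['temp'] = df[['Col1', 'Col2']].apply(lambda r: tuple(sorted(r)), axis=1)

# drop the swapped duplicates, then discard the helper column
deduped = df.drop_duplicates(subset='temp').drop(columns='temp')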
I also need to count these duplicates. Is there a better way to do this? The resulting DataFrame should look like this:
Col1 | Col2 | Duplicates | Score |
---|---|---|---|
A | B | 3 | 0.6 |
A | C | 2 | 0.8 |
D | E | 1 | 0.9 |
CodePudding user response:
IIUC, you could use a frozenset as the grouper:
group = df[['Col1', 'Col2']].agg(frozenset, axis=1)

(df
 .groupby(group, as_index=False)  # you can also group by [group, 'Score']
 .agg(**{c: (c, 'first') for c in df},
      Duplicates=('Score', 'count'),
      )
)
Output:
  Col1 Col2  Score  Duplicates
0    A    B    0.6           3
1    A    C    0.8           2
2    D    E    0.9           1
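The frozenset makes the key order-insensitive, since frozenset(['A', 'B']) == frozenset(['B', 'A']), and unlike a plain set it is hashable, so the resulting Series can be used directly as a groupby key.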
CodePudding user response:
Here is an alternative way using np.sort:
import numpy as np

# sort each (Col1, Col2) pair so swapped values become identical rows
df[['Col1', 'Col2']] = np.sort(df[['Col1', 'Col2']].to_numpy(), axis=1)

(df.groupby(['Col1', 'Col2'])
   .agg(Count=('Score', 'count'),
        Score=('Score', 'first'))
   .reset_index())
Output:
  Col1 Col2  Count  Score
0    A    B      3    0.6
1    A    C      2    0.8
2    D    E      1    0.9
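Note that np.sort(..., axis=1) sorts each row, so after the assignment the smaller value always sits in Col1 and swapped pairs collapse into the same group; it also overwrites Col1/Col2, so keep a copy if you still need the original order.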