I have the following dataset:
idA idB value
1 5 0.11
2 6 0.25
3 7 0.3
4 8 0.4
. . .
. . .
. . .
. . .
. . .
5 1 0.11
6 2 0.25
7 3 0.3
8 4 0.4
idA and idB are IDs for the same dataset (both idA and idB come from the same column), so the pair (idA = 1, idB = 5) is the same as (idA = 5, idB = 1). I want to get rid of these mirrored duplicates (the ones at the bottom of my df) in order to obtain:
idA idB value
1 5 0.11
2 6 0.25
3 7 0.3
4 8 0.4
. . .
. . .
. . .
. . .
. . .
Any idea how to do that?
Thanks in advance.
Answer:
Apply sorted along each row (axis=1) so that every (idA, idB) pair becomes an order-independent key, then use that key to find the duplicate rows.
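import pandas as pd

# Sample rows from the question; rows 5-8 mirror rows 1-4
df = pd.DataFrame({"idA": [1, 2, 3, 4, 5, 6, 7, 8],
                   "idB": [5, 6, 7, 8, 1, 2, 3, 4],
                   "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4]})

# Sorted tuple per row: (5, 1) and (1, 5) both become (1, 5)
df["id"] = df[["idA", "idB"]].apply(lambda x: tuple(sorted(x)), axis=1)
# Keep only the first row for each unordered pair
print(df[~df.duplicated("id")])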
# idA idB value id
# 0 1 5 0.11 (1, 5)
# 1 2 6 0.25 (2, 6)
# 2 3 7 0.30 (3, 7)
# 3 4 8 0.40 (4, 8)
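The same canonical-pair idea can also be written without a row-wise lambda. This is an alternative sketch, not the answer's own code: it assumes NumPy is available and uses np.sort along axis=1 to order each (idA, idB) pair in one vectorized call, then drops rows whose sorted pair has already appeared. The frame below contains only the rows actually shown in the question.

```python
import numpy as np
import pandas as pd

# Rows shown in the question; the last four mirror the first four.
df = pd.DataFrame({
    "idA": [1, 2, 3, 4, 5, 6, 7, 8],
    "idB": [5, 6, 7, 8, 1, 2, 3, 4],
    "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4],
})

# Sort each row's (idA, idB) so (5, 1) becomes (1, 5), matching its mirror.
pairs = np.sort(df[["idA", "idB"]].to_numpy(), axis=1)

# Flag a row as duplicate if its sorted pair already appeared earlier.
is_dup = pd.DataFrame(pairs, index=df.index).duplicated()

result = df[~is_dup]
print(result)
```

Because duplicated keeps the first occurrence by default, the rows at the top of the frame survive; passing keep="last" would keep the mirrored rows at the bottom instead.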
Answer:
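Group by value and keep one row per group, recovering the pair as min(idA) and max(idB) (or max(idA) and min(idB) for the mirrored orientation). This only works if each value belongs to a single pair; two different pairs sharing a value would be merged into one row.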
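A minimal sketch of the groupby idea, using the sample rows from the question and pandas named aggregation (agg with keyword=(column, function) tuples). Like the approach itself, it assumes each value identifies exactly one unordered pair.

```python
import pandas as pd

# Rows shown in the question; the last four mirror the first four.
df = pd.DataFrame({
    "idA": [1, 2, 3, 4, 5, 6, 7, 8],
    "idB": [5, 6, 7, 8, 1, 2, 3, 4],
    "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4],
})

# One row per value: smallest idA and largest idB within each group.
# Assumption: a given value never belongs to two different pairs.
result = (
    df.groupby("value", as_index=False)
      .agg(idA=("idA", "min"), idB=("idB", "max"))
      [["idA", "idB", "value"]]
)
print(result)
```

Swapping "min" and "max" in the agg call yields the mirrored orientation (idA = 5, idB = 1, and so on).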