Getting rid of combinations of IDs already present in the data

Time:10-09

I have the following dataset

idA idB value
1   5   0.11
2   6   0.25
3   7   0.3
4   8   0.4
.   .   .
.   .   .
.   .   .
.   .   .
.   .   .
5   1   0.11
6   2   0.25
7   3   0.3
8   4   0.4

idA and idB are IDs from the same set (both originally come from the same column), so the row (idA = 1, idB = 5) is the same pair as (idA = 5, idB = 1). I want to get rid of these mirrored duplicates (the rows at the bottom of my df) in order to obtain:
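To make the question reproducible, here is a minimal sketch that builds a DataFrame from only the eight rows shown above (the elided "." rows are left out, and the name `df` is an assumption). It checks the symmetry described: when each row is treated as an unordered `frozenset` of its two IDs, the eight rows collapse to four distinct pairs.

```python
import pandas as pd

# Only the eight rows shown in the question; the elided "." rows are omitted.
df = pd.DataFrame({
    "idA":   [1, 2, 3, 4, 5, 6, 7, 8],
    "idB":   [5, 6, 7, 8, 1, 2, 3, 4],
    "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4],
})

# Treat each (idA, idB) as an unordered pair: frozenset({1, 5}) == frozenset({5, 1})
pairs = [frozenset(p) for p in zip(df["idA"], df["idB"])]
print(len(pairs), len(set(pairs)))  # rows vs. distinct unordered pairs
```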

idA idB value
1   5   0.11
2   6   0.25
3   7   0.3
4   8   0.4
.   .   .
.   .   .
.   .   .
.   .   .
.   .   .

Any idea how to do that?

Thanks in advance.

Answer:

Sort idA and idB within each row (apply with axis=1) so that (1, 5) and (5, 1) produce the same key, then use that key to find the duplicate rows.

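# Sort each row's (idA, idB) so that (5, 1) and (1, 5) give the same key
df["id"] = df[["idA", "idB"]].apply(lambda x: tuple(sorted(x)), axis=1)
# Keep only the first row of each unordered pair
df[~df.duplicated("id")]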
#    idA  idB  value      id
# 0    1    5   0.11  (1, 5)
# 1    2    6   0.25  (2, 6)
# 2    3    7   0.30  (3, 7)
# 3    4    8   0.40  (4, 8)
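A vectorized variant of the same idea, sketched under the assumption that both ID columns are numeric: `np.sort(..., axis=1)` sorts the two IDs of every row in a single NumPy call instead of invoking a Python lambda per row, which usually matters on large frames. The sample frame reproduces only the eight rows shown in the question.

```python
import numpy as np
import pandas as pd

# Only the eight rows shown in the question; the elided rows are omitted.
df = pd.DataFrame({
    "idA":   [1, 2, 3, 4, 5, 6, 7, 8],
    "idB":   [5, 6, 7, 8, 1, 2, 3, 4],
    "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4],
})

# Sort the two IDs within each row: [5, 1] becomes [1, 5].
key = np.sort(df[["idA", "idB"]].to_numpy(), axis=1)

# duplicated() on the sorted pairs flags every repeat after the first
# occurrence of an unordered pair (keep="first" is the default).
dup = pd.DataFrame(key, index=df.index).duplicated()
result = df[~dup]
print(result)
```

Unlike the answer above, this does not add a helper `id` column to `df`, so the result keeps only the original columns.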

Answer:

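Group by value and, within each group, take min(idA) and max(idB) (or max(idA) and min(idB)). This only works if each value belongs to exactly one pair; two different pairs that happen to share a value would be merged into one row.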
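A minimal sketch of this group-by idea, assuming each `value` occurs for exactly one unordered pair (true for the eight sample rows shown in the question, which are all the frame below contains). It uses pandas named aggregation to keep `min(idA)` and `max(idB)` per value; `sort=False` preserves the order in which values first appear.

```python
import pandas as pd

# Only the eight rows shown in the question; the elided rows are omitted.
df = pd.DataFrame({
    "idA":   [1, 2, 3, 4, 5, 6, 7, 8],
    "idB":   [5, 6, 7, 8, 1, 2, 3, 4],
    "value": [0.11, 0.25, 0.3, 0.4, 0.11, 0.25, 0.3, 0.4],
})

# Assumption: each value identifies exactly one unordered pair of IDs.
# For value 0.11 the group is {(1, 5), (5, 1)}: min(idA) = 1, max(idB) = 5.
result = (
    df.groupby("value", as_index=False, sort=False)
      .agg(idA=("idA", "min"), idB=("idB", "max"))
      [["idA", "idB", "value"]]  # restore the original column order
)
print(result)
```

Swapping the aggregations to `max` for idA and `min` for idB would keep the mirrored orientation instead, e.g. (5, 1) rather than (1, 5).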
