I have the following dataframe:
id | relatedId | coordinate |
---|---|---|
123 | 125 | 55 |
125 | 123 | 45 |
128 | 130 | 60 |
132 | 135 | 50 |
130 | 128 | 40 |
135 | 132 | 50 |
So I have 6 rows in this dataframe, but I would like to get rid of the related rows resulting in 3 rows. The coordinate column equals 100 between the two related rows, and I would like to keep the one with the lowest value (so the one less than 50. If both are 50, simply one of them). The resulting dataframe would thus be:
id | relatedId | coordinate |
---|---|---|
125 | 123 | 45 |
132 | 135 | 50 |
130 | 128 | 40 |
Hopefully someone has a good solution for this problem. Thanks
CodePudding user response:
You can sort the values and get the first value per group using a frozenset
of the 2 ids as grouper:
(df
.sort_values(by='coordinate')
.groupby(df[['id', 'relatedId']].agg(frozenset, axis=1), as_index=False)
.first()
)
output:
id relatedId coordinate
0 130 128 40
1 125 123 45
2 132 135 50
Alternatively, to keep the original order, and original indices, use idxmin
per group:
group = df[['id', 'relatedId']].agg(frozenset, axis=1)
idx = df['coordinate'].groupby(group).idxmin()
df.loc[sorted(idx)]
output:
id relatedId coordinate
1 125 123 45
3 132 135 50
4 130 128 40