Home > OS >  Remove related row from pandas dataframe
Remove related row from pandas dataframe

Time:04-07

I have the following dataframe:

id relatedId coordinate
123 125 55
125 123 45
128 130 60
132 135 50
130 128 40
135 132 50

So I have 6 rows in this dataframe, but I would like to get rid of the related rows resulting in 3 rows. The coordinate column equals 100 between the two related rows, and I would like to keep the one with the lowest value (so the one less than 50. If both are 50, simply one of them). The resulting dataframe would thus be:

id relatedId coordinate
125 123 45
132 135 50
130 128 40

Hopefully someone has a good solution for this problem. Thanks

CodePudding user response:

You can sort the values and get the first value per group using a frozenset of the 2 ids as grouper:

(df
 .sort_values(by='coordinate')
 .groupby(df[['id', 'relatedId']].agg(frozenset, axis=1), as_index=False)
 .first()
)

output:

    id  relatedId  coordinate
0  130        128          40
1  125        123          45
2  132        135          50

Alternatively, to keep the original order, and original indices, use idxmin per group:

group = df[['id', 'relatedId']].agg(frozenset, axis=1)
idx = df['coordinate'].groupby(group).idxmin()
df.loc[sorted(idx)]

output:

    id  relatedId  coordinate
1  125        123          45
3  132        135          50
4  130        128          40
  • Related