I have a dataframe as follows:
df
cycle | csec | dist | vel |
---|---|---|---|
1 | -40.1 | 9.87 | 2.7 |
1 | -40.1 | 9.89 | 2.2 |
2 | -39.1 | 14.07 | 2.0 |
2 | -39.1 | 14.09 | 2.8 |
3 | -38.7 | 18.09 | 3.2 |
4 | -36.6 | 15.37 | 0.5 |
4 | -38.01 | 16.23 | 1.8 |
4 | -38.4 | 16.66 | 3.1 |
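For reference, a minimal reproduction of the sample frame (values and column names taken from the table above):

import pandas as pd

df = pd.DataFrame({
    'cycle': [1, 1, 2, 2, 3, 4, 4, 4],
    'csec':  [-40.1, -40.1, -39.1, -39.1, -38.7, -36.6, -38.01, -38.4],
    'dist':  [9.87, 9.89, 14.07, 14.09, 18.09, 15.37, 16.23, 16.66],
    'vel':   [2.7, 2.2, 2.0, 2.8, 3.2, 0.5, 1.8, 3.1],
})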
I have to drop the duplicate cycles based on the following conditions:

- if the csec values are the same:
  - look at dist and keep the row with the highest dist
  - if the dist values are also the same, check vel and keep the row with the highest vel
- if the csec values are different:
  - keep the row with the highest csec
Expected output:
cycle | csec | dist | vel |
---|---|---|---|
1 | -40.1 | 9.89 | 2.2 |
2 | -39.1 | 14.09 | 2.8 |
3 | -38.7 | 18.09 | 3.2 |
4 | -36.6 | 15.37 | 0.5 |
I was able to get the duplicate rows with the following code:

duplicate_cycle = df[df.duplicated('cycle', keep=False)]
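(With keep=False, duplicated marks every occurrence of a repeated cycle, so on the sample data this selects the rows for cycles 1, 2 and 4; cycle 3 appears only once and is left out.)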
I would like to know how to drop the rows based on the conditions.
CodePudding user response:
Sort your dataframe by cycle, and within each cycle by csec, dist and vel in descending order, then drop duplicates on cycle so that only the first (highest-ranked) row of each cycle is kept, e.g.:
out = (
    df.sort_values(['cycle', 'csec', 'dist', 'vel'],
                   ascending=[True, False, False, False])  # preferred row first within each cycle
      .drop_duplicates(subset=['cycle'])                    # keep the first row per cycle
)
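Applied to the sample frame this gives the expected output: within each cycle the row with the highest csec sorts first, and ties on csec are broken by dist and then vel, which matches the stated conditions. If you prefer to express the same logic with groupby, a rough equivalent sketch (not part of the original answer) would be:

out = (
    df.sort_values(['csec', 'dist', 'vel'], ascending=False)  # preferred row first
      .groupby('cycle')
      .head(1)                   # first (preferred) row of each cycle group
      .sort_values('cycle')      # restore cycle order
      .reset_index(drop=True)
)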