Home > database >  How to drop duplicates for the following data-frame based on multiple conditions?
How to drop duplicates for the following data-frame based on multiple conditions?

Time:06-09

i am having a dataframe as follows:

df

cylce csec dist vel
1 -40.1 9.87 2.7
1 -40.1 9.89 2.2
2 -39.1 14.07 2.0
2 -39.1 14.09 2.8
3 -38.7 18.09 3.2
4 -36.6 15.37 0.5
4 -38.01 16.23 1.8
4 -38.4 16.66 3.1
i have to drop the duplicate cycle based on some conditions:  
    -if the csec is same then  
            -look for the  dist and keep the row with highest dist  
                   -if dist are same check the vel ,keep the row with highest vel  
    -if csec is different
            -keep the row with highest csec

output

cylce csec dist vel
1 -40.1 9.89 2.2
2 -39.1 14.09 2.8
3 -38.7 18.09 3.2
4 -36.6 15.37 0.5

I was able to get the dupicate rows with following code

    duplicate_cyle = df[df.duplicated('cycle',keep = False)]

I would like to know how to drop the rows based on the conditions.

CodePudding user response:

Sort your dataframe in descending order of csec, dist and vel, then drop duplictes, eg:

out = (
    df.sort_values(['cycle', 'csec', 'dist', 'vel'], ascending=[True, False, False, False])
    .drop_duplicates(subset=['cycle'])
)
  • Related