pandas: remove duplicated rows based on the same column values

Time:09-14

I have a df like this:

    Date    Model   High     Low    Final
1   9132022 model6  4.36000  2.39   3.10
2   9132022 model4  10.92000 2.87   8.32
3   9132022 model6  4.36000  2.39   3.73
4   9132022 model6  4.36000  2.39   3.10
5   9132022 model6  4.36000  2.39   2.47

6   9142022 model6  41.3600 21.39   31.10
7   9142022 model4  110.920 21.87   81.32
8   9142022 model6  41.3600 21.39   31.73
9   9142022 model6  41.3600 21.39   31.10
10  9142022 model6  41.3600 21.39   21.47

If Date and Model are the same, keep only the first record. The output should be:

        Date    Model   High     Low    Final
    1   9132022 model6  4.36000  2.39   3.10
    2   9132022 model4  10.92000 2.87   8.32

    3   9142022 model6  41.3600 21.39   31.10
    4   9142022 model4  110.920 21.87   81.32

CodePudding user response:

If your DataFrame is named df, then:

df.groupby(['Date', 'Model']).head(1)
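
As a minimal sketch of what this does, the snippet below rebuilds the question's table (the construction and the 1..10 row labels are my transcription of the question, not something the asker posted as code) and applies head(1). It also shows drop_duplicates with a subset, which is my own addition and not part of the answer above; on this data it should select the same rows.

```python
import pandas as pd

# Sample data transcribed from the question; row labels 1..10 as in the table.
df = pd.DataFrame(
    {
        "Date": [9132022] * 5 + [9142022] * 5,
        "Model": ["model6", "model4", "model6", "model6", "model6"] * 2,
        "High": [4.36, 10.92, 4.36, 4.36, 4.36, 41.36, 110.92, 41.36, 41.36, 41.36],
        "Low": [2.39, 2.87, 2.39, 2.39, 2.39, 21.39, 21.87, 21.39, 21.39, 21.39],
        "Final": [3.10, 8.32, 3.73, 3.10, 2.47, 31.10, 81.32, 31.73, 31.10, 21.47],
    },
    index=range(1, 11),
)

# Keep the first row of every (Date, Model) group.
# head(1) returns original rows, so row order and row labels are preserved.
first_rows = df.groupby(["Date", "Model"]).head(1)

# Alternative: drop later rows that repeat an earlier (Date, Model) pair.
deduped = df.drop_duplicates(subset=["Date", "Model"], keep="first")

print(first_rows)
```

Both approaches leave the surviving rows with their original labels. If you want a fresh 0-based index afterwards, chaining .reset_index(drop=True) is one option.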

CodePudding user response:

Okay so first we need to recreate OP's dataframe:

import pandas as pd

df = pd.DataFrame({"Date": [9132022, 9132022, 9132022, 9132022, 9132022, 9142022, 9142022, 9142022, 9142022, 9142022],
                   "Model": ["model6", "model4", "model6", "model6", "model6", "model6", "model4", "model6", "model6", "model6"],
                   "High": [4.36000,10.92000,4.36000,4.36000,4.36000,41.3600,110.920,41.3600,41.3600,41.3600],
                   "Low": [2.39,2.87,2.39,2.39,2.39,21.39,21.87,21.39,21.39,21.39],
                   "Final":[3.10,8.32,3.73,3.10,2.47,31.10,81.32,31.73,31.10,21.47]
                   })

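Then group by the Date and Model columns and return the first occurrence in each group using the first aggregate function. Note that first() returns the first non-null value of each column within a group, which matches the first row only when that row has no missing values: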

df.groupby(["Date","Model"],as_index=False).first()

outputs:

    Date    Model   High    Low     Final
0   9132022 model4  10.92   2.87    8.32
1   9132022 model6  4.36    2.39    3.10
2   9142022 model4  110.92  21.87   81.32
3   9142022 model6  41.36   21.39   31.10

Note that the result gets a new 0-based index and its rows are sorted by the group keys. If you want to keep the original index, call df = df.reset_index() before grouping so that the old index is carried along as a column.
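
Here is a rough sketch of that index-preserving variant, under a few assumptions of mine: a trimmed-down sample frame with made-up row labels, sort=False so groups stay in order of first appearance, and a final set_index to turn the carried-along column back into the index. None of these details come from the answer itself.

```python
import pandas as pd

# Trimmed sample with explicit row labels (hypothetical, for illustration only).
df = pd.DataFrame(
    {
        "Date": [9132022, 9132022, 9132022, 9142022, 9142022],
        "Model": ["model6", "model4", "model6", "model6", "model4"],
        "Final": [3.10, 8.32, 3.73, 31.10, 81.32],
    },
    index=[1, 2, 3, 6, 7],
)

kept = (
    df.reset_index()  # original labels become a regular column named "index"
    .groupby(["Date", "Model"], as_index=False, sort=False)  # keep first-seen group order
    .first()  # first non-null value per column in each group
    .set_index("index")  # restore the original labels as the index
)

print(kept)
```

The restored index is named "index" after this round trip; kept.index.name = None removes the name if it gets in the way.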

For future reference, please consider providing the original DataFrame as code so that people who want to look into it can recreate it easily, without having to copy and paste values manually.

If this solved your problem please mark the answer as solution. :)
