How to remove duplicate rows by keeping one mean column in pandas dataframe?

Time:09-02

I have a dataset with 3 columns.

   A   B    C  
0  1   11   2.1
1  1   11   1.4
2  2   7    2.4
3  2   12   1.8
4  3   10   2.6
5  3   10   2.2

Rows 0 and 1 have the same values in columns A and B but different values in column C, and rows 4 and 5 are the same case. I want to remove the duplicates, keeping a single row for each (A, B) pair whose column C holds the mean of the duplicated rows' C values.

The drop_duplicates function alone cannot do this:

# keeps the first row of each (A, B) pair; C is not averged
df2 = df.drop_duplicates(subset=["A", "B"])
df2
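A minimal sketch of what drop_duplicates actually does with this data, reusing the question's values. It shows two things: with inplace=True the call modifies the frame and returns None (so assigning its result to df2 gives None), and without inplace each (A, B) pair simply keeps its first row, so C is never averaged. The names result and deduped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [11, 11, 7, 12, 10, 10],
                   'C': [2.1, 1.4, 2.4, 1.8, 2.6, 2.2]})

# inplace=True modifies the (copied) frame and returns None
result = df.copy().drop_duplicates(subset=['A', 'B'], inplace=True)
print(result)

# Without inplace a new frame is returned, but each (A, B) pair keeps
# only its first row: C stays 2.1 and 2.6 instead of the means 1.75 and 2.4
deduped = df.drop_duplicates(subset=['A', 'B'])
print(deduped)
```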

The resulting DataFrame should look like this:

   A   B    C  
0  1   11   1.75
2  2   7    2.4
3  2   12   1.8
4  3   10   2.4

How can I do this in pandas?

Answer:

This works. Just adding to Quang Hoang's comment.

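import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [11, 11, 7, 12, 10, 10],
                   'C': [2.1, 1.4, 2.4, 1.8, 2.6, 2.2]})

# Group rows by the (A, B) pair and average the remaining column C;
# reset_index() turns A and B back into columns with a fresh 0..n-1 index
df2 = df.groupby(['A', 'B']).mean().reset_index()
print(df2)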
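The groupby/mean result is renumbered 0..3, whereas the expected output in the question keeps the original labels 0, 2, 3, 4. A minimal sketch that preserves those labels, assuming it is acceptable to overwrite column C in place: groupby(...).transform('mean') broadcasts each group's mean back onto every row, after which drop_duplicates keeps the first (now identical) row of each pair.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [11, 11, 7, 12, 10, 10],
                   'C': [2.1, 1.4, 2.4, 1.8, 2.6, 2.2]})

# Replace each C value with the mean of C within its (A, B) group;
# transform returns a Series aligned to df's original index
df['C'] = df.groupby(['A', 'B'])['C'].transform('mean')

# All rows of a group are now identical, so keeping the first one
# retains the original index labels of the first occurrences
df2 = df.drop_duplicates(subset=['A', 'B'])
print(df2)
```

This costs one extra pass over the data compared with groupby().mean(), which is usually negligible; use it only when the original row labels matter.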