I have a dataset with 3 columns.
   A   B    C
0  1  11  2.1
1  1  11  1.4
2  2   7  2.4
3  2  12  1.8
4  3  10  2.6
5  3  10  2.2
Rows 0 and 1 have the same values in columns A and B but different values in column C. The same is true of rows 4 and 5. I want to collapse each group of duplicates into a single row whose C value is the mean of that group's C values.
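drop_duplicates on its own cannot do this, because it keeps one row per group and discards the others without averaging:
# Keeps only the first row of each (A, B) pair; C is not averaged.
# Note: with inplace=True the call returns None, so assign the result instead.
df2 = df.drop_duplicates(subset=["A", "B"])
df2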
The resulting data frame should look like this:
   A   B     C
0  1  11  1.75
2  2   7  2.4
3  2  12  1.8
4  3  10  2.4
How can I do this in pandas?
CodePudding user response:
Building on Quang Hoang's comment, grouping by A and B and taking the mean works:
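import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [11, 11, 7, 12, 10, 10],
                   'C': [2.1, 1.4, 2.4, 1.8, 2.6, 2.2]})

# Group rows that share the same (A, B) pair and average C within each group;
# reset_index() turns A and B back into ordinary columns.
df2 = df.groupby(['A', 'B']).mean().reset_index()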
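
A possible variant, as a sketch rather than the only way to do it: passing as_index=False keeps A and B as columns without a separate reset_index() call, and sort=False keeps the groups in the order they first appear. The name result is just an illustrative choice. Note that the index is renumbered from 0, so it will not match the 0, 2, 3, 4 labels shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3],
    "B": [11, 11, 7, 12, 10, 10],
    "C": [2.1, 1.4, 2.4, 1.8, 2.6, 2.2],
})

# as_index=False leaves A and B as regular columns in the output.
# sort=False preserves the order in which each (A, B) pair first appears.
# Named aggregation: the output column C is the mean of the input column C.
result = df.groupby(["A", "B"], as_index=False, sort=False).agg(C=("C", "mean"))

print(result)
```

If there were more columns, each could get its own aggregation in the same agg call, for example a sum for one column and a mean for another.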