I have the following dataframe df:
topic num
0 a01 1
1 a01 1
2 a01 2
3 a02 1
4 a02 3
5 a02 2
6 a02 3
7 a03 2
8 a03 1
And I need to create a new dataframe newdf, where each row corresponds to a topic and the maximum number for that topic, like the following:
topic num
0 a01 2
1 a02 3
2 a03 2
I've tried to use the max() function from pandas, but to no avail. What I don't get is how to iterate through each row and find the highest value corresponding to each topic. How do I separate a01 from a02, so that I can get the maximum value for each? I've also tried transposing, but I run into the same problem.
CodePudding user response:
See "Get the row(s) which have the max value in groups using groupby".
Example:
new_df = df.groupby('topic', sort=False)['num'].max().reset_index()  # reset_index() turns the resulting Series into a DataFrame with topic and num columns
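To make the shape of the result concrete, here is a minimal sketch that rebuilds the question's df by hand and compares three variants: the plain grouped max (a Series indexed by topic), the same result after reset_index(), and the idxmax() approach, which keeps whole rows and is closer to what the linked question is about. The variable names max_series and rows_with_max are only illustrative.

```python
import pandas as pd

# The dataframe from the question, rebuilt by hand.
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
})

# Selecting ['num'] before max() yields a Series indexed by topic.
max_series = df.groupby("topic", sort=False)["num"].max()

# reset_index() turns that Series into a DataFrame with topic and num columns.
new_df = max_series.reset_index()

# idxmax() returns the index label of the first maximal row in each group,
# so df.loc[...] keeps the complete original rows (useful with more columns).
rows_with_max = df.loc[df.groupby("topic", sort=False)["num"].idxmax()]

print(new_df)
print(rows_with_max)
```

Note that rows_with_max keeps the original index labels, and when a group has ties (a02 has two rows with 3) only the first one is kept.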
CodePudding user response:
You can use GroupBy.max with numeric_only=True:
newdf = df.groupby("topic", as_index=False).max(numeric_only=True)
Output:
print(newdf)
topic num
0 a01 2
1 a02 3
2 a03 2
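With only topic and num in the frame, numeric_only=True makes no visible difference. The sketch below adds a hypothetical text column, label (not part of the question), to show what the flag appears to do: non-numeric columns are left out of the result, while as_index=False keeps topic as an ordinary column.

```python
import pandas as pd

# The question's data plus a hypothetical non-numeric column "label"
# (an assumption for illustration only).
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
    "label": ["x", "y", "z", "x", "y", "z", "x", "y", "z"],
})

# numeric_only=True restricts the aggregation to numeric columns;
# as_index=False keeps the grouping key as a regular column.
newdf = df.groupby("topic", as_index=False).max(numeric_only=True)

# Without the flag, each column is reduced independently, so the label
# value does not necessarily come from the row holding the maximum num.
per_column_max = df.groupby("topic", as_index=False).max()

print(newdf)
print(per_column_max.columns.tolist())
```

If the goal is the full row that holds the maximum num, a per-column max is not the right tool; an idxmax()-based selection keeps the row together.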