Create new dataframe from the highest values in a column

Time:11-30

I have the following dataframe df:

    topic   num
0   a01     1
1   a01     1
2   a01     2
3   a02     1
4   a02     3
5   a02     2
6   a02     3
7   a03     2
8   a03     1

I need to create a new dataframe newdf in which each row holds a topic and the maximum num for that topic, like the following:

    topic   num
0   a01     2
1   a02     3
2   a03     2

I've tried to use the max() function from pandas, but to no avail. What I don't understand is how to go through the rows and find the highest value for each topic. How do I separate a01 from a02 so that I can get the maximum value for each? I've also tried transposing, but I run into the same problem.

CodePudding user response:

See the related question "Get the row(s) which have the max value in groups using groupby".

Example:

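# max() on the selected column returns a Series indexed by topic; reset_index() turns it back into a dataframe
new_df = df.groupby('topic', sort=False)['num'].max().reset_index()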
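
For a self-contained check, here is a minimal sketch that rebuilds the dataframe from the question and applies the grouped maximum. The intermediate name per_topic is only illustrative. groupby does the "separating a01 from a02" part, so no explicit loop over rows is needed.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
})

# groupby splits the rows by topic; max() is then computed inside each group.
# The result is a Series whose index is the topic.
per_topic = df.groupby("topic", sort=False)["num"].max()

# reset_index() moves topic back into an ordinary column,
# giving a dataframe with the columns topic and num.
newdf = per_topic.reset_index()
print(newdf)
```

Without reset_index() the result stays a Series, which is fine for lookups such as per_topic["a02"] but does not match the two-column layout asked for.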

CodePudding user response:

You can use GroupBy.max with numeric_only=True:

newdf = df.groupby("topic", as_index=False).max(numeric_only=True)

Output:

print(newdf)

  topic  num
0   a01    2
1   a02    3
2   a03    2
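
If the real dataframe has more columns and you want the whole row that holds the maximum (the case covered by the linked question), one possible variant is to look up the row labels with idxmax. This is only a sketch under that assumption, and the name rows is illustrative. When a topic has tied maxima, as a02 does here, idxmax keeps the first occurrence.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "topic": ["a01", "a01", "a01", "a02", "a02", "a02", "a02", "a03", "a03"],
    "num": [1, 1, 2, 1, 3, 2, 3, 2, 1],
})

# idxmax() returns, for each topic, the index label of the first row
# where num reaches its maximum within that topic.
max_labels = df.groupby("topic")["num"].idxmax()

# Select those original rows, then renumber them from 0.
rows = df.loc[max_labels]
newdf = rows.reset_index(drop=True)
print(newdf)
```

Unlike max(), this keeps every column of the selected rows together, so values from different rows of the same group are never mixed.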