DataFrame - Add a new ranking column-CodePudding

For the table below, consider just the first 2 columns 'Fruit' and 'percentage'.

How do you add a third column, ie. 'new_column' that looks at the 'Fruit' column in groups and puts a number corrosponding to the percentage. For example, in the 'Apple' group - the highest percentage is 99 - so it is assigned 1....and so on.

So - given the 'Fruit' and 'percentage' column - how do you then add 'new_column' to the dataframe.

Hope this is clear and thankyou in advance.

Fruit	percentage	new_column
Apple	23	3
Apple	99	1
Apple	50	2
Pear	45	4
Pear	87	1
Pear	67	3
Pear	70	2
Peach	93	1
Peach	75	2

CodePudding user response：

I think this should be like this:

import pandas as pd

Original data:

df = pd.DataFrame({
      'fruit': ['Apple', 'Apple', 'Apple', 'Pear', 'Pear', 'Pear', 'Pear', 'Peach', 'Peach'], 
      'percentage': [23, 99, 50, 45, 87, 67, 70, 93, 75]
})

Output

Create new 'rank' column based on grouping the df dataframe on fruit and rank the value of percentage within the group.

df['rank'] = df.groupby('fruit')['percentage'].rank()

Output: