Home > Blockchain >  DataFrame - Add a new ranking column
DataFrame - Add a new ranking column

Time:10-01

For the table below, consider just the first 2 columns 'Fruit' and 'percentage'.

How do you add a third column, ie. 'new_column' that looks at the 'Fruit' column in groups and puts a number corrosponding to the percentage. For example, in the 'Apple' group - the highest percentage is 99 - so it is assigned 1....and so on.

So - given the 'Fruit' and 'percentage' column - how do you then add 'new_column' to the dataframe.

Hope this is clear and thankyou in advance.

Fruit percentage new_column
Apple 23 3
Apple 99 1
Apple 50 2
Pear 45 4
Pear 87 1
Pear 67 3
Pear 70 2
Peach 93 1
Peach 75 2

CodePudding user response:

I think this should be like this:

import pandas as pd

Original data:

df = pd.DataFrame({
      'fruit': ['Apple', 'Apple', 'Apple', 'Pear', 'Pear', 'Pear', 'Pear', 'Peach', 'Peach'], 
      'percentage': [23, 99, 50, 45, 87, 67, 70, 93, 75]
})

Output

enter image description here

Create new 'rank' column based on grouping the df dataframe on fruit and rank the value of percentage within the group.

df['rank'] = df.groupby('fruit')['percentage'].rank()

Output:

enter image description here

  • Related