Home > Software design >  How to get rank of a column by filtering by two columns?
How to get rank of a column by filtering by two columns?

Time:02-14

I have the data frame below. I want to get a create a new column called rank that ranks the stock by growth for each year and sector.

import pandas as pd
d = {'date': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004],  'Stock': ["GM", "F", "GM", "F", "GM", "F", "GM", "F", "GM", "F"], 'US Sales Growth': [.3, .3, .1, .1, .6, .6, .12, .12, .7, .7], 'Stock_Growth': [.1, .2, .3, .4, .14, .16,  .2, .1,.15,.16]}
df = pd.DataFrame(data=d)

enter image description here

It should look like this.

d = {'date': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004],  'Stock': ["GM", "F", "GM", "F", "GM", "F", "GM", "F", "GM", "F"], 'Sector': ['Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto'],  'US Sales Growth': [.3, .3, .1, .1, .6, .6, .12, .12, .7, .7], 'Stock_Growth': [.1, .2, .3, .4, .14, .16,  .2, .1,.15,.16], 'Rank': [2, 1, 2, 1, 2, 1, 1, 2, 2, 1]}

df2 = pd.DataFrame(data=d) 

enter image description here

I tried doing this but just gave me NANs

df['place_holder'] = df.groupby(['Stock','date'], as_index=False)['Stock_Growth'].transform(lambda x: x.shift(1))
df['rank']  = df.groupby('date', as_index=False)['place_holder'].transform(lambda x: x.rank())

Another way of doing it was get groupby date first then sector and then rank them.

CodePudding user response:

Using groupby and then apply another step to rank each group seems the right approach. I recommend using the rank method of a Groupby object to compute the rank of the values of a column in each group. This means, that this should provide the expected output:

>> df["Rank"] = df.groupby(["date", "Sector"])["Stock_Growth"].rank(ascending=False)
>> df
   date Stock  US Sales Growth  Stock_Growth Sector  Rank
0  2000    GM             0.30          0.10   Auto   2.0
1  2000     F             0.30          0.20   Auto   1.0
2  2001    GM             0.10          0.30   Auto   2.0
3  2001     F             0.10          0.40   Auto   1.0
4  2002    GM             0.60          0.14   Auto   2.0
5  2002     F             0.60          0.16   Auto   1.0
6  2003    GM             0.12          0.20   Auto   1.0
7  2003     F             0.12          0.10   Auto   2.0
8  2004    GM             0.70          0.15   Auto   2.0
9  2004     F             0.70          0.16   Auto   1.0

The ascending=False part ensures that higher values get a better (lower) rank.

  • Related