Calculating rank for a particular column in pandas

Time:01-26

I have a DataFrame that looks like this:

import pandas as pd

data = {'col1': ['a', 'b', 'c'],
        'col2': [10, 5, 20]}

df_sample = pd.DataFrame(data=data)

I want to calculate a rank for the values in col2, so I wrote this function:

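def rank_by(df):
    # Only compute the score for frames with at least 10 rows
    if df.shape[0] >= 10:
        # sort_values returns a new DataFrame, so the result must be reassigned
        df = df.sort_values(by=['col2'])
        l = []
        for val in df['col2']:
            # value as a fraction of the column max, scaled to 0-10
            rank = (val/df['col2'].max())*10
            l.append(rank)
        df['rank'] = l
    # return the frame in every case (otherwise small frames give None)
    return df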

Please assume col2 has at least 10 values. Is there a more Pythonic way to do this than the function above?

Answer:

Use pd.Series.rank:

df_sample['rank'] = df_sample['col2'].rank()

Output:

  col1  col2  rank
0    a    10   2.0
1    b     5   1.0
2    c    20   3.0

Note that there are several ways to handle ties, selected with the method parameter: 'average' (the default), 'min', 'max', 'first' or 'dense'.
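To see how the tie-handling options differ, here is a small sketch on a hypothetical four-value Series in which 5 appears twice. ascending=False, which gives the largest value rank 1, is shown as well.

```python
import pandas as pd

# Hypothetical data with a tie: two rows share col2 == 5
s = pd.Series([10, 5, 20, 5], name="col2")

ranks = pd.DataFrame({
    "col2": s,
    "average": s.rank(),              # default: tied values share the mean of their positions
    "min": s.rank(method="min"),      # tied values all get the lowest position
    "max": s.rank(method="max"),      # tied values all get the highest position
    "first": s.rank(method="first"),  # ties broken by order of appearance
    "dense": s.rank(method="dense"),  # like "min", but no gap after a tie
    "desc": s.rank(ascending=False),  # largest value gets rank 1
})
print(ranks)
```

With this data, 'min' gives both 5s rank 1, 'max' gives both rank 2, and 'dense' ranks 10 as 2 instead of 3 because it leaves no gap after the tie.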

Answer:

It looks like you don't need a true rank, just the ratio of each value to the column maximum, multiplied by 10:

df_sample['rank'] = df_sample['col2'].div(df_sample['col2'].max()).mul(10)

print(df_sample.sort_values(by='col2'))

Output:

  col1  col2  rank
4    e     2   0.8
8    i     2   0.8
3    d     4   1.6
1    b     5   2.0
6    g     6   2.4
9    j     9   3.6
0    a    10   4.0
7    h    12   4.8
2    c    20   8.0
5    f    25  10.0

Input used:

import pandas as pd

data = {'col1': list('abcdefghij'),
        'col2': [10, 5, 20, 4, 2, 25, 6, 12, 2, 9]}
df_sample = pd.DataFrame(data=data)
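If you want to keep the shape of the question's rank_by (the 10-row guard plus sorting), a minimal sketch of a loop-free version using the ratio formula from this answer could look like the following. The col and min_rows parameters are hypothetical additions for illustration, not part of the original code.

```python
import pandas as pd

def rank_by(df: pd.DataFrame, col: str = "col2", min_rows: int = 10) -> pd.DataFrame:
    """Loop-free sketch of the question's rank_by.

    Adds a 'rank' column equal to col / col.max() * 10 and sorts by col.
    Frames with fewer than min_rows rows are returned unchanged.
    """
    if len(df) < min_rows:
        return df
    # assign() returns a new DataFrame, so the caller's df is not modified
    return df.assign(rank=df[col] / df[col].max() * 10).sort_values(by=col)

data = {'col1': list('abcdefghij'),
        'col2': [10, 5, 20, 4, 2, 25, 6, 12, 2, 9]}
df_sample = pd.DataFrame(data=data)
result = rank_by(df_sample)
print(result)
```

Returning a new frame via assign, rather than adding the column in place, keeps the input untouched; if you prefer in-place mutation, the one-liner from this answer does that directly.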