I have a DataFrame that looks like this:
import pandas as pd

data = {'col1': ['a', 'b', 'c'],
        'col2': [10, 5, 20]}
df_sample = pd.DataFrame(data=data)
I want to calculate the rank of col2. I wrote this function:
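def rank_by(df):
    # Only compute the rank when there are at least 10 rows
    if df.shape[0] >= 10:
        # Note: sort_values returns a new DataFrame, so this line has no effect on df
        df.sort_values(by=['col2'])
        l = []
        for val in df['col2']:
            # Scale each value so that the column maximum maps to 10
            rank = (val/df['col2'].max())*10
            l.append(rank)
        df['rank'] = l
    return df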
Please assume col2 has more than 10 values. I want to know if there is a more Pythonic way of applying the function defined above.
CodePudding user response:
Use pd.Series.rank:
df_sample['rank'] = df_sample['col2'].rank()
Output:
  col1  col2  rank
0    a    10   2.0
1    b     5   1.0
2    c    20   3.0
Note that rank takes a method argument ('average', 'min', 'max', 'first', 'dense') that controls how ties are handled; the default is 'average'.
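To make the tie handling concrete, here is a minimal sketch comparing a few of the methods side by side. The series below is hypothetical data (not from the question), chosen so that the value 5 appears twice.

```python
import pandas as pd

# Hypothetical data with one tie: the value 5 appears twice
s = pd.Series([10, 5, 20, 5], name="col2")

ranks = pd.DataFrame({
    "col2": s,
    "average": s.rank(method="average"),  # tied values share the mean of their positions
    "min": s.rank(method="min"),          # tied values all get the lowest position
    "dense": s.rank(method="dense"),      # like "min", but with no gaps after a tie
    "first": s.rank(method="first"),      # ties broken by order of appearance
})
print(ranks)
```

With 'average' the two 5s both get rank 1.5, while with 'first' they get 1.0 and 2.0 in order of appearance.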
CodePudding user response:
It looks like you just want the ratio to the max value multiplied by 10:
df_sample['rank'] = df_sample['col2'].div(df_sample['col2'].max()).mul(10)
print(df_sample.sort_values(by='col2'))
Output:
  col1  col2  rank
4    e     2   0.8
8    i     2   0.8
3    d     4   1.6
1    b     5   2.0
6    g     6   2.4
9    j     9   3.6
0    a    10   4.0
7    h    12   4.8
2    c    20   8.0
5    f    25  10.0
Used input:
data = {'col1': list('abcdefghij'),
        'col2': [10, 5, 20, 4, 2, 25, 6, 12, 2, 9]}
df_sample = pd.DataFrame(data=data)
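If the row-count check from the question should be kept, the vectorized expression can be wrapped in a function. This is only a sketch: the parameters col, min_rows and scale are illustrative additions, not part of the original function, and returning a copy rather than modifying the input in place is a design choice assumed here.

```python
import pandas as pd

def rank_by(df, col="col2", min_rows=10, scale=10):
    """Return a copy of df with a 'rank' column equal to value / max * scale.

    col, min_rows and scale are hypothetical parameters added for illustration.
    The column is only added when df has at least min_rows rows, mirroring
    the check in the question.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    if len(out) >= min_rows:
        out["rank"] = out[col].div(out[col].max()).mul(scale)
    return out

data = {'col1': list('abcdefghij'),
        'col2': [10, 5, 20, 4, 2, 25, 6, 12, 2, 9]}
df_sample = pd.DataFrame(data=data)

result = rank_by(df_sample)
print(result.sort_values(by="col2"))
```

Sorting is applied only for display; the scaled values do not depend on row order, so the loop and the list from the question are not needed.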