So I have a very large dataframe with over 500 rows and 100 columns involved, each row representing a person and column representing a performance metric:
[Person] [Metric 1] [Metric 2] ...
A [num] [num]
B [num] [num]
C [num] [num]
D [num] [num]
...
What I want is as follows:
[Person] [Metric 1] [Ranking for 1][Metric 2][Ranking for 2] ...
A [num] [Rank] [num] [Rank]
B [num] [Rank] [num] [Rank]
C [num] [Rank] [num] [Rank]
D [num] [Rank] [num] [Rank]
...
Now, I can use pandas.rank to do this for one column, but it seems really clunky when there are over 100 involved, not to mention having to move the columns around afterwards. Is there a quick way (maybe involving loops) to rank all 100 columns at once?
CodePudding user response:
You can use pd.DataFrame.rank
:
out = (df
.join(df.filter(like='Metric')
.rank()
.add_prefix('rank_')
)
.sort_index(axis=1, key=lambda x: x.str.extract(r'Metric(\d )', expand=False).fillna(0).astype(int))
)
NB. There are many ranking algorithms, check the documentation for the one you need.
Dummy example:
Person Metric1 rank_Metric1 Metric2 rank_Metric2
0 A 1 2.0 1 2.0
1 B 3 4.0 0 1.0
2 C 0 1.0 2 3.0
3 D 2 3.0 3 4.0
CodePudding user response:
Given this dataframe
metric1 metric2 metric3
A 1 2 3
B 3 4 5
C 2 1 7
You can create each rank and a new column name
new_column_order = []
for index, column in enumerate(df.columns):
df[f"rank {index 1}"] = df[column].rank()
new_column_order.append(column)
new_column_order.append(f"rank {index 1}")
df.columns = new_column_order
at the end of the for loop the dataframe is
metric1 metric2 metric3 rank 1 rank 2 rank 3
A 1 2 3 1.0 2.0 1.0
B 3 4 5 3.0 3.0 2.0
C 2 1 7 2.0 1.0 3.0
and the list of columns reorganized (new_column_order)
['metric1', 'rank 1', 'metric2', 'rank 2', 'metric3', 'rank 3']