I would like to create a rank table based on a multi-column pandas dataframe, with several numerical columns.
Let's use the following df as an example:
Name | Sales | Volume | Reviews |
---|---|---|---|
A | 1000 | 100 | 100 |
B | 2000 | 200 | 50 |
C | 5400 | 500 | 10 |
I would like to create a new table, ranked_df, that ranks the values in each column in descending order while keeping essentially the same format:
Name | Sales_rank | Volume_rank | Reviews_rank |
---|---|---|---|
A | 3 | 3 | 1 |
B | 2 | 2 | 2 |
C | 1 | 1 | 3 |
I can do this iteratively by looping through the columns:
import pandas as pd

df = pd.DataFrame({
    "Name": ['A', 'B', 'C'],
    "Sales": [1000, 2000, 5400],
    "Volume": [100, 200, 500],
    "Reviews": [100, 50, 10]
})
# make a copy of the original df
ranked_df = df.copy()
# define our interested columns
interest_cols = ['Sales', 'Volume', 'Reviews']
for col in interest_cols:
    # rank in descending order so the largest value gets rank 1
    ranked_df[f"{col}_rank"] = df[col].rank(ascending=False)
# drop the cols not needed
...
But my question is this: is there a more elegant, or more Pythonic, way of doing this? Maybe an apply on the DataFrame, or some vectorized operation via NumPy?
Thank you.
CodePudding user response:
You could use transform (or apply) to rank each column:
df.set_index('Name').transform(pd.Series.rank, ascending=False)
      Sales  Volume  Reviews
Name
A       3.0     3.0      1.0
B       2.0     2.0      2.0
C       1.0     1.0      3.0
CodePudding user response:
df.set_index('Name').rank(ascending=False).reset_index()
  Name  Sales  Volume  Reviews
0    A    3.0     3.0      1.0
1    B    2.0     2.0      2.0
2    C    1.0     1.0      3.0
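
To get the exact layout asked for in the question (a `_rank` suffix on each column and whole-number ranks), the same `DataFrame.rank` call can be chained with `add_suffix` and `astype`. This is only a sketch: it uses the sample values from the question's table, and the choice of `method="min"` for ties is an assumption, since the question does not say how ties should be handled.

```python
import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Sales": [1000, 2000, 5400],
    "Volume": [100, 200, 500],
    "Reviews": [100, 50, 10],
})

ranked_df = (
    df.set_index("Name")                     # keep Name out of the ranking
      .rank(ascending=False, method="min")   # largest value gets rank 1; ties share the lowest rank (assumed)
      .astype(int)                           # ranks are whole numbers here; this would fail if a column had NaN
      .add_suffix("_rank")                   # Sales -> Sales_rank, etc.
      .reset_index()                         # bring Name back as a regular column
)

print(ranked_df)
```

Because `rank` operates on every column at once, no explicit loop or `apply` is needed; only the non-numeric `Name` column has to be moved into the index first.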