How to rank each item at the same location across different lists in python?

Time:03-26

I am an R programmer, but I need to compute rankings in a table using Python. Say I have a column "test" that holds lists of numbers:

import pandas as pd

df = pd.DataFrame({"test":[[1,4,7], [4,2,6], [3,8,1]]})

I want to rank each item against the items at the same position in the other rows (lists), then average each row's ranks to get a final score:

expected:

       test      rank_list    final_score
0   [1, 4, 7]    [1, 2, 3]       2
1   [4, 2, 6]    [3, 1, 2]       2
2   [3, 8, 1]    [2, 3, 1]       2
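To pin down the intended computation, here is a minimal pure-Python sketch with no dependencies. The helper names `position_ranks` and `final_scores` are hypothetical, and the tie rule (tied values share the average of their positions, as pandas' default `rank()` does) is an assumption, since the question does not say how ties should be handled.

```python
def position_ranks(rows):
    """Rank every value against the values at the same position in the other lists.

    1 = smallest. Tied values share the average of the positions they occupy
    (assumed convention, matching pandas' default rank(method="average")).
    """
    ranks = []
    for row in rows:
        row_ranks = []
        for j, value in enumerate(row):
            column = [other[j] for other in rows]      # all values at position j
            less = sum(v < value for v in column)      # values ranked before this one
            equal = sum(v == value for v in column)    # tie group, including itself
            row_ranks.append(less + (equal + 1) / 2)   # average of positions less+1 .. less+equal
        ranks.append(row_ranks)
    return ranks


def final_scores(ranks):
    """Average each list's ranks into a single score."""
    return [sum(r) / len(r) for r in ranks]


rows = [[1, 4, 7], [4, 2, 6], [3, 8, 1]]
rank_list = position_ranks(rows)   # [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]
print(final_scores(rank_list))     # [2.0, 2.0, 2.0]
```

This is quadratic in the number of rows, so for hundreds of rows the vectorized approaches in the answers are the better choice.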

I know this is not a great example, since all the final scores are the same, but with hundreds of rows the results will vary. I hope I have described the question clearly; if not, please feel free to ask.

I don't know whether this can be done in pandas. I tried zip with scipy, but scipy.stats.rankdata did not give the rank of each item at the same index:

import scipy.stats

l = list(df["test"])
ranks_list = [scipy.stats.rankdata(x) for x in zip(*l)]  # not right: one array per position, not per row
estimated_rank = [sum(x) / len(x) for x in ranks_list]
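The attempt above does rank the right values: `zip(*l)` yields one tuple per position, so each `rankdata` call ranks that position across all lists. The catch is that the result is grouped by position, so `estimated_rank` averages columns, and a column of n ranks always averages to (n + 1) / 2, whatever the data. A minimal sketch of one fix, assuming the same data as `df["test"]` (the name `ranks_by_position` is just illustrative), is to transpose the ranks back with a second `zip(*...)` before averaging:

```python
import scipy.stats

rows = [[1, 4, 7], [4, 2, 6], [3, 8, 1]]  # stands in for list(df["test"])

# zip(*rows) yields one tuple per position; rankdata ranks that position across lists
ranks_by_position = [scipy.stats.rankdata(col) for col in zip(*rows)]

# Transpose back so that each entry holds the ranks of one original list
rank_list = [[float(r) for r in row] for row in zip(*ranks_by_position)]

# Average each list's ranks to get its final score
final_score = [sum(r) / len(r) for r in rank_list]
print(rank_list)    # [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]
print(final_score)  # [2.0, 2.0, 2.0]
```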

I am open to any kinds of packages, whichever is convenient. Thank you!

Answer:

import numpy as np

# Create a numpy array
a = np.array([[1,4,7], [4,2,6], [3,8,1]])

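# Rank each value within its column (axis=0). A single argsort returns the
# indices that would sort the column, not ranks; applying argsort twice
# converts them into zero-based ranks, so we add 1
rank_list = np.argsort(np.argsort(a, axis=0), axis=0) + 1

# calculate the average rank of each row
final_score = np.mean(rank_list, axis=1)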
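Note that `np.argsort` on its own returns the indices that would sort each column, not the ranks; for the question's array the two happen to coincide because each column's permutation is its own inverse. Applying `argsort` twice turns sorting indices into 0-based ranks. A small check on made-up values (30, 10, 20 are not from the question) where the two differ:

```python
import numpy as np

col = np.array([30, 10, 20])          # made-up values, not from the question

sort_idx = np.argsort(col)            # indices that would sort col
ranks = np.argsort(sort_idx) + 1      # 1-based rank of each value

print(sort_idx)  # [1 2 0]
print(ranks)     # [3 1 2]

# Same idea for the question's 2-D array, ranking within each column (axis=0)
a = np.array([[1, 4, 7], [4, 2, 6], [3, 8, 1]])
rank_list = np.argsort(np.argsort(a, axis=0), axis=0) + 1
final_score = rank_list.mean(axis=1)
```

Unlike `scipy.stats.rankdata` or pandas' `rank()`, this does not average ties: tied values get distinct ranks in an unspecified order (or in order of appearance if the inner call uses `kind="stable"`).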

Answer:

You could use the rank method to get the ranks, then agg to collect each row's ranks into a list for the rank_list column, and finally mean for final_score:

tmp = pd.DataFrame(df['test'].tolist()).apply('rank').astype(int)
df['rank_list'] = tmp.agg(list, axis=1)
df['final_score'] = tmp.mean(axis=1)

Output:

        test  rank_list  final_score
0  [1, 4, 7]  [1, 2, 3]          2.0
1  [4, 2, 6]  [3, 1, 2]          2.0
2  [3, 8, 1]  [2, 3, 1]          2.0
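One caveat with `.astype(int)`: `rank()` gives tied values the average of their positions (for example 2.5), and casting to `int` truncates that. If the data can contain ties, a sketch that keeps the float ranks, using a made-up variant of the question's data with a tie at position 0 (`truncated` is only there for comparison):

```python
import pandas as pd

# Made-up data: 4 appears twice at position 0
df = pd.DataFrame({"test": [[1, 4, 7], [4, 2, 6], [4, 8, 1]]})

tmp = pd.DataFrame(df["test"].tolist()).rank()   # default method="average": the tied 4s both get 2.5
df["rank_list"] = tmp.apply(list, axis=1)        # one list of ranks per row
df["final_score"] = tmp.mean(axis=1)

# Casting to int would turn the tied 2.5 ranks into 2
truncated = pd.DataFrame(df["test"].tolist()).rank().astype(int)
print(df)
```

If integer ranks are required, `rank(method="min")` or `rank(method="dense")` assign whole-number ranks to ties explicitly instead of truncating.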