I am an R programmer, but I need to compute rankings in a table using Python. Let's say I have a column "test" that contains lists of numbers:
import pandas as pd
df = pd.DataFrame({"test":[[1,4,7], [4,2,6], [3,8,1]]})
I want to rank each item against the items at the same position in the other rows (lists), and then average the ranks within each row to get a final score:
expected:
        test  rank_list  final_score
0  [1, 4, 7]  [1, 2, 3]            2
1  [4, 2, 6]  [3, 1, 2]            2
2  [3, 8, 1]  [2, 3, 1]            2
I know it is not a good example because all the final scores are the same, but with hundreds of rows the results will vary. I hope I have described the question clearly; if not, please feel free to ask.
I don't know whether this can be done in pandas. I tried zip with scipy, but scipy.stats.rankdata did not give me the rank of each item at the same index:
l = list(df["test"])
ranks_list = [scipy.stats.rankdata(x) for x in zip(*l)] # not right
estimated_rank = [sum(x) / len(x) for x in ranks_list]
I am open to any package, whichever is most convenient. Thank you!
CodePudding user response:
import numpy as np
# Create a numpy array
a = np.array([[1,4,7], [4,2,6], [3,8,1]])
# argsort gives the sort order of each column; applying argsort again
# turns that order into zero-based ranks, so we add 1 for one-based ranks
rank_list = np.argsort(np.argsort(a, axis=0), axis=0) + 1
# calculate the average rank of each row
final_score = np.mean(rank_list, axis=1)
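One thing worth checking with the argsort approach: a single argsort returns the sort order (which index holds the smallest value, the next smallest, and so on), not the rank of each value. The two only coincide for some inputs, the question's sample data among them. Below is a minimal sketch of the difference, assuming there are no ties; the helper name column_ranks and the sample array are made up for illustration.

```python
import numpy as np

def column_ranks(a):
    """Return 1-based ranks of each value within its column (ties not handled)."""
    order = np.argsort(a, axis=0)          # indices that would sort each column
    ranks = np.argsort(order, axis=0) + 1  # inverse permutation = 0-based rank, then shift to 1-based
    return ranks

# Illustrative data where sort order and rank differ in the first column
a = np.array([[3, 9], [1, 8], [2, 7]])

single = np.argsort(a, axis=0) + 1   # sort order + 1, not the ranks
ranks = column_ranks(a)              # ranks of each value within its column
row_mean = ranks.mean(axis=1)        # average rank per row
```

For the first column [3, 1, 2], the single argsort gives [2, 3, 1], while the ranks are [3, 1, 2]. If ties are possible, scipy.stats.rankdata or the pandas rank method is likely a better fit than argsort.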
CodePudding user response:
You could use the rank method to get the ranks, then use agg to get the output as lists for the rank_list column, and finally mean for final_score:
tmp = pd.DataFrame(df['test'].tolist()).apply('rank').astype(int)
df['rank_list'] = tmp.agg(list, axis=1)
df['final_score'] = tmp.mean(axis=1)
Output:
        test  rank_list  final_score
0  [1, 4, 7]  [1, 2, 3]          2.0
1  [4, 2, 6]  [3, 1, 2]          2.0
2  [3, 8, 1]  [2, 3, 1]          2.0
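With real data there may be ties, in which case the default rank method returns averaged ranks such as 1.5, and casting to int would silently truncate them. Here is a minimal sketch of the same steps wrapped in a function, assuming all lists have the same length. The helper name add_rank_columns and the choice of method="min" (tied values share the lowest rank) are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

def add_rank_columns(df, col="test", method="min"):
    """Hypothetical helper: rank items position-wise across rows, then average per row."""
    # Expand the lists into one column per position
    wide = pd.DataFrame(df[col].tolist(), index=df.index)
    # Rank down each column; method="min" keeps tied ranks as whole numbers
    ranks = wide.rank(method=method).astype(int)
    out = df.copy()
    # Collect each row's ranks back into a list
    out["rank_list"] = pd.Series(ranks.to_numpy().tolist(), index=df.index)
    # Average the ranks within each row
    out["final_score"] = ranks.mean(axis=1)
    return out

df = pd.DataFrame({"test": [[1, 4, 7], [4, 2, 6], [3, 8, 1]]})
result = add_rank_columns(df)
```

On the sample data this should reproduce the rank_list and final_score columns shown above. Other tie rules can be chosen by passing, for example, method="dense".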