Home > Net >  Summarise ranks of each column after argsort
Summarise ranks of each column after argsort

Time:05-24

I have a df with several "attributes" and scores for each attribute:

df = pd.DataFrame(np.random.random_sample((5, 3)), columns=['speed', 'endurance', 'strength'])


      speed     endurance   strength
0   0.901001    0.531624    0.599738
1   0.302515    0.439233    0.143280
2   0.616964    0.984054    0.151714
3   0.681594    0.601456    0.655960
4   0.831605    0.742175    0.276892

Thus for each row I can list the primary, secondary and tertiary attribute based on the score, with df.to_numpy().argsort().

I need to get a summary of the number of times each attribute achieved each position, eg:

position  speed endurance strength
primary        3    2   0
secondary      2    1   2
tertiary       0    2   3

(I calculated my expected output by hand so it might have errors, but hopefully the principle is clear.)

I think I could do this by (np.argsort(arr)==0).sum() for each of [0, 1, 2] but is there a simpler way?

CodePudding user response:

You can use rank and crosstab:

# names
d = {1:'primary',2:'secondary', 3:'tertiary'}

# ranks
s = df.rank(1, ascending=False).stack().map(d)

out = pd.crosstab(s, s.index.get_level_values(1)).rename_axis(index=None, columns=None)

output:

           endurance  speed  strength
primary            2      3         0
secondary          1      2         2
tertiary           2      0         3

Alternative format:

out = (pd.crosstab(s, s.index.get_level_values(1))
         .rename_axis(index='position', columns=None).reset_index()
       )

output:

    position  endurance  speed  strength
0    primary          2      3         0
1  secondary          1      2         2
2   tertiary          2      0         3

CodePudding user response:

You can count DataFrame created from argsort with replace missing values to 0 and rename index values with change order or rows by DataFrame.iloc:

d = {2:'primary',1:'secondary', 0:'tertiary'}
df = (pd.DataFrame(df.to_numpy().argsort(), columns=df.columns)
        .apply(pd.value_counts)
        .fillna(0)
        .astype(int)
        .iloc[::-1]
        .rename(d))

print (df)
           speed  endurance  strength
primary        3          2         0
secondary      2          1         2
tertiary       0          2         3
  • Related