I have a df with several "attributes" and scores for each attribute:
df = pd.DataFrame(np.random.random_sample((5, 3)), columns=['speed', 'endurance', 'strength'])
speed endurance strength
0 0.901001 0.531624 0.599738
1 0.302515 0.439233 0.143280
2 0.616964 0.984054 0.151714
3 0.681594 0.601456 0.655960
4 0.831605 0.742175 0.276892
Thus for each row I can list the primary, secondary and tertiary attribute based on the score, with df.to_numpy().argsort()
.
I need to get a summary of the number of times each attribute achieved each position, eg:
position speed endurance strength
primary 3 2 0
secondary 2 1 2
tertiary 0 2 3
(I calculated my expected output by hand so it might have errors, but hopefully the principle is clear.)
I think I could do this by (np.argsort(arr)==0).sum()
for each of [0, 1, 2]
but is there a simpler way?
CodePudding user response:
You can use rank
and crosstab
:
# names
d = {1:'primary',2:'secondary', 3:'tertiary'}
# ranks
s = df.rank(1, ascending=False).stack().map(d)
out = pd.crosstab(s, s.index.get_level_values(1)).rename_axis(index=None, columns=None)
output:
endurance speed strength
primary 2 3 0
secondary 1 2 2
tertiary 2 0 3
Alternative format:
out = (pd.crosstab(s, s.index.get_level_values(1))
.rename_axis(index='position', columns=None).reset_index()
)
output:
position endurance speed strength
0 primary 2 3 0
1 secondary 1 2 2
2 tertiary 2 0 3
CodePudding user response:
You can count DataFrame
created from argsort
with replace missing values to 0
and rename
index values with change order or rows by DataFrame.iloc
:
d = {2:'primary',1:'secondary', 0:'tertiary'}
df = (pd.DataFrame(df.to_numpy().argsort(), columns=df.columns)
.apply(pd.value_counts)
.fillna(0)
.astype(int)
.iloc[::-1]
.rename(d))
print (df)
speed endurance strength
primary 3 2 0
secondary 2 1 2
tertiary 0 2 3