Given a pandas dataframe like this:
df = pd.DataFrame([['A', 'M1', 1], ['A', 'M2', 4],
['B', 'M1', 3], ['B', 'M2', 2],
['C', 'M1', 5], ['C', 'M2', 0]],
columns=['task', 'method', 'value'])
task method value
0 A M1 1
1 A M2 4
2 B M1 3
3 B M2 2
4 C M1 5
5 C M2 0
I want to generate a comparison table like this:
method M1 M2
method
M1 0 2
M2 1 0
That is, the value at row r
, column c
is the number of tasks for which method r
had a higher value than method c
.
I tried adapting the solution here, but I can't figure out how to take into account the task.
How can I go about doing creating this table?
CodePudding user response:
Here is a way to adapt the solution you provided in your question. First pivot the data to get a row per method. Then use the numpy array to be able to broadcast and compare. Aggregate along the task axis of your array. Finally build the dataframe to get as expected.
_df = df.pivot(index='method', columns='task', values='value')
print(_df)
# task A B C
# method
# M1 1 3 5
# M2 4 2 0
_arr = _df.to_numpy()
# now you can use the broadcasting method like in the link provided
# You need to aggregate (sum) along the task axis being the last one here
_res = (_arr<_arr[:,None]).sum(axis=-1)
print(_res)
# [[0 2]
# [1 0]]
# after build your dataframe
res = pd.DataFrame(_res, index=_df.index, columns=_df.index)
print(res)
# method M1 M2
# method
# M1 0 2
# M2 1 0