Home > Software design >  How do I construct a comparison table from a dataframe?
How do I construct a comparison table from a dataframe?

Time:08-05

Given a pandas dataframe like this:

df = pd.DataFrame([['A', 'M1', 1], ['A', 'M2', 4], 
                   ['B', 'M1', 3], ['B', 'M2', 2],
                   ['C', 'M1', 5], ['C', 'M2', 0]],
                   columns=['task', 'method', 'value'])


  task method  value
0    A     M1      1
1    A     M2      4
2    B     M1      3
3    B     M2      2
4    C     M1      5
5    C     M2      0

I want to generate a comparison table like this:

method        M1   M2
method               
M1             0    2
M2             1    0

That is, the value at row r, column c is the number of tasks for which method r had a higher value than method c.

I tried adapting the solution here, but I can't figure out how to take into account the task.

How can I go about doing creating this table?

CodePudding user response:

Here is a way to adapt the solution you provided in your question. First pivot the data to get a row per method. Then use the numpy array to be able to broadcast and compare. Aggregate along the task axis of your array. Finally build the dataframe to get as expected.

_df = df.pivot(index='method', columns='task', values='value')
print(_df)
# task    A  B  C
# method         
# M1      1  3  5
# M2      4  2  0

_arr = _df.to_numpy()
# now you can use the broadcasting method like in the link provided
# You need to aggregate (sum) along the task axis being the last one here
_res = (_arr<_arr[:,None]).sum(axis=-1)
print(_res)
# [[0 2]
#  [1 0]]

# after build your dataframe
res = pd.DataFrame(_res, index=_df.index, columns=_df.index)
print(res)
# method  M1  M2
# method        
# M1       0   2
# M2       1   0
  • Related