How to make a customized grouped dataframe with multiple aggregations

Time:09-05

I have a standard dataframe like the one below:

       Id Type  Speed Efficiency Durability
0   Id001    A     OK         OK      nonOK
1   Id002    A  nonOK         OK      nonOK
2   Id003    B  nonOK      nonOK      nonOK
3   Id004    B  nonOK      nonOK         OK
4   Id005    A  nonOK      nonOK         OK
5   Id006    A     OK         OK         OK
6   Id007    A     OK      nonOK         OK
7   Id008    B  nonOK      nonOK         OK
8   Id009    C     OK         OK         OK
9   Id010    B     OK         OK      nonOK
10  Id011    C     OK      nonOK         OK
11  Id012    C     OK      nonOK         OK
12  Id013    C  nonOK         OK         OK
13  Id014    C  nonOK      nonOK         OK
14  Id015    C  nonOK      nonOK         OK

And I'm trying to get this kind of output :

  Type   Test  Speed  Efficiency  Durability
0    A     OK      3           3           3
1    A  nonOK      2           2           2
2    B     OK      1           1           2
3    B  nonOK      3           3           2
4    C     OK      3           2           6
5    C  nonOK      3           4           0

I tried df.groupby('Type').agg('count'), but it doesn't give the expected output.

Is it possible to make this kind of transformation with pandas, please?
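For anyone who wants to reproduce the answers below, the sample frame above can be rebuilt like this (the values are transcribed from the table in the question):

```python
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'Id': [f'Id{i:03d}' for i in range(1, 16)],
    'Type': list('AABBAAABCBCCCCC'),
    'Speed': ['OK', 'nonOK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'nonOK',
              'OK', 'OK', 'OK', 'OK', 'nonOK', 'nonOK', 'nonOK'],
    'Efficiency': ['OK', 'OK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK',
                   'OK', 'OK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK'],
    'Durability': ['nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK',
                   'OK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK'],
})
print(df)
```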

CodePudding user response:

You can use the following solution based on pandas method chaining:

import pandas as pd

(pd.melt(df, id_vars='Type', value_vars=['Speed', 'Efficiency', 'Durability'], value_name='Test')
 .groupby(['Type', 'Test', 'variable'])
 .size()
 .reset_index()
 .pivot(index=['Type', 'Test'], columns='variable', values=0)
 .reset_index())

variable Type   Test  Durability  Efficiency  Speed
0           A     OK         3.0         3.0    3.0
1           A  nonOK         2.0         2.0    2.0
2           B     OK         2.0         1.0    1.0
3           B  nonOK         2.0         3.0    3.0
4           C     OK         6.0         2.0    3.0
5           C  nonOK         NaN         4.0    3.0
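To get exactly the output asked for (zeros instead of NaN, and the columns back in their original order), the chain above can be extended with a `fillna`/`astype` step and a column reselect; this is a small variation on the answer above, with the sample frame rebuilt inline so the snippet runs on its own:

```python
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'Id': [f'Id{i:03d}' for i in range(1, 16)],
    'Type': list('AABBAAABCBCCCCC'),
    'Speed': ['OK', 'nonOK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'nonOK',
              'OK', 'OK', 'OK', 'OK', 'nonOK', 'nonOK', 'nonOK'],
    'Efficiency': ['OK', 'OK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK',
                   'OK', 'OK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK'],
    'Durability': ['nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK',
                   'OK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK'],
})

cols = ['Speed', 'Efficiency', 'Durability']
out = (pd.melt(df, id_vars='Type', value_vars=cols, value_name='Test')
       .groupby(['Type', 'Test', 'variable'])
       .size()
       .reset_index()
       .pivot(index=['Type', 'Test'], columns='variable', values=0)
       .fillna(0).astype(int)        # (C, nonOK, Durability) is absent -> 0
       .reset_index()
       .rename_axis(columns=None)    # drop the leftover 'variable' axis name
       [['Type', 'Test'] + cols])    # restore the original column order
print(out)
```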

CodePudding user response:

You could try as follows:

out = (df.groupby('Type')
         .agg({col: 'value_counts' for col in df.columns[2:]})
         .fillna(0).astype(int)
         .sort_index()
         .reset_index()
         .rename(columns={'level_1': 'Test'}))

print(out)

  Type   Test  Speed  Efficiency  Durability
0    A     OK      3           3           3
1    A  nonOK      2           2           2
2    B     OK      1           1           2
3    B  nonOK      3           3           2
4    C     OK      3           2           6
5    C  nonOK      3           4           0
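A different route worth mentioning: after melting to long form, pd.crosstab does the count-and-pivot in one call and fills missing combinations with 0 by itself. This is an alternative technique, not the answer above rewritten; the sample frame is rebuilt inline so the snippet is self-contained:

```python
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'Id': [f'Id{i:03d}' for i in range(1, 16)],
    'Type': list('AABBAAABCBCCCCC'),
    'Speed': ['OK', 'nonOK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'nonOK',
              'OK', 'OK', 'OK', 'OK', 'nonOK', 'nonOK', 'nonOK'],
    'Efficiency': ['OK', 'OK', 'nonOK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK',
                   'OK', 'OK', 'nonOK', 'nonOK', 'OK', 'nonOK', 'nonOK'],
    'Durability': ['nonOK', 'nonOK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK',
                   'OK', 'nonOK', 'OK', 'OK', 'OK', 'OK', 'OK'],
})

cols = ['Speed', 'Efficiency', 'Durability']
long = df.melt(id_vars='Type', value_vars=cols, value_name='Test')

# Cross-tabulate (Type, Test) pairs against the melted column name;
# absent combinations come out as 0, no fillna needed
out = (pd.crosstab([long['Type'], long['Test']], long['variable'])
       .reset_index()
       .rename_axis(columns=None)
       [['Type', 'Test'] + cols])
print(out)
```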

CodePudding user response:

This should work; I'm not sure if there's a simpler way:

li = []
for col in ['Speed', 'Efficiency', 'Durability']:
    x = df.groupby('Type')[col].value_counts()
    x.index = x.index.rename('Test', level=1)
    li.append(x)

pd.concat(li, axis=1).fillna(0).astype(int)

            Speed  Efficiency  Durability
Type Test                                
A    OK         3           3           3
     nonOK      2           2           2
B    nonOK      3           3           2
     OK         1           1           2
C    OK         3           2           6
     nonOK      3           4           0