Home > Mobile >  How to create pivot table based on binary and categorical columns in DataFrame in Python Pandas?
How to create pivot table based on binary and categorical columns in DataFrame in Python Pandas?

Time:10-13

I have DataFrame in Python Pandas like below:

Data type:

  • ID - numeric
  • U1, U2, U3 - numeric (binary 0/1)
  • CP - string (range)
ID U1 U2 U3 CP
111 1 1 0 10-20
222 1 0 1 10-20
333 0 1 0 20-30
444 0 1 1 40-50
555 1 0 0 10-20

And I need to aggregate above DataFrame using pivot table to have something like below:

  • COUNT_CP - How many ID have defined in index combination U and CP columns (for example combination U1 -'1' and CP_10_20 have 3 IDs)

  • COUNT_U - How many ID have '1' in each 'U' column (for example 3 IDs have '1' in column 'U1')

  • idx - indexes columns

    idx idx COUNT_CP COUNT_U
    U1 CP_10_20 3 3
    CP_20_30 0 3
    CP_30_40 0 3
    CP_40_50 0 3
    U2 CP_10_20 2 3
    CP_20_30 0 3
    CP_30_40 0 3
    CP_40_50 1 3
    U3 CP_10_20 1 2
    CP_20_30 0 2
    CP_30_40 0 2
    CP_40_50 1 2

How can I do that in Python Pandas ?

CodePudding user response:

You could do this using melt and groupby

import pandas as pd
df = pd.DataFrame({'ID': {0: 111, 1: 222, 2: 333, 3: 444, 4: 555},
 'U1': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1},
 'U2': {0: 1, 1: 0, 2: 1, 3: 1, 4: 0},
 'U3': {0: 0, 1: 1, 2: 0, 3: 1, 4: 0},
 'CP': {0: '10-20', 1: '10-20', 2: '20-30', 3: '40-50', 4: '10-20'}})

df = df.drop(columns='ID').melt(id_vars='CP', var_name='idx', value_name='COUNT_CP')
df = df.groupby(['idx','CP'])['COUNT_CP'].sum().reset_index()
df['COUNT_U'] = df.groupby('idx')['COUNT_CP'].transform(sum)
df['CP'] = df['CP'].apply(lambda x: 'CP_'   '_'.join(x.split('-')))

print(df)

Output

  idx        CP  COUNT_CP  COUNT_U
0  U1  CP_10_20         3        3
1  U1  CP_20_30         0        3
2  U1  CP_40_50         0        3
3  U2  CP_10_20         1        3
4  U2  CP_20_30         1        3
5  U2  CP_40_50         1        3
6  U3  CP_10_20         1        2
7  U3  CP_20_30         0        2
8  U3  CP_40_50         1        2
  • Related