I have DataFrame in Python Pandas like below:
Data type:
- ID - numeric
- U1, U2, U3 - numeric (binary 0/1)
- CP - string (range)
ID | U1 | U2 | U3 | CP |
---|---|---|---|---|
111 | 1 | 1 | 0 | 10-20 |
222 | 1 | 0 | 1 | 10-20 |
333 | 0 | 1 | 0 | 20-30 |
444 | 0 | 1 | 1 | 40-50 |
555 | 1 | 0 | 0 | 10-20 |
And I need to aggregate above DataFrame using pivot table to have something like below:
COUNT_CP - How many ID have defined in index combination U and CP columns (for example combination U1 -'1' and CP_10_20 have 3 IDs)
COUNT_U - How many ID have '1' in each 'U' column (for example 3 IDs have '1' in column 'U1')
idx - indexes columns
idx idx COUNT_CP COUNT_U U1 CP_10_20 3 3 CP_20_30 0 3 CP_30_40 0 3 CP_40_50 0 3 U2 CP_10_20 2 3 CP_20_30 0 3 CP_30_40 0 3 CP_40_50 1 3 U3 CP_10_20 1 2 CP_20_30 0 2 CP_30_40 0 2 CP_40_50 1 2
How can I do that in Python Pandas ?
CodePudding user response:
You could do this using melt
and groupby
import pandas as pd
df = pd.DataFrame({'ID': {0: 111, 1: 222, 2: 333, 3: 444, 4: 555},
'U1': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1},
'U2': {0: 1, 1: 0, 2: 1, 3: 1, 4: 0},
'U3': {0: 0, 1: 1, 2: 0, 3: 1, 4: 0},
'CP': {0: '10-20', 1: '10-20', 2: '20-30', 3: '40-50', 4: '10-20'}})
df = df.drop(columns='ID').melt(id_vars='CP', var_name='idx', value_name='COUNT_CP')
df = df.groupby(['idx','CP'])['COUNT_CP'].sum().reset_index()
df['COUNT_U'] = df.groupby('idx')['COUNT_CP'].transform(sum)
df['CP'] = df['CP'].apply(lambda x: 'CP_' '_'.join(x.split('-')))
print(df)
Output
idx CP COUNT_CP COUNT_U
0 U1 CP_10_20 3 3
1 U1 CP_20_30 0 3
2 U1 CP_40_50 0 3
3 U2 CP_10_20 1 3
4 U2 CP_20_30 1 3
5 U2 CP_40_50 1 3
6 U3 CP_10_20 1 2
7 U3 CP_20_30 0 2
8 U3 CP_40_50 1 2