I have DataFrame in Python Pandas like below:
ID | U1 | U2 | U3 | CP | CH |
---|---|---|---|---|---|
111 | 1 | 1 | 0 | 10-20 | 1 |
222 | 1 | 0 | 1 | 10-20 | 1 |
333 | 0 | 1 | 0 | 20-30 | 0 |
444 | 0 | 1 | 1 | 40-50 | 0 |
555 | 1 | 0 | 0 | 10-20 | 0 |
And I need to create column with percent of '1' in column 'CH' per combination for: U1/U2/U3 and CP, so as a result i need something like below:
| idx | idx | CH_perc |
| ----|---------|-----------|
|U1 |CP_10_20 | 0.66 | 3 IDs have '1' in U1 and CP = 10-20, and 2 of them have '1' in CH, so 2/3 = 0.66
|U1 |CP_20_30 | 0 |
|U1 |CP_30_40 | 0 |
|U1 |CP_40_50 | 0 |
|U2 |CP_10_20 | 1.00 | 1 ID have '1' in U2 and CP = 10-20, and have '1' in CH, so 1/1/ = 1.00
|U2 |CP_20_30 | 0 |
|U2 |CP_30_40 | 0 |
|U2 |CP_40_50 | 0 |
|U3 |CP_10_20 | 1.00 | 1 ID have '1' in U3 and CP = 10-20, and have '1' in CH, so 1/1/ = 1.00
|U3 |CP_20_30 | 0 |
|U3 |CP_30_40 | 0 |
|U3 |CP_40_50 | 0 |
How can I od that in Python Pandas ?
CodePudding user response:
You can use a melt
and groupby.sum
based approach:
(df.drop(columns='ID')
.melt(['CP', 'CH'], var_name='idx')
# keep only CH where value is 1
.assign(CH=lambda d: d['CH'].mul(d['value']))
.groupby(['idx', 'CP'], as_index=False).sum()
.assign(CH_perc=lambda d: d.pop('CH').div(d.pop('value')).fillna(0))
)
output:
idx CP CH_perc
0 U1 10-20 0.666667
1 U1 20-30 0.000000
2 U1 40-50 0.000000
3 U2 10-20 1.000000
4 U2 20-30 0.000000
5 U2 40-50 0.000000
6 U3 10-20 1.000000
7 U3 20-30 0.000000
8 U3 40-50 0.000000