I have a CSV with the following data:
Block_Number Block_Type Segment_Duration Speaker_ID
0 1 Pause 1.21 FUZ
1 1 Pause 1.09 FUZ
2 1 AICF 2.06 FAN
3 1 AICF 1.23 FUZ
4 1 AICF 1.58 FAN
... ... ... ... ...
1280 32 Pause 1.03 FUZ
1281 32 Pause 1.16 SIL
1282 32 Pause 2.18 OLN
1283 32 AMF 0.98 FAN
1284 32 AMF 1.78 FAN
I have grouped them by Block Number, Block Type and Speaker ID with this code and filtered the result with the 'CHN' ones only, which gives:
Block_Number Block_Type Speaker_ID Count
1 29 CIC CHN 98
6 9 AICF CHN 30
7 22 CIC CHN 29
10 19 CIC CHN 21
14 13 AICF CHN 15
28 27 CIC CHN 9
... ... ... ... ...
How would I be able to get the percentage of CHN elements within a block? For example, in block 29, there are 98 counts of 'CHN' out of 120 other labels in this block. So in block 29, there would be 81.67% of CHN. I would like to do this for each of the 32 blocks.
Thank you.
CodePudding user response:
You could do something like the following
category = "Block_Number"
col = "Speaker_ID"
value = "FUZ"
d = {'Block_Number': {'0': '1', '1': '1', '2': '1', '3': '1', '4': '1', '1280': '32', '1281': '32', '1282': '32', '1283': '32', '1284': '32'}, 'Block_Type': {'0': 'Pause', '1': 'Pause', '2': 'AICF', '3': 'AICF', '4': 'AICF', '1280': 'Pause', '1281': 'Pause', '1282': 'Pause', '1283': 'AMF', '1284': 'AMF'}, 'Segment_Duration': {'0': '1.21', '1': '1.09', '2': '2.06', '3': '1.23', '4': '1.58', '1280': '1.03', '1281': '1.16', '1282': '2.18', '1283': '0.98', '1284': '1.78'}, 'Speaker_ID': {'0': 'FUZ', '1': 'FUZ', '2': 'FAN', '3': 'FUZ', '4': 'FAN', '1280': 'FUZ', '1281': 'SIL', '1282': 'OLN', '1283': 'FAN', '1284': 'FAN'}}
df = pd.DataFrame(d)
percent_by_block = (
df[df[col] == value]
.groupby(by=category)
.count()[col]
)/(
df.groupby(by=category)
.count()[col])
Here, I take df
to be the example dataframe
Block_Number Block_Type Segment_Duration Speaker_ID
0 1 Pause 1.21 FUZ
1 1 Pause 1.09 FUZ
2 1 AICF 2.06 FAN
3 1 AICF 1.23 FUZ
4 1 AICF 1.58 FAN
1280 32 Pause 1.03 FUZ
1281 32 Pause 1.16 SIL
1282 32 Pause 2.18 OLN
1283 32 AMF 0.98 FAN
1284 32 AMF 1.78 FAN
The resulting series percent_by_block
is
Block_Number
1 0.6
32 0.2
The results here reflect that 60% (i.e. .6) of the rows with Block_Number 1 have Speaker_ID FUZ, whereas 20% of the rows with Block_Number 32 have Speaker_ID FUZ.