I'm having a dataframe and trying to get the output which shows percentage of each value in different category. Can anyone help on how can I do it?
Raw data table:
Interface_Bin | Product |
---|---|
1 | ADL |
1 | ADL |
22 | ADL |
97 | ADL |
1 | JSL |
1 | JSL |
97 | JSL |
97 | JSL |
22 | JSL |
Expected outcome:
Product | Bin(97)_count | Total_interfacebin_count | Bin_97_percentage_vs total count |
---|---|---|---|
ADL | 1 | 4 | 25% |
JSL | 2 | 5 | 40% |
Thanks alot.
CodePudding user response:
df = pd.crosstab(df['Product'], df['Interface_Bin'])
f1 = lambda x: f'Bin({x})_count'
f2 = lambda x: f'Bin({x})_percentage_vs total count'
s = df.sum(axis=1).rename('Total_interfacebin_count')
df2 = df.div(s, axis=0).rename(columns=f2).mul(100)
df = pd.concat([df.rename(columns=f1), s, df2], axis=1).sort_index(axis=1)
print (df)
Bin(1)_count Bin(1)_percentage_vs total count Bin(22)_count \
Product
ADL 2 50.0 1
JSL 2 40.0 1
Bin(22)_percentage_vs total count Bin(97)_count \
Product
ADL 25.0 1
JSL 20.0 2
Bin(97)_percentage_vs total count Total_interfacebin_count
Product
ADL 25.0 4
JSL 40.0 5
CodePudding user response:
first sort the data so you get a dictionary (or list) like: {"ADL":121,"JSL":218} probably with code like
for element in table row:
if dict.contains(element.key) //in this case ADL or JSL
dict[element.key].value =element.value
else
dict[element.key]=element.value
then go through and sum all the dictionary values to get the total sum, or sum them as you add to the dictionary in the above code finally get each percentage as (dict[key].value/sum)*100 "%"