hi I have a dataset that looks much like this data frame below:
#Table1 :
print("Table1: Current Table")
data = [['ALFA', 35, 47, 67, 44, 193],
['Bravo', 51, 52, 16, 8, 127],
['Charlie', 59, 75, 2, 14, 150],
['Delta', 59, 75, 2, 34, 170],
['Echo', 59, 75, 2, 14, 150],
['Foxtrot', 40, 43, 26, 27, 136],
['Golf', 35, 31, 22, 13, 101],
['Hotel', 89, 58, 24, 34, 205]]
df = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])
#df.loc[:,'Total'] = df.sum(axis=1)
print(df)
i would want to get the percentage of all cells against their row totals (calculated in column 'Total') such that it looks this:
#Table2 :
print('Table2: Expected Outcome')
data2 = [['ALFA',18.1, 24.4, 34.7, 22.8, 193],
['Bravo',40.2, 40.9, 12.6, 6.3, 127],
['Charlie',39.3, 50.0, 1.3, 9.3, 150],
['Delta',34.7, 44.1, 1.2, 20.0, 170],
['Echo',39.3, 50.0, 1.3, 9.3, 150],
['Foxtrot',29.4, 31.6, 19.1, 19.9, 136],
['Hotel',34.7, 30.7, 21.8, 12.9, 101],
['Golf',43.4, 28.3, 11.7, 16.6, 205]]
df2 = pd.DataFrame(data2, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total']) #.round(decimals=1)
#df.loc[:,'Total'] = df.sum(axis=1)
print(df2)
I am not really interested if the total column does change, is recalculated or have to be dropped in the process; but for completeness sake it would be good to have a 'Total' column along with the cells' percentages
CodePudding user response:
you can try using apply :
df[['Column1', 'Column2', 'Column3', 'Column4']] = df[['Column1', 'Column2', 'Column3', 'Column4']].apply(lambda x: x/x.sum(), axis=1)
Output :
Table1: Current Table
Objects Column1 Column2 Column3 Column4 Total
0 ALFA 0.181347 0.243523 0.347150 0.227979 193
1 Bravo 0.401575 0.409449 0.125984 0.062992 127
2 Charlie 0.393333 0.500000 0.013333 0.093333 150
3 Delta 0.347059 0.441176 0.011765 0.200000 170
4 Echo 0.393333 0.500000 0.013333 0.093333 150
5 Foxtrot 0.294118 0.316176 0.191176 0.198529 136
6 Golf 0.346535 0.306931 0.217822 0.128713 101
7 Hotel 0.434146 0.282927 0.117073 0.165854 205
CodePudding user response:
Use fast vecorized division all columns without Objects/Total
by DataFrame.div
:
c = df.columns.difference(['Objects','Total'])
df[c] = df[c].div(df['Total'], axis=0).mul(100)
print(df)
Objects Column1 Column2 Column3 Column4 Total
0 ALFA 18.134715 24.352332 34.715026 22.797927 193
1 Bravo 40.157480 40.944882 12.598425 6.299213 127
2 Charlie 39.333333 50.000000 1.333333 9.333333 150
3 Delta 34.705882 44.117647 1.176471 20.000000 170
4 Echo 39.333333 50.000000 1.333333 9.333333 150
5 Foxtrot 29.411765 31.617647 19.117647 19.852941 136
6 Golf 34.653465 30.693069 21.782178 12.871287 101
7 Hotel 43.414634 28.292683 11.707317 16.585366 205
CodePudding user response:
df1=pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])
for col in df.columns[1:-1]: df1[col]=(df[col]*100/df.Total)
df1