I have a df
like the one below:
data1 = [['A', 10, 30, 15, 45],
['B', 22, 35, 25, 18],
['C', 40, 27, 28, 5]]
df1 = pd.DataFrame(data1, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral'])
df1
I want to get an additional column which calculates and shows each cell's percentage of the row total. But here is the trick, I want these percentages sorted from highest to lowest but also containing a string description of these percentages using the column name. What I want to achieve is shown in the df2
below under Proportion by largest to smallest
column. My dataset is quite large and I would like to do this automatically.
data2 = [['A', 10, 30, 15, 45, 'Neutral = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.10%'],
['B', 22, 35, 25, 18, 'Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%'],
['C', 40, 27, 28, 5, 'Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05']]
df2 = pd.DataFrame(data2, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral', 'Proportion by largest to smallest'])
df2
CodePudding user response:
One rare case where I would recommend a custom function and apply
:
cols = df1.columns.difference(['Units'])
def cust_fmt(s):
order = s.sort_values(ascending=False).index
return ', '.join(
(s.index
' = ' s.div(100).round(2).astype(str) '%'
).loc[order]
)
df1['Proportion'] = df1[cols].apply(cust_fmt, axis=1)
output:
Units Normal Bad Good Neutral Proportion
0 A 10 30 15 45 Neutral = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.1%
1 B 22 35 25 18 Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%
2 C 40 27 28 5 Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05%
alternative with normalization:
cols = df1.columns.difference(['Units'])
def cust_fmt(s):
order = s.sort_values(ascending=False).index
return ', '.join(
(s.index#.to_series()
' = ' s.astype(str) '%'
).loc[order]
)
df1['Proportion'] = df1[cols].div(df1[cols].sum(axis=1), axis=0).round(2).apply(cust_fmt, axis=1)