Home > Enterprise >  Calculate cell percentage of row total and sort largest to smallest with some description using colu
Calculate cell percentage of row total and sort largest to smallest with some description using colu

Time:10-06

I have a df like the one below:

data1 = [['A', 10, 30, 15, 45], 
        ['B', 22, 35, 25, 18], 
        ['C', 40, 27, 28, 5]]
       
df1 = pd.DataFrame(data1, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral'])
df1

I want to get an additional column which calculates and shows each cell's percentage of the row total. But here is the trick, I want these percentages sorted from highest to lowest but also containing a string description of these percentages using the column name. What I want to achieve is shown in the df2 below under Proportion by largest to smallest column. My dataset is quite large and I would like to do this automatically.

data2 = [['A', 10, 30, 15, 45, 'Neutral  = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.10%'], 
            ['B', 22, 35, 25, 18, 'Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%'], 
            ['C', 40, 27, 28, 5, 'Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05']]
           
df2 = pd.DataFrame(data2, columns= ['Units', 'Normal', 'Bad', 'Good', 'Neutral', 'Proportion by largest to smallest'])
df2

CodePudding user response:

One rare case where I would recommend a custom function and apply:

cols = df1.columns.difference(['Units'])

def cust_fmt(s):
    order = s.sort_values(ascending=False).index
    return ', '.join(
           (s.index
              ' = '   s.div(100).round(2).astype(str)   '%'
           ).loc[order]
           )

df1['Proportion'] = df1[cols].apply(cust_fmt, axis=1)

output:

  Units  Normal  Bad  Good  Neutral                                                  Proportion
0     A      10   30    15       45    Neutral = 0.45%, Bad = 0.3%, Good = 0.15%, Normal = 0.1%
1     B      22   35    25       18  Bad = 0.35%, Good = 0.25%, Normal = 0.22%, Neutral = 0.18%
2     C      40   27    28        5   Normal = 0.4%, Good = 0.28%, Bad = 0.27%, Neutral = 0.05%
alternative with normalization:
cols = df1.columns.difference(['Units'])

def cust_fmt(s):
    order = s.sort_values(ascending=False).index
    return ', '.join(
           (s.index#.to_series()
              ' = '   s.astype(str)   '%'
           ).loc[order]
           )

df1['Proportion'] = df1[cols].div(df1[cols].sum(axis=1), axis=0).round(2).apply(cust_fmt, axis=1)
  • Related