Create a stacked bar plot and annotate with count and percent with focus of displaying small values

Time:09-02

I have the following dataframe

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib 
print('matplotlib: {}'.format(matplotlib.__version__))
# 3.5.3

df=pd.DataFrame({'Type': [ 'Sentence', 'Array', 'String', '-','-', 'Sentence', 'Array', 'String', '-','-', 'Sentence'],
                 'Length': [42,21,11,6,6,42,21,11,6,6,42],
                 'label': [1,1,0,0,0,1,1,0,0,0,1],
                 })
print(df)
#       Type     Length  label
#0   Sentence      42      1
#1      Array      21      1
#2     String      11      0
#3          -       6      0
#4          -       6      0
#5   Sentence      42      1
#6      Array      21      1
#7     String      11      0
#8          -       6      0
#9          -       6      0
#10  Sentence      42      1

I want to plot a stacked bar chart for an arbitrary column of the dataframe (either numerical, e.g. the Length column, or …

CodePudding user response:

  • The values in Expected output do not match df in the OP, so the sample DataFrame has been updated.

  • Plot with …
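
For the stacked case asked about in the question, a minimal sketch could look like the following. It assumes the updated sample data listed under DataFrame Views, fills missing counts with 0 before plotting, and pairs each container with its column by position; these are illustrative choices, not necessarily the code originally posted. `ax.bar_label` requires matplotlib 3.4 or later.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import pandas as pd

# updated sample data (same values as the df shown under DataFrame Views)
df = pd.DataFrame({
    'Type': ['Sentence', 'Array', 'String', '-', '-', 'Sentence',
             'Array', 'String', '-', '-', 'Sentence'],
    'Length': [42, 21, 11, 6, 6, 42, 21, 11, 6, 6, 42],
    'label': [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})

# count rows per (Type, label); missing combinations become 0
dfp = df.pivot_table(index='Type', columns='label', values='Length', aggfunc=len).fillna(0)

# percent of each row total
per = dfp.div(dfp.sum(axis=1), axis=0).mul(100).round(2)

# stacked bars, one container per label column
ax = dfp.plot(kind='bar', stacked=True, figsize=(8, 6), rot=0)

# containers are created in column order, so pair them by position
for c, col in zip(ax.containers, dfp.columns):
    # count plus percent, skipping empty segments
    labels = [f'{int(v.get_height())}\n({p}%)' if v.get_height() > 0 else ''
              for v, p in zip(c, per[col])]
    ax.bar_label(c, labels=labels, label_type='center', fontweight='bold')

ax.legend(title='Class', bbox_to_anchor=(1, 1.01), loc='upper left')
```

With a linear y-axis, very small segments can still be too short for their centered labels, which is the problem the grouped, log-scaled version addresses.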


    Comment Updates

    • How to always have a spot for 'Array' if it's not in the data:
      • Add 'Array' to dfp if it's not in dfp.index.
      • df.Type = pd.Categorical(df.Type, ['-', 'Array', 'Sentence', 'String'], ordered=True) does not ensure the missing categories are plotted.
    • How to have all the annotations, even if they're small:
      • Don't stack the bars, and set logy=True.
    • This uses the full data, which was provided in a link.
    import numpy as np

    # pivot the dataframe and count the rows for each (Type, label) pair
    dfp = df.pivot_table(index='Type', columns='label', values='Length', aggfunc=len) 
    
    # append Array if it's not included
    if 'Array' not in dfp.index:
        dfp = pd.concat([dfp, pd.DataFrame({0: [np.nan], 1: [np.nan]}, index=['Array'])])
        
    # order the index
    dfp = dfp.loc[['-', 'Array', 'Sentence', 'String'], :]
    
    # calculate the percent for each row
    per = dfp.div(dfp.sum(axis=1), axis=0).mul(100).round(2)
    
    # plot the pivoted dataframe
    ax = dfp.plot(kind='bar', stacked=False, figsize=(10, 8), rot=0, logy=True, width=0.75)
    
    # iterate through the containers
    for c in ax.containers:
        
        # get the current segment label (a string); corresponds to column / legend
        label = c.get_label()
        
        # create custom labels with the bar height and the percent from the per column
        # the column labels in per and dfp are int, so convert label to int
        labels = [f'{v.get_height()}\n({row}%)' if v.get_height() > 0 else '' for v, row in zip(c, per[int(label)])]
        
        # add the annotation
        ax.bar_label(c, labels=labels, label_type='edge', fontsize=10, fontweight='bold')
        
    # move the legend
    ax.legend(title='Class', bbox_to_anchor=(1, 1.01), loc='upper left')
    
    # pad the spacing between the number and the edge of the figure
    _ = ax.margins(y=0.1)
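
As an alternative to the `pd.concat` step above, `DataFrame.reindex` can add the missing row and set the order in one call. The sketch below uses a small hypothetical sample with no 'Array' rows; the `order` list is an assumption taken from the index order used in the answer.

```python
import pandas as pd

# hypothetical sample that contains no 'Array' rows
df = pd.DataFrame({
    'Type': ['Sentence', 'String', '-', '-', 'Sentence'],
    'Length': [42, 11, 6, 6, 42],
    'label': [1, 0, 0, 1, 1],
})

# count rows per (Type, label)
dfp = df.pivot_table(index='Type', columns='label', values='Length', aggfunc=len)

# reindex inserts any missing Type as an all-NaN row and orders the index
order = ['-', 'Array', 'Sentence', 'String']
dfp = dfp.reindex(order)

print(dfp)
```

Because the 'Array' row is all NaN, it still takes up a position on the x-axis when plotted, but no bars or annotations are drawn for it.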
    



    DataFrame Views

    • Based on the sample data in the OP, with the labels of rows 9 and 10 updated

    df

            Type  Length  label
    0   Sentence      42      1
    1      Array      21      1
    2     String      11      0
    3          -       6      0
    4          -       6      0
    5   Sentence      42      1
    6      Array      21      1
    7     String      11      0
    8          -       6      0
    9          -       6      1
    10  Sentence      42      0
    

    dfp

    label       0    1
    Type              
    -         3.0  1.0
    Array     NaN  2.0
    Sentence  1.0  2.0
    String    2.0  NaN
    

    total (dfp.sum(axis=1))

    Type
    -           4.0
    Array       2.0
    Sentence    3.0
    String      2.0
    dtype: float64
    

    per

    label          0       1
    Type                    
    -          75.00   25.00
    Array        NaN  100.00
    Sentence   33.33   66.67
    String    100.00     NaN
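
The views above can be reproduced with a few lines. This is a minimal sketch that rebuilds the updated sample DataFrame by hand; the variable name `total` is used here only to match the heading of that view.

```python
import pandas as pd

# updated sample data (labels of rows 9 and 10 differ from the OP)
df = pd.DataFrame({
    'Type': ['Sentence', 'Array', 'String', '-', '-', 'Sentence',
             'Array', 'String', '-', '-', 'Sentence'],
    'Length': [42, 21, 11, 6, 6, 42, 21, 11, 6, 6, 42],
    'label': [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})

# count rows per (Type, label); combinations with no rows are NaN
dfp = df.pivot_table(index='Type', columns='label', values='Length', aggfunc=len)

# row totals (NaN is skipped by sum)
total = dfp.sum(axis=1)

# percent of each row total, rounded to 2 decimals
per = dfp.div(total, axis=0).mul(100).round(2)

print(dfp, total, per, sep='\n\n')
```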
    