How to plot stacked bars within grouped bars within further grouped bars in a bar-chart using Python

Time:11-18

I have the following Pandas df I would like to plot:

    Segment length     Parameter  Parameter value  Train score  Test score
0               16  n_estimators              5.0     0.975414    0.807823
1               16  n_estimators             10.0     0.982342    0.756803
2               16  n_estimators             15.0     1.000000    0.801020
3               16     max_depth              2.0     0.580884    0.284014
4               16     max_depth              6.0     1.000000    0.824830
5               16     max_depth             10.0     1.000000    0.824830
6               16  max_features              0.1     1.000000    0.845238
7               16  max_features              0.3     1.000000    0.845238
8               16  max_features              0.5     1.000000    0.845238
9               32  n_estimators              5.0     0.961905    0.714286
10              32  n_estimators             10.0     0.988095    0.857143
11              32  n_estimators             15.0     1.000000    0.857143
12              32     max_depth              2.0     0.785714    0.571429
13              32     max_depth              6.0     1.000000    0.857143
14              32     max_depth             10.0     1.000000    0.857143
15              32  max_features              0.1     1.000000    0.904762
16              32  max_features              0.3     1.000000    0.904762
17              32  max_features              0.5     1.000000    0.857143

The plot I imagine is a grouped bar-chart containing groups by 'segment length', containing further groups by 'parameter', containing further groups by 'value', containing two bars of 'train score' and 'test score' (either side-by-side or stacked) ... Now that's a handful, but it works on paper. I've been trying to get this to work in Matplotlib (or R) all day without success. Does anybody have a suggestion on how to get this to work?

(NB: in the above dataframe I have two 'Segment length' groups and only three 'Parameter value' groups per parameter; eventually there will be 6 and about 10 groups, respectively.)

CodePudding user response:

Here is a suggestion using R. We can change how the grouping levels are expressed, e.g. by mapping one level to the fill colour and another to facets.

What we do here:

  1. Reshape the score columns into long format
  2. Group and calculate the mean and sd
  3. Plot with ggplot

library(tidyverse)
library(ggsci)

# Assumes the data frame's columns use underscores (e.g. Segment_length).
df %>%
  # one row per (observation, score type): columns `name` and `value`
  pivot_longer(ends_with("score")) %>%
  group_by(name, Segment_length, Parameter) %>%
  summarise(mean_value = mean(value), sd_value = sd(value)) %>%
  ggplot(aes(x = name, y = mean_value, fill = factor(Segment_length))) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(. ~ Parameter) +
  geom_errorbar(mapping = aes(ymin = mean_value - sd_value, ymax = mean_value + sd_value),
                width = 0.2, position = position_dodge(width = 0.9)) +
  theme_classic() +
  scale_fill_nejm() +
  labs(x = "Test/Train", y = "Score", fill = "Segment Length")
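
Since the question asks about Matplotlib, here is a rough Python sketch of the same three steps (reshape to long format, summarise, plot one panel per parameter with bars dodged by segment length). It is only one possible translation, not a definitive implementation: the variable names (`long_df`, `summary`) and the output filename are my own choices, and like the R version it averages over the parameter values within each parameter.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# (train, test) scores copied from the question's table, in row order.
scores = [
    (0.975414, 0.807823), (0.982342, 0.756803), (1.0, 0.801020),
    (0.580884, 0.284014), (1.0, 0.824830), (1.0, 0.824830),
    (1.0, 0.845238), (1.0, 0.845238), (1.0, 0.845238),
    (0.961905, 0.714286), (0.988095, 0.857143), (1.0, 0.857143),
    (0.785714, 0.571429), (1.0, 0.857143), (1.0, 0.857143),
    (1.0, 0.904762), (1.0, 0.904762), (1.0, 0.857143),
]
values = {
    "n_estimators": [5.0, 10.0, 15.0],
    "max_depth": [2.0, 6.0, 10.0],
    "max_features": [0.1, 0.3, 0.5],
}
keys = [(seg, par, val) for seg in (16, 32)
        for par, vals in values.items() for val in vals]
df = pd.DataFrame(
    [(seg, par, val, tr, te) for (seg, par, val), (tr, te) in zip(keys, scores)],
    columns=["Segment length", "Parameter", "Parameter value",
             "Train score", "Test score"],
)

# 1. Long format: one row per (observation, score type).
long_df = df.melt(
    id_vars=["Segment length", "Parameter"],
    value_vars=["Train score", "Test score"],
    var_name="name", value_name="value",
)

# 2. Mean and sample standard deviation per group.
summary = (
    long_df.groupby(["name", "Segment length", "Parameter"])["value"]
    .agg(mean_value="mean", sd_value="std")
    .reset_index()
)

# 3. One panel per parameter; within each panel, bars dodged by segment length.
parameters = list(values)
names = ["Test score", "Train score"]
segments = sorted(summary["Segment length"].unique())
width = 0.8 / len(segments)
x = np.arange(len(names))

fig, axes = plt.subplots(1, len(parameters), sharey=True, figsize=(10, 3.5))
for ax, parameter in zip(axes, parameters):
    panel = summary[summary["Parameter"] == parameter]
    for i, segment in enumerate(segments):
        rows = panel[panel["Segment length"] == segment].set_index("name").loc[names]
        offset = (i - (len(segments) - 1) / 2) * width  # centre the group on the tick
        ax.bar(x + offset, rows["mean_value"].to_numpy(), width,
               yerr=rows["sd_value"].to_numpy(), capsize=3, label=str(segment))
    ax.set_title(parameter)
    ax.set_xticks(x)
    ax.set_xticklabels(names)
axes[0].set_ylabel("Score")
axes[-1].legend(title="Segment Length")
fig.tight_layout()
fig.savefig("scores_by_parameter.png")
```

To keep every parameter value as its own group instead of averaging, one option would be to add "Parameter value" to the `groupby` keys and dodge on it as well, at the cost of a busier x-axis.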
