I have the following Pandas df I would like to plot:
    Segment length     Parameter  Parameter value  Train score  Test score
 0              16  n_estimators              5.0     0.975414    0.807823
 1              16  n_estimators             10.0     0.982342    0.756803
 2              16  n_estimators             15.0     1.000000    0.801020
 3              16     max_depth              2.0     0.580884    0.284014
 4              16     max_depth              6.0     1.000000    0.824830
 5              16     max_depth             10.0     1.000000    0.824830
 6              16  max_features              0.1     1.000000    0.845238
 7              16  max_features              0.3     1.000000    0.845238
 8              16  max_features              0.5     1.000000    0.845238
 9              32  n_estimators              5.0     0.961905    0.714286
10              32  n_estimators             10.0     0.988095    0.857143
11              32  n_estimators             15.0     1.000000    0.857143
12              32     max_depth              2.0     0.785714    0.571429
13              32     max_depth              6.0     1.000000    0.857143
14              32     max_depth             10.0     1.000000    0.857143
15              32  max_features              0.1     1.000000    0.904762
16              32  max_features              0.3     1.000000    0.904762
17              32  max_features              0.5     1.000000    0.857143
The plot I have in mind is a grouped bar chart: groups by 'Segment length', each containing groups by 'Parameter', each containing groups by 'Parameter value', each containing two bars for 'Train score' and 'Test score' (either side by side or stacked). That's a handful, but it works on paper. I've been trying to get this to work in Matplotlib (or R) all day without success. Does anybody have a suggestion on how to get this to work?
(NB: the dataframe above has two 'Segment length' groups and only three 'Parameter value' groups per parameter; eventually there will be 6 and about 10 groups, respectively.)
CodePudding user response:
Here is a suggestion using R. Instead of nesting every group on the x-axis, we can express the grouping differently, e.g. by mapping one variable to the fill colour and faceting by another.
What we do here:
- Bring the score columns into long format
- Group by score type, segment length and parameter, then calculate the mean and sd
- Plot with ggplot
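library(tidyverse)
library(ggsci)  # provides scale_fill_nejm()

# Assumes the column is named Segment_length (the question's table shows "Segment length").
df %>%
  # Train/test score columns into long format: one row per score
  pivot_longer(ends_with("score")) %>%
  group_by(name, Segment_length, Parameter) %>%
  # Mean and sd across the parameter values in each group
  summarise(mean_value = mean(value), sd_value = sd(value)) %>%
  ggplot(aes(x = name, y = mean_value, fill = factor(Segment_length))) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(. ~ Parameter) +
  geom_errorbar(mapping = aes(ymin = mean_value - sd_value, ymax = mean_value + sd_value),
                width = 0.2, position = position_dodge(width = 0.9)) +
  theme_classic() +
  scale_fill_nejm() +
  labs(x = "Test/Train", y = "Score", fill = "Segment Length")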
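Since the question also mentions Matplotlib, here is a rough Python sketch of the same idea (long format, mean and sd per group, one panel per parameter with bars dodged by segment length). It is only one possible translation of the R code above, not a drop-in equivalent: the helper names summarise_scores and plot_summary are made up for illustration, the column names are taken as printed in the question, and only the n_estimators and max_depth rows are included to keep it short.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rows copied from the question's table (max_features rows left out for brevity).
df = pd.DataFrame({
    "Segment length": [16] * 6 + [32] * 6,
    "Parameter": (["n_estimators"] * 3 + ["max_depth"] * 3) * 2,
    "Parameter value": [5.0, 10.0, 15.0, 2.0, 6.0, 10.0] * 2,
    "Train score": [0.975414, 0.982342, 1.0, 0.580884, 1.0, 1.0,
                    0.961905, 0.988095, 1.0, 0.785714, 1.0, 1.0],
    "Test score": [0.807823, 0.756803, 0.801020, 0.284014, 0.824830, 0.824830,
                   0.714286, 0.857143, 0.857143, 0.571429, 0.857143, 0.857143],
})


def summarise_scores(df):
    """Hypothetical helper: long format, then mean and sd per group."""
    long = df.melt(
        id_vars=["Segment length", "Parameter", "Parameter value"],
        value_vars=["Train score", "Test score"],
        var_name="name",
        value_name="value",
    )
    return (
        long.groupby(["Parameter", "Segment length", "name"])["value"]
        .agg(mean_value="mean", sd_value="std")  # std is the sample sd, like R's sd()
        .reset_index()
    )


def plot_summary(summary):
    """Hypothetical helper: one panel per parameter, bars dodged by segment length."""
    params = list(summary["Parameter"].unique())
    segments = sorted(summary["Segment length"].unique())
    names = ["Train score", "Test score"]
    fig, axes = plt.subplots(1, len(params), sharey=True, squeeze=False,
                             figsize=(4 * len(params), 4))
    width = 0.8 / len(segments)  # share 80% of each x slot between the segments
    x = np.arange(len(names))
    for ax, param in zip(axes[0], params):
        panel = summary[summary["Parameter"] == param]
        for i, seg in enumerate(segments):
            rows = (panel[panel["Segment length"] == seg]
                    .set_index("name").reindex(names))
            ax.bar(x + i * width, rows["mean_value"].to_numpy(), width,
                   yerr=rows["sd_value"].to_numpy(), capsize=3, label=str(seg))
        # Centre the tick under each group of dodged bars.
        ax.set_xticks(x + width * (len(segments) - 1) / 2)
        ax.set_xticklabels(names)
        ax.set_title(param)
    axes[0, 0].set_ylabel("Score")
    axes[0, -1].legend(title="Segment length")
    return fig, axes


summary = summarise_scores(df)
fig, axes = plot_summary(summary)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. As in the R version, the bars show the mean over the parameter values, so the individual 'Parameter value' groups are collapsed into the error bars rather than drawn separately.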