I have a dataframe of this form:
df <- tibble(class = c(0, 1, 0, 1, 1, 0, 0), f1 = c(0, 1, 1, 0, 0, 0, 0), f2 = c(0,0,0,1,1,1,1))
  class    f1    f2
  <dbl> <dbl> <dbl>
1     0     0     0
2     1     1     0
3     0     1     0
4     1     0     1
5     1     0     1
6     0     0     1
7     0     0     1
I now want to create a bar plot for each of the feature columns f1 and f2 in the following way: the bar plot should show the distribution over class (0 or 1) for the rows where feature == 1 (rows where feature == 0 do not need to be considered). In other words, I want to know whether there is a big class difference when a feature is active, i.e. equal to 1.
What is a nice way to achieve this with ggplot?
CodePudding user response:
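library(tidyverse)

# Reshape to one row per (class, feature) pair, then sum the 0/1 values
# so each bar shows how many rows have that feature equal to 1.
df %>%
  pivot_longer(-class, names_to = 'feature') %>%
  ggplot(aes(feature, value, fill = factor(class))) +
  geom_bar(stat = 'summary', fun = 'sum', position = 'dodge')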
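To check the numbers behind the bars, the same counts can be computed explicitly before plotting. This is only a sketch of an equivalent approach, not part of the answer above: it keeps the rows where the feature is 1 and counts them per class. The names `counts` and `p` are illustrative, and the data frame is repeated so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Same example data as in the question
df <- tibble(
  class = c(0, 1, 0, 1, 1, 0, 0),
  f1    = c(0, 1, 1, 0, 0, 0, 0),
  f2    = c(0, 0, 0, 1, 1, 1, 1)
)

# Long format, keep only active features, count rows per feature and class
counts <- df %>%
  pivot_longer(-class, names_to = "feature") %>%
  filter(value == 1) %>%
  count(feature, class)

counts

# Plot the precomputed counts; geom_col() uses the y values as given
p <- ggplot(counts, aes(feature, n, fill = factor(class))) +
  geom_col(position = "dodge") +
  labs(fill = "class")
```

Because the feature columns only contain 0 and 1, summing `value` per group (as in the `geom_bar()` call above) and counting the rows with `value == 1` give the same bar heights.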
CodePudding user response:
Here is a more polished alternative to @Axeman's correct version:
library(tidyverse)

df %>%
  pivot_longer(-class) %>%
  group_by(class, name) %>%
  # Number of rows per class in which each feature equals 1
  summarise(sum = sum(value), .groups = "drop") %>%
  ggplot(aes(x = factor(name), y = sum, fill = factor(class))) +
  geom_col(position = position_stack(), width = 0.6) +
  labs(x = "Feature", y = "Value", fill = "class") +
  # Print each count in the middle of its bar segment
  geom_text(aes(label = sum),
            position = position_stack(vjust = .5), color = "white") +
  scale_fill_manual(values = c("red", "blue")) +
  theme_classic() +
  theme(legend.position = "top",
        panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1))