R: Creating a bar plot for each active feature in a dataframe, grouped by class

Time:06-09

I have a dataframe of this form:

library(tibble)
df <- tibble(class = c(0, 1, 0, 1, 1, 0, 0), f1 = c(0, 1, 1, 0, 0, 0, 0), f2 = c(0, 0, 0, 1, 1, 1, 1))
  class    f1    f2
  <dbl> <dbl> <dbl>
1     0     0     0
2     1     1     0
3     0     1     0
4     1     0     1
5     1     0     1
6     0     0     1
7     0     0     1

I now want to create a bar plot for each of the feature columns f1 and f2. Each plot should show the distribution over class (0 or 1) for the rows where the feature == 1 (rows where the feature == 0 can be ignored). In other words, I want to know whether there is a big class difference when a feature is active, i.e. equal to 1.

What is a nice way to achieve this with ggplot?

CodePudding user response:

library(tidyverse)

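# Reshape to long format, then sum the 0/1 values per feature and class.
# The sum equals the number of rows where the feature is active.
df %>% 
  pivot_longer(-class, names_to = 'feature') %>% 
  ggplot(aes(feature, value, fill = factor(class))) +
  geom_bar(stat = 'summary', fun = 'sum', position = 'dodge')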
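
Since the question asks for one plot per feature, a possible variant is to keep only the rows where a feature is active and draw one panel per feature. This is only a minimal sketch, assuming the same df as in the question; the names `active` and `p` are illustrative and not part of the original answer.

```r
library(tidyverse)

df <- tibble(class = c(0, 1, 0, 1, 1, 0, 0),
             f1 = c(0, 1, 1, 0, 0, 0, 0),
             f2 = c(0, 0, 0, 1, 1, 1, 1))

# Keep only the rows where a feature is active (value == 1)
active <- df %>%
  pivot_longer(-class, names_to = "feature") %>%
  filter(value == 1)

# One panel per feature; geom_bar() counts the rows in each class
p <- ggplot(active, aes(factor(class), fill = factor(class))) +
  geom_bar() +
  facet_wrap(~ feature) +
  labs(x = "class", y = "count", fill = "class")

p
```

Filtering first means geom_bar() can simply count rows, so the bar heights are the number of active observations per class within each feature.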


CodePudding user response:

Here is a more polished alternative to @Axeman's correct version!

library(tidyverse)

df %>% 
  pivot_longer(-class) %>% 
  group_by(class, name) %>% 
  # Count active rows per class and feature
  summarise(sum = sum(value), .groups = "drop") %>% 
  ggplot(aes(x = factor(name), y = sum, fill = factor(class))) +
  geom_col(position = position_stack(), width = 0.6) +
  labs(x = "Feature", y = "Value", fill = "class") +
  # Print each count in the middle of its bar segment
  geom_text(aes(label = sum),
            position = position_stack(vjust = .5), color = "white") +
  scale_fill_manual(values = c("red", "blue")) +
  theme_classic() +
  theme(legend.position = "top", 
        panel.grid = element_blank(), 
        axis.text.x = element_text(angle = 45, hjust = 1))

