Plotting group proportions with continuous variable

Time:03-06

I would like to plot the proportion of each level of a grouping variable along a continuous variable. Since the x-axis is continuous, it is not really possible to compute proportions at each point (there are infinitely many of them). So, usually, one cuts the continuous variable into bins and plots the proportions per bin. Another solution is to use the density, but I want proportions (i.e., percentages) on the y-axis, and I'm pretty sure density is not about proportions.
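On the density point: ggplot2's geom_density() accepts position = "fill", which stacks the per-group density curves and rescales them to sum to 1 at every x, so the y-axis can be read as an estimated share of each group. A minimal sketch on the unjittered iris data, assuming ggplot2 3.3.0 or later for after_stat(); the axis label is only illustrative:

```r
library(ggplot2)

# Map y to after_stat(count) so each group's density is weighted by its size,
# then let position = "fill" rescale the stacked curves to sum to 1 at each x.
p <- ggplot(iris, aes(x = Sepal.Length, y = after_stat(count), fill = Species)) +
  geom_density(position = "fill") +
  labs(y = "Estimated share of each species")
p
```

This is a smoothed estimate rather than an exact per-bin proportion, and its shape depends on the kernel bandwidth.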

As an example, let's use iris and try to plot the share of each species across Sepal.Length. One can create bins using Hmisc::cut2 and then compute the proportions for each group:

library(tidyverse)
library(Hmisc)

dat <- iris %>% 
  mutate(Sepal.Length = Sepal.Length + rnorm(n()),
         cut = cut2(Sepal.Length, g = 30, levels.mean = TRUE)) %>% 
  group_by(cut) %>% 
  summarise(set = sum(Species == "setosa") / n(),
            vir = sum(Species == "virginica") / n(),
            ver = sum(Species == "versicolor") / n()) %>% 
  pivot_longer(-cut)

# A tibble: 90 x 3
   cut    name  value
   <fct>  <chr> <dbl>
 1 3.0126 set     0.6
 2 3.0126 vir     0  
 3 3.0126 ver     0.4
 4 3.7616 set     0.8
 5 3.7616 vir     0  
 6 3.7616 ver     0.2
 7 3.9898 set     0.8
 8 3.9898 vir     0  
 9 3.9898 ver     0.2
10 4.1577 set     0.2
# ... with 80 more rows
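The same binning can be sketched without Hmisc, which also gives a numeric bin position directly. This is only an alternative under stated assumptions: dplyr::ntile() stands in for cut2(g = 30), the names bin, bin_mean and prop are made up for illustration, and the seed is fixed only so the random jitter is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random jitter is reproducible

props <- iris %>%
  mutate(Sepal.Length = Sepal.Length + rnorm(n()),
         bin = ntile(Sepal.Length, 30)) %>%   # 30 equal-sized bins (5 rows each)
  group_by(bin) %>%
  mutate(bin_mean = mean(Sepal.Length)) %>%   # numeric x position of each bin
  count(bin_mean, Species, name = "n") %>%    # rows per species within each bin
  mutate(prop = n / sum(n)) %>%               # share of each species within the bin
  ungroup()
```

Species that do not occur in a bin get no row here rather than a 0, so tidyr::complete() may be needed before drawing lines.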

And the plot looks like this, e.g. for name == "ver":

dat %>% 
  filter(name == "ver") %>% 
  ggplot(aes(x = cut, y = value)) +
  geom_col()


Now, is there any way to make this easier and more aesthetically pleasing?

In particular, I would like to make the x-axis continuous again, so that one could e.g. draw a geom_line between the columns of the plot (maybe using rolling means?). Or is this bad practice, and is that why I can't find any documentation about it?
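For what it's worth, the rolling-mean idea can be sketched with base R's stats::filter(): sort by the continuous variable and average a 0/1 indicator over a centred window. The window width k = 21 is an arbitrary choice, and the column names is_ver and roll_prop are made up for illustration.

```r
library(ggplot2)

k <- 21  # window width in observations (arbitrary; odd so the window is centred)

d <- iris[order(iris$Sepal.Length), ]
d$is_ver <- as.numeric(d$Species == "versicolor")   # 1 if versicolor, else 0

# Centred moving average of the indicator = share of versicolor in the window.
# The first and last (k - 1) / 2 rows have no full window and come out as NA.
d$roll_prop <- as.numeric(stats::filter(d$is_ver, rep(1 / k, k), sides = 2))

ggplot(d, aes(x = Sepal.Length, y = roll_prop)) +
  geom_line(na.rm = TRUE) +
  labs(y = "Share of versicolor (rolling mean)")
```

Because iris has many tied Sepal.Length values, the line can show small vertical jumps at repeated x values; averaging roll_prop per distinct x first would smooth that out.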

CodePudding user response:

Setting the variable cut to numeric did the job, but there may still be better options.

dat %>% 
  filter(name == "ver") %>% 
  ggplot(aes(x = as.numeric(as.character(cut)), y = value)) +
  geom_col()

Or with a line:

dat %>% 
  filter(name == "ver") %>% 
  ggplot(aes(x = as.numeric(as.character(cut)), y = value)) +
  geom_line()

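A note on why as.character() is needed in the calls above: as.numeric() applied directly to a factor returns the internal level codes, not the labels. A small illustration with a made-up factor f whose labels mimic the bin means:

```r
# Hypothetical factor whose labels look like the bin means produced by cut2()
f <- factor(c("3.0126", "3.7616", "3.9898"))

codes <- as.numeric(f)                # 1 2 3: the internal level codes
vals  <- as.numeric(as.character(f))  # 3.0126 3.7616 3.9898: the labels as numbers
```

This only works here because levels.mean = TRUE makes cut2() label each bin with its mean; with the default interval labels the conversion would give NA.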
