Plot binned discrete data with spaces between bars in ggplot2

Time:01-11

I have integer data whose frequency I want to plot within numerically ranged bins. I can plot a histogram, but the plot puts the bars together like continuous data. I want spaces between the bars to reflect that the data are discrete. I've tried various approaches, but none quite gets there.

It seems to me that this should happen out-of-the-box and I feel I must be missing something fundamental, like I'm using the wrong geom or I don't know a parameter I need.

Here are some data and my first plot attempt. A histogram bins the data but it looks continuous.

library(dplyr)
library(ggplot2)

home_runs_df <- data.frame(home_runs = c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
               35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
               33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
               29, 29, 28, 28, 27, 27, 27, 27, 27, 26))

# Histogram bins, but plots like continuous.
home_runs_df %>%
    ggplot(mapping = aes(x = home_runs)) +
    geom_histogram(bins = 6)

I tried to set the bar width, but it's deprecated and the message suggests using geom_bar(), even though that's what I used! I suppose stat = "bin" ends up calling geom_histogram() under the hood instead of geom_bar().

home_runs_df %>%
    ggplot(mapping = aes(x = home_runs)) +
    geom_bar(stat = "bin", bins = 6, width = 0.5)

This lets me change the width but how do I bin the data?

home_runs_df %>%
    ggplot(mapping = aes(x = home_runs)) +
    geom_bar(stat = "count", width = 0.75)

I can bin the data myself, but I resent having to. I expect ggplot to do that for me, and these results aren't completely what I want anyway.

  • I'm missing the zero-height bin (41, 45].
  • On the other hand, I do like where the x-axis labels appear, and I know how to set the text I want.

home_runs_df %>%
    mutate(bin = cut(home_runs, breaks = c(seq(25, 48, by = 4), Inf))) %>%
    ggplot(mapping = aes(x = bin)) +
    geom_histogram(stat = "count")

I got this approach from the ggplot2 reference documentation: use geom_bar() with scale_x_binned().

  • The bars look the way I want for discrete data (spaces between).
  • I don't like the way the tick marks display between the bins, as if the data are continuous.

home_runs_df %>%
    ggplot(aes(home_runs)) +
    geom_bar() +
    scale_x_binned(n.breaks = 6)

I bet there are tricky ways to displace the labels to the position I want. Is this degree of customization necessary? I think what I'm trying to do is pretty conventional. What am I missing?

What I want is:

  • Bin discrete, whole-number data.
  • Plot the frequencies in the bins.
  • Treat the integer data as discrete instead of continuous, with spaces between the bars and labels directly under each group.

CodePudding user response:

From your description, I think geom_bar combined with binning via ggplot2::cut_interval or ggplot2::cut_width is the way to go: you get bars with padding and a discrete scale at the same time.

Note: I also added scale_x_discrete(drop = FALSE) in case you want to show the intervals with no observations too.

library(ggplot2)

ggplot(home_runs_df, aes(x = cut_interval(home_runs, 6))) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)

ggplot(home_runs_df, aes(x = cut_width(home_runs, 4, boundary = 26))) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
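
To see why drop = FALSE matters here, it may help to inspect the factor that cut_width() returns before plotting it. The following is a minimal sketch, assuming the same data as in the question; the names bins and bin_counts are introduced only for this illustration.

```r
library(ggplot2)

home_runs <- c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
               35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
               33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
               29, 29, 28, 28, 27, 27, 27, 27, 27, 26)

# Same binning as the second plot above: width-4 bins aligned on 26.
bins <- cut_width(home_runs, width = 4, boundary = 26)

# table() on a factor lists every level, including levels with no observations.
bin_counts <- table(bins)
print(bin_counts)
```

The empty interval is still a level of the factor, so scale_x_discrete(drop = FALSE) can keep a slot for it on the axis instead of dropping it.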

CodePudding user response:

I think you can keep your cut() approach, as long as you set the labels argument of cut() appropriately:

width <- 4
cutseq <- seq(25, 48, width)
home_runs_df %>%
  # Label each bin with its numeric midpoint, then convert the factor to a number.
  mutate(bin = cut(home_runs,
                   breaks = c(cutseq, Inf),
                   labels = cutseq + width / 2) %>%
           as.character() %>%
           as.numeric()) %>%
  ggplot(mapping = aes(x = bin)) +
  geom_histogram(stat = "count", width = 3)

Each column is then centered on its bin, and the x-axis is a continuous scale that you can adjust with scale_x_continuous if needed.
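
As a rough sketch of that rescaling, assuming the same data and bin width as above: the names bin_centers, bin_labels, binned and p are made up for this example, and the "26-29" style labels are just one possible choice. The last bin is open-ended, (45, Inf], so labelling it "46-49" only works because the largest value in this data set is 48.

```r
library(ggplot2)

home_runs <- c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
               35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
               33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
               29, 29, 28, 28, 27, 27, 27, 27, 27, 26)

width <- 4
cutseq <- seq(25, 48, width)

# Midpoint of each bin, used both as the bar position and as the tick position.
bin_centers <- cutseq + width / 2

# Integer range covered by each (lower, upper] bin, e.g. (25, 29] becomes "26-29".
bin_labels <- paste0(cutseq + 1, "-", cutseq + width)

binned <- data.frame(
  bin = as.numeric(as.character(
    cut(home_runs, breaks = c(cutseq, Inf), labels = bin_centers)
  ))
)

p <- ggplot(binned, aes(x = bin)) +
  geom_bar(width = 3) +
  # One tick per bin, placed directly under the bar and labelled with the bin's range.
  scale_x_continuous(breaks = bin_centers, labels = bin_labels)

print(p)
```

Because the breaks are set explicitly, the empty bin still gets its own tick and label between its neighbours, with no bar above it.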
