I have integer data whose frequency I want to plot within numerically ranged bins. I can plot a histogram, but the plot puts the bars together like continuous data. I want spaces between the bars in order to reflect that the data are discrete. I've tried various approaches to make that happen and they don't quite make it.
It seems to me that this should happen out-of-the-box, and I feel I must be missing something fundamental, like I'm using the wrong geom or I don't know a parameter I need.
Here are some data and my first plot attempt. A histogram bins the data but it looks continuous.
library(dplyr)
library(ggplot2)
home_runs_df <- data.frame(home_runs = c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
                                         35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
                                         33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
                                         29, 29, 28, 28, 27, 27, 27, 27, 27, 26))
# Histogram bins, but plots like continuous.
home_runs_df %>%
  ggplot(mapping = aes(x = home_runs)) +
  geom_histogram(bins = 6)
I tried to set the bar width, but the width parameter is deprecated and the message suggests using geom_bar(), even though that's what I used! I suppose stat = "bin" ends up calling geom_histogram() under the hood instead of geom_bar().
home_runs_df %>%
  ggplot(mapping = aes(x = home_runs)) +
  geom_bar(stat = "bin", bins = 6, width = 0.5)
This lets me change the width but how do I bin the data?
home_runs_df %>%
  ggplot(mapping = aes(x = home_runs)) +
  geom_bar(stat = "count", width = 0.75)
I can bin the data myself, but I resent having to. I expect ggplot to do that for me, and these results aren't completely what I want anyway.
- I'm missing the zero-height bin (41, 45].
- On the other hand, I do like where the x-axis labels appear, and I know how to set the text I want.
home_runs_df %>%
  mutate(bin = cut(home_runs, breaks = c(seq(25, 48, by = 4), Inf))) %>%
  ggplot(mapping = aes(x = bin)) +
  geom_histogram(stat = "count")
I got this approach from the ggplot reference documentation: use geom_bar() and scale_x_binned().
- The bars look the way I want for discrete data (spaces between).
- I don't like the way the tick marks display between the bins, as if the data are continuous.
home_runs_df %>%
  ggplot(aes(home_runs)) +
  geom_bar() +
  scale_x_binned(n.breaks = 6)
I bet there are tricky ways to displace the labels to the position I want. Is this degree of customization necessary? I think what I'm trying to do is pretty conventional. What am I missing?
What I want is:
- Bin discrete, whole-number data.
- Plot the frequencies in the bins.
- Treat the integer data as discrete instead of continuous, with spaces between the bars and labels directly under each group.
CodePudding user response:
From the description of what you want to achieve, I think that using geom_bar and binning with ggplot2::cut_interval or ggplot2::cut_width would be the way to go. That gives you bars with padding and a discrete scale at the same time.
Note: I also added a scale_x_discrete(drop = FALSE) in case you want to show the intervals with no observations too.
library(ggplot2)
# Six equal-width intervals.
ggplot(home_runs_df, aes(x = cut_interval(home_runs, 6))) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
# Intervals of width 4, with a bin edge at 26.
ggplot(home_runs_df, aes(x = cut_width(home_runs, 4, boundary = 26))) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
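To check which intervals cut_width produces before plotting, the bins can be tabulated directly. This is only a minimal sketch: it rebuilds the question's data as a plain vector, and the names `bins` and `bin_counts` are illustrative. Because the result is a factor that keeps its empty levels, an interval with no observations appears with a count of zero, and that is the level scale_x_discrete(drop = FALSE) keeps on the axis.

```r
library(ggplot2)

# Same values as home_runs_df$home_runs in the question.
home_runs <- c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
               35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
               33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
               29, 29, 28, 28, 27, 27, 27, 27, 27, 26)

# Bin into width-4 intervals with a bin edge at 26.
bins <- cut_width(home_runs, width = 4, boundary = 26)

# table() reports every factor level, including empty ones.
bin_counts <- table(bins)
print(bin_counts)
```

If the tabulated intervals are not the ones you want, adjusting `width` or `boundary` here is quicker than re-rendering the plot each time.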
CodePudding user response:
I think you can use your cut trick, but with the labels argument of the cut function set appropriately:
width <- 4
cutseq <- seq(25, 48, width)
home_runs_df %>%
  # Label each bin with its center, then convert the factor to numeric.
  mutate(bin = cut(home_runs,
                   breaks = c(cutseq, Inf),
                   labels = cutseq + width / 2) %>%
           as.character() %>%
           as.numeric()) %>%
  ggplot(mapping = aes(x = bin)) +
  geom_histogram(stat = "count", width = 3)
This way, each column is centered on its bin, and the x-axis is a continuous scale that you can adjust with scale_x_continuous if you need to.
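As a sketch of that rescaling step, the breaks of scale_x_continuous can be placed at the bin centers and given range labels, so each label sits directly under its bar. The label text in `bin_labels` is an assumption about how you might want the bins described, and geom_bar(width = 3) is swapped in here for geom_histogram(stat = "count", width = 3), since geom_bar counts by default.

```r
library(dplyr)
library(ggplot2)

home_runs_df <- data.frame(home_runs = c(48, 40, 38, 37, 37, 37, 37, 37, 36, 36,
                                         35, 34, 34, 34, 33, 33, 33, 33, 33, 33,
                                         33, 32, 32, 32, 32, 32, 31, 31, 29, 29,
                                         29, 29, 28, 28, 27, 27, 27, 27, 27, 26))

width <- 4
cutseq <- seq(25, 48, width)     # lower bin edges: 25, 29, ..., 45
centers <- cutseq + width / 2    # bin centers: 27, 31, ..., 47

# Illustrative labels: the whole numbers covered by each (lower, upper] bin.
# The last bin is open-ended because its upper break is Inf.
closed <- head(cutseq, -1)
bin_labels <- c(paste(closed + 1, closed + width, sep = "-"),
                paste0(tail(cutseq, 1) + 1, "+"))

p <- home_runs_df %>%
  mutate(bin = cut(home_runs, breaks = c(cutseq, Inf), labels = centers) %>%
           as.character() %>%
           as.numeric()) %>%
  ggplot(aes(x = bin)) +
  geom_bar(width = 3) +
  scale_x_continuous(breaks = centers, labels = bin_labels)

print(p)
```

Since the center of the empty bin is listed in `breaks` and lies inside the axis range, its tick and label should still be drawn, with a gap where the bar would be.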