In ggplot2, how do I properly scale x-axis in histogram?

Time:11-22

The Ask:

Please help me understand my conceptual error in the use of scale_x_binned() in ggplot2 as it relates to centering breaks beneath the appropriate bin in a geom_histogram().

Starting Example:

library(ggplot2)

df <- data.frame(hour = sample(seq(0,23), 150, replace = TRUE))

# The data is just the integer values of the 24-hour clock in a day.  It is 
#   **NOT** continuous data.

ggplot(df, aes(x = hour)) +
  geom_histogram(bins = 24, fill = "grey60", color = "red")

This produces a histogram with labels properly centered beneath the bins to which they belong, but I want to label each hour, 0 - 23.

To do that, I tried assigning the breaks with scale_x_binned():

ggplot(df, aes(x = hour)) +
  geom_histogram(bins = 24, fill = "grey60", color = "red") +
  scale_x_binned(name = "Hour of Day",
               breaks = seq(0,23))
#> Warning: Removed 1 rows containing missing values (`geom_bar()`).

This returns the number of labels I wanted, but they are not centered beneath the bins as desired. I also get the warning message for missing values associated with geom_bar().

I believe I am overwriting the bins = 24 from the geom_histogram() call when I use the scale_x_binned() call afterward, but I don't understand exactly what is causing geom_histogram() to be centered in the first case that I am wrecking with my new call. I'd really like to have that clarified as I am not seeing my error when I read the associated help pages.

EDIT:

The "Starting Example" essentially works (bins are centered) except for the number of labels I ultimately want. If you built the ggplot2 layer differently, what is the equivalent code? That is, instead of:

ggplot(df, aes(x = hour)) +
  geom_histogram(bins = 24, fill = "grey60", color = "red")

the call was instead built something like:

ggplot(df, aes(x = hour)) +
  geom_histogram(fill = "grey60", color = "red") +
  scale_x_binned(n.breaks = 24)  # I know this isn't right, but akin to this.

or maybe

ggplot(df, aes(x = hour)) +
   stat_bin(bins = 24, center = 0, fill = "grey60", color = "red")

CodePudding user response:

It sounds like you want non-default labeling, with the labels aligned to the midpoints of the bins rather than to their boundaries, which is what the breaks of a binned scale define. scale_x_binned() does not have minor breaks. It only has breaks at the boundaries of the bins, so it's not obvious to me how you could place the break labels at the midpoints of the bins with it.

We could instead use a continuous scale with a break at every hour, hide the axis ticks and the major grid lines, and keep the minor grid lines, which then fall on the bin boundaries, like below.

ggplot(df, aes(x = hour)) +
  geom_histogram(bins = 24, fill = "grey60", color = "red") +
  scale_x_continuous(name = "Hour of Day", breaks = 0:23) +
  theme(axis.ticks = element_blank(),
        panel.grid.major.x = element_blank())


CodePudding user response:

I thought the same as you, namely scale_x_discrete(), but the data given to geom_histogram() is assumed to be continuous, so ...

ggplot(df, aes(x = hour)) +
   geom_histogram(bins = 24, fill = "grey60", color = "red") +
   scale_x_continuous(breaks = 0:23)

(Doesn't require any machinations with theme.)
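If you'd rather not rely on bins = 24 happening to line up with the data, one option is to pin the bins explicitly with binwidth and center. This is only a minimal sketch, assuming the hours span 0 to 23; the set.seed() call, the guaranteed 0:23 values in the data, and the names p and bins are my additions for reproducibility, not part of the question.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the example is reproducible
# Include every hour once so the data range is guaranteed to be 0..23
df <- data.frame(hour = c(0:23, sample(0:23, 126, replace = TRUE)))

p <- ggplot(df, aes(x = hour)) +
  # binwidth = 1 with center = 0 puts the bin edges at -0.5, 0.5, ..., 23.5
  geom_histogram(binwidth = 1, center = 0, fill = "grey60", color = "red") +
  scale_x_continuous(name = "Hour of Day", breaks = 0:23)

# layer_data() returns the bins computed for the first layer
bins <- layer_data(p)
head(bins[, c("x", "xmin", "xmax", "count")])
```

Here the x column (the bin midpoints) should coincide with the integer breaks, so each label sits under the middle of its bar regardless of how many bins the default would have chosen.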


I wish I could tell you that I found out how geom_histogram is centering the labels, but ggproto objects exist in a cavern with too many tunnels and passages for my mind to follow.

So I took a shot at examining the plot object (stored here as plt) that I created when I produced the plot above:

ggplot_build(plt)
# ------------
$data
$data[[1]]
    y count  x xmin xmax    density ncount ndensity flipped_aes PANEL group ymin ymax colour   fill size linetype
1   6     6  0 -0.5  0.5 0.04000000    0.6      0.6       FALSE     1    -1    0    6    red grey60  0.5        1
2   7     7  1  0.5  1.5 0.04666667    0.7      0.7       FALSE     1    -1    0    7    red grey60  0.5        1
3   4     4  2  1.5  2.5 0.02666667    0.4      0.4       FALSE     1    -1    0    4    red grey60  0.5        1
4   5     5  3  2.5  3.5 0.03333333    0.5      0.5       FALSE     1    -1    0    5    red grey60  0.5        1
5   7     7  4  3.5  4.5 0.04666667    0.7      0.7       FALSE     1    -1    0    7    red grey60  0.5        1
#snipped remainder

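So the reason the tick marks are centered is that the bins themselves are constructed to be centered on the integer values: each bin runs from k - 0.5 to k + 0.5 (xmin and xmax above), with its midpoint x at k, which is exactly where the breaks fall.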
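As far as I can tell, the xmin/xmax values in the output above are consistent with the bin width being range / (bins - 1) and the first bin being centered on the smallest value. A base R sketch of that arithmetic follows; it is my reading of the behavior, not ggplot2's actual code, and default_bin_edges is a hypothetical helper name.

```r
# Hypothetical helper: reproduces the bin edges seen in the output above,
# assuming width = range / (bins - 1) and the first bin centered on min(x).
default_bin_edges <- function(x, bins) {
  rng <- range(x)
  width <- (rng[2] - rng[1]) / (bins - 1)
  seq(rng[1] - width / 2, by = width, length.out = bins + 1)
}

edges <- default_bin_edges(0:23, bins = 24)
mids <- (head(edges, -1) + tail(edges, -1)) / 2

edges[1:3]  # -0.5  0.5  1.5
mids[1:3]   #  0    1    2
```

If that reading is right, it also suggests the centering depends on the data: a sample that happened to miss hour 0 or hour 23 would have a narrower range, and bins = 24 would no longer give width-1 bins centered on the integers.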

Further exploration of what's in the ggplot_build() results:

ls(envir=ggplot_build(plt)$layout)
#[1] "coord"          "coord_params"   "facet"          "facet_params"   "layout"         "panel_params"  
#[7] "panel_scales_x" "panel_scales_y" "super"  

ggplot_build(plt)$layout$panel_params
#-------results
[[1]]
[[1]]$x
<ggproto object: Class ViewScale, gg>
    aesthetics: x xmin xmax xend xintercept xmin_final xmax_final xlower ...
    break_positions: function
    break_positions_minor: function
    breaks: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  ...
    continuous_range: -1.7 24.7
    dimension: function
    get_breaks: function
    get_breaks_minor: function
#---- snipped remaining output