Place elements from vector on histogram bins (R ggplot)

Time:08-13

I have a ggplot that overlays two histograms of a continuous variable, one for each level of a grouping variable.

Using ggplot_build, I have also extracted, for each bin, the ratio of the count in one group (1) to the count in the other (0).

So for the following histogram built with

ggplot(data, aes(x = nonfordist)) +
  geom_histogram(aes(fill = presence),
                 position = "identity", alpha = 0.5, bins = 30) +
  coord_cartesian(xlim = c(NA, 1750))


I have the following list of group1/group0 proportions, one element per bin, in bin order:

list(0.398927744608261, 0.35358629130967, 0.275296034083078, 
    0.247361252979231, 0.260224274406332, 0.22107969151671, 0.252847380410023, 
    0.230055658627087, 0.212244897959184, 0.242105263157895, 
    0.235294117647059, 0.115384615384615, 0.2, 0.421052631578947, 
    0.4375, 0.230769230769231, 0.222222222222222, 0.5, 0, 0, 
    0, NaN, 1, 1, 0, 0, NaN, NaN, NaN, Inf)

What I want now is to plot the elements of this list on the corresponding bins, preferably above the bars showing the counts for group1.

I do not want to include the proportions for bins that fall outside of the histogram due to my xlim command.

Answer:

You could use stat_bin with a text geom, using the same breaks as you do for your histogram. We don't have your actual data, so I've tried to approximate it here (see footnote for reproducible data). You haven't told us what your list of proportions is called, so I have named it props in this example.

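# Both layers use the same breaks, so each label lines up with its bar.
# breaks fixes the bin edges, so bins = 30 is no longer needed.
ggplot(data, aes(x = nonfordist)) +
  geom_histogram(aes(fill = presence),
                 breaks = seq(-82.5, by = 165, length.out = 11),
                 position = "identity", alpha = 0.5) +
  # Bin only the group-1 rows so the text sits just above the group-1 bars
  stat_bin(data = data[data$presence == 1, ], geom = "text",
           breaks = seq(-82.5, by = 165, length.out = 11),
           label = round(unlist(props)[1:10], 2), vjust = -0.5) +
  coord_cartesian(xlim = c(NA, 1750))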
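
The `[1:10]` subset and the rounding above are chosen by hand. As a rough sketch only, the labels could instead be prepared by a small helper that blanks out non-finite proportions (the NaN and Inf entries in the question's list) and any bin whose centre lies beyond the upper x limit. `bin_labels` and its arguments are hypothetical names, not part of ggplot2, and the input values in the example call are made up for illustration.

```r
# Hypothetical helper (not part of ggplot2): build one text label per bin.
# props  : list or numeric vector of proportions, in bin order
# breaks : bin edges, the same vector passed to the histogram layers
# x_max  : upper x limit; bins centred beyond it get an empty label
bin_labels <- function(props, breaks, x_max, digits = 2) {
  props <- unlist(props)                          # accept a list or a vector
  mids  <- head(breaks, -1) + diff(breaks) / 2    # bin centres
  stopifnot(length(props) >= length(mids))
  p    <- props[seq_along(mids)]                  # one value per bin
  fmt  <- paste0("%.", digits, "f")
  labs <- ifelse(is.finite(p), sprintf(fmt, p), "")  # hide NaN and Inf
  labs[mids > x_max] <- ""                        # hide bins beyond the limit
  labs
}

# Illustrative call with made-up values: three bins centred at 0, 165, 330
demo_breaks <- seq(-82.5, by = 165, length.out = 4)
demo_labels <- bin_labels(list(0.3989, NaN, 0.25, Inf), demo_breaks, x_max = 200)
print(demo_labels)
```

The result has one element per bin, so it could be passed as `label = bin_labels(props, breaks, 1750)` in the `stat_bin` call, assuming `breaks` holds the same vector used for the histogram.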



Approximation of data

data <- data.frame(
  nonfordist = rep(165 * c(0:10, 0:10),
                   c(24800, 20200, 16000, 6000, 2800, 1300, 700, 450, 100,
                     50, 30, 9950, 7400, 4500, 600, 300, 150, 80, 50, 30, 20,
                     10)),
  presence = factor(rep(c(0, 1), c(72430,  23090))))
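
For data shaped like this, a vector such as `props` could also be computed directly with base R instead of being read out of `ggplot_build`. The following is a minimal sketch, not the asker's method: `bin_ratio` is a hypothetical helper, it assumes the grouping variable has the levels "0" and "1", and it uses right-closed intervals on the assumption that this matches the histogram's binning. The toy input is made up for illustration.

```r
# Hypothetical helper: ratio of group-1 to group-0 counts in each bin.
# x      : numeric values to bin
# group  : factor (or 0/1 vector) containing both levels "0" and "1"
# breaks : bin edges, the same vector passed to the histogram layers
bin_ratio <- function(x, group, breaks) {
  # Values outside the breaks become NA and are left out of the table
  bins   <- cut(x, breaks = breaks, include.lowest = TRUE)
  counts <- table(bins, group)          # rows: bins, columns: "0" and "1"
  # 0/0 yields NaN and n/0 yields Inf, as in the question's list
  unname(counts[, "1"] / counts[, "0"])
}

# Toy input (made up): four bins centred at 0, 165, 330 and 495
toy_x     <- c(0, 0, 0, 165, 165, 330)
toy_group <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))
toy_ratio <- bin_ratio(toy_x, toy_group, seq(-82.5, by = 165, length.out = 5))
print(toy_ratio)
```

With the approximated data above, the call would be `bin_ratio(data$nonfordist, data$presence, seq(-82.5, by = 165, length.out = 11))`, which returns one ratio per bin in the same order as the bars.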