Simple histogram of two variables with ggplot

Time:09-24

I'm using ggplot2 to draw a histogram of two weight variables in my data frame. The data frame has two columns: a case-name column called caso and a value column called peso. I have 3000 cases for each, and when I put the histograms side by side with the facet_wrap option they display correctly:

df |>
    pivot_longer(cols = c(peso, peso2), names_to = "caso", values_to = "peso") |>
    ggplot(aes(x = peso, colour = caso, fill = caso)) +
      geom_histogram(alpha = 0.4) +
      facet_wrap(~caso)


But when I try to overlay the two histograms in the same panel, the first one appears to have twice as many cases, and the histograms are unequal in size:

df |>
    pivot_longer(cols = c(peso, peso2), names_to = "caso", values_to = "peso") |>
    ggplot(aes(x = peso, colour = caso, fill = caso)) +
      geom_histogram(alpha = 0.4)


I don't know what I'm doing wrong. Any advice? Thanks in advance!

Juan

CodePudding user response:

By default, geom_histogram stacks multiple series, so the bars of one series start where the bars of the other end instead of at zero. Setting the position to "identity" should fix this:

    library(tidyverse)
    df <- data.frame(peso = rnorm(1000, 250, 10),
                     peso2 = rnorm(1000, 260, 10))
    
    df %>% 
      pivot_longer(everything()) %>%
      ggplot(aes(x = value, fill = name)) +
      geom_histogram(position = "identity", alpha = 0.5)
      # geom_histogram(position = position_identity(), alpha = 0.5) # alternate syntax
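
One way to check that stacking is what makes the first histogram look too tall is to inspect the computed bars with layer_data(). This is only a rough sketch on the simulated data above; the seed and bins = 30 are my own assumptions, added for reproducibility and to silence the bin-count message.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(peso = rnorm(1000, 250, 10),
                 peso2 = rnorm(1000, 260, 10))

base <- ggplot(pivot_longer(df, everything()),
               aes(x = value, fill = name))

# Computed bar coordinates for each position scheme
stacked  <- layer_data(base + geom_histogram(position = "stack", bins = 30))
overlaid <- layer_data(base + geom_histogram(position = "identity", bins = 30))

# With "identity", every bar starts at the baseline
all(overlaid$ymin == 0)

# With "stack", some bars start on top of the other series
any(stacked$ymin > 0)
```

The counts per bin are the same in both cases; only ymin and ymax differ, which is why the stacked plot looks as if one series had extra cases.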


The "Usage" section of ?geom_histogram shows position = "stack" as the default. FWIW, geom_freqpoly defaults to "identity".

geom_histogram(
  mapping = NULL,
  data = NULL,
  stat = "bin",
  position = "stack",   #### HERE
  ...,
  binwidth = NULL,
  bins = NULL,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE
)
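
Since geom_freqpoly already uses the "identity" position, it may be a simpler option if outlines are enough. Here is a minimal sketch on the same kind of simulated data; the seed and bins = 30 are assumptions of mine, not part of the question.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(peso = rnorm(1000, 250, 10),
                 peso2 = rnorm(1000, 260, 10))

# One line per series, each drawn from its own counts (no stacking)
p <- df |>
  pivot_longer(everything()) |>
  ggplot(aes(x = value, colour = name)) +
  geom_freqpoly(bins = 30)

p
```

Lines also avoid the problem of one filled histogram hiding the other, so no alpha setting is needed.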