Simple histogram of two variables with ggplot


I'm using ggplot2 to draw a histogram of two weight variables in my data frame. After reshaping to long format, the data frame has two columns: caso, holding the case name, and peso, holding the value. I have 3000 cases for each variable, and when I put the histograms side by side with the facet_wrap option they display correctly:

library(tidyr)
library(ggplot2)

df |>
    pivot_longer(cols = c(peso, peso2), names_to = "caso", values_to = "peso") |>
    ggplot(aes(x = peso, colour = caso, fill = caso)) +
      geom_histogram(alpha = 0.4) +
      facet_wrap(~ caso)


But when I try to overlay the two histograms in the same panel, the first one seems to have twice as many cases as the second, so the two histograms are unequal in size:

df |>
    pivot_longer(cols = c(peso, peso2), names_to = "caso", values_to = "peso") |>
    ggplot(aes(x = peso, colour = caso, fill = caso)) +
      geom_histogram(alpha = 0.4)


I don't know what I'm doing wrong. Any advice? Thanks in advance!


Answer:

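By default, geom_histogram() stacks multiple groups (position = "stack"): in each bin, one group's bar is drawn on top of the other group's bar, so the upper histogram's bars reach the combined count of both groups and it looks twice as large. The "identity" position overlays the groups instead, so each bar's height is just that group's own count: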

    library(tidyr)    # pivot_longer(); tidyr also re-exports %>%
    library(ggplot2)

    df <- data.frame(peso = rnorm(1000, 250, 10),
                     peso2 = rnorm(1000, 260, 10))
    df %>% 
      pivot_longer(everything()) %>%
      ggplot(aes(x = value, fill = name)) +
      geom_histogram(position = "identity", alpha = 0.5)
      # geom_histogram(position = position_identity(), alpha = 0.5) # alternate syntax
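
To check the stacking explanation, you can compare the computed layer data for both position settings with ggplot2's layer_data(). This is a minimal sketch on simulated data shaped like the df above; the seed and binwidth = 2 are arbitrary choices, not taken from the question:

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducible toy data
df <- data.frame(peso = rnorm(1000, 250, 10),
                 peso2 = rnorm(1000, 260, 10))
long <- pivot_longer(df, everything())

# Same histogram with two position adjustments and identical bins
p_stack <- ggplot(long, aes(x = value, fill = name)) +
  geom_histogram(position = "stack", binwidth = 2)
p_ident <- ggplot(long, aes(x = value, fill = name)) +
  geom_histogram(position = "identity", binwidth = 2)

# layer_data() returns the computed data of a layer, including count and ymax
d_stack <- layer_data(p_stack)
d_ident <- layer_data(p_ident)

# "identity": each bar's top (ymax) equals that group's own count
all.equal(d_ident$ymax, d_ident$count)

# "stack": the upper group's bars start on top of the lower group's bars,
# so in overlapping bins ymax exceeds the group's own count
any(d_stack$ymax > d_stack$count)

# Each group still contains all 1000 observations
tapply(d_ident$count, d_ident$group, sum)
```

The counts themselves are the same in both plots; only where each bar is drawn changes, which is why the stacked version looks inflated.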


The "Usage" section of ?geom_histogram shows that position = "stack" is the default. FWIW, geom_freqpoly() defaults to "identity".

    geom_histogram(
      mapping = NULL,
      data = NULL,
      stat = "bin",
      position = "stack",   #### HERE
      ...,
      binwidth = NULL,
      bins = NULL,
      na.rm = FALSE,
      orientation = NA,
      show.legend = NA,
      inherit.aes = TRUE
    )
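
The same defaults can be read programmatically with formals(), and the geom_freqpoly() remark can be checked on the computed layer data. A minimal sketch, assuming a ggplot2 3.x release and simulated data like the answer's df (seed and binwidth are arbitrary):

```r
library(tidyr)
library(ggplot2)

# Default position arguments, as listed in each function's Usage section
formals(geom_histogram)$position
formals(geom_freqpoly)$position

set.seed(1)  # arbitrary seed, only for reproducible toy data
df <- data.frame(peso = rnorm(1000, 250, 10),
                 peso2 = rnorm(1000, 260, 10))

# Frequency polygons overlay by default, so no position override is needed
p <- pivot_longer(df, everything()) |>
  ggplot(aes(x = value, colour = name)) +
  geom_freqpoly(binwidth = 2)

# Each line's height is its own group's count, not a running total
d <- layer_data(p)
all.equal(d$y, d$count)
```

Frequency polygons can be easier to read than semi-transparent overlapping bars when the two distributions overlap heavily.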