Plot histograms from data frame based on conditions as "group_by" style

Time:04-07

I have a data frame that contains multiple columns. I am trying to plot some of these columns as a function of others. First, here is an overview of the columns:

  1. This should be the x-axis of my histogram: the number of the day within the year (1 is 1st January, 34 is 3rd February, etc.)

    head(df$day_number)
    [1] 302 60 314 83 92 32

  2. This is the decade that the observation's day falls in (there is also a column that gives just the year)

    head(df$decade)
    [1] "2010-2019" "1980-1989" "1990-1999" "2000-2009" "2020-2029" "2000-2009"

  3. This is the latitude of each observation (each row relates to one day)

    head(df$lat)
    [1] 56 62 56 57 65 58

The output that I am looking for is a series of histograms with day_number on the x-axis and, on the y-axis, the number of observations on that day (i.e. the number of rows with that exact day_number), grouped by decade and by degree of latitude.

As an example, I would have one histogram for lat = 55 and decade = 1950-1959, then one for lat = 55 and decade = 1960-1969, and so on, and likewise for lat = 56 and decade = 1950-1959, lat = 56 and decade = 1960-1969, etc., for all my latitudes and decades.

I managed to plot the total amount of data per day_number (all decades and all latitudes merged) using this code:

ggplot(df, aes(x=day_number)) +
  geom_histogram(color="darkblue", fill="white", bins=366) +
  xlim(0,400) +
  xlab("Day n°") +
  ylab("Count")

I thought that something like this would work, but it does not, and I cannot figure out how to make the idea work:

ggplot(df, aes(x=day_number, group_by(lat = "55", decade = "1740-1749"))) +
  geom_histogram() +
  xlim(0,400) +
  xlab("Day n°") +
  ylab("Count")

I hope that someone can help with this. Do not hesitate to ask if you need more information. Thank you very much.

CodePudding user response:

facet_wrap() can do that.

It looks like lat is currently a continuous variable. Stratifying by that would quickly get out of hand if there are many possible values, so I'd consider categorising it with cut() before passing it to facet_wrap().

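library(dplyr)
library(ggplot2)

df |>
  # Bin latitude into two bands; include.lowest = TRUE keeps lat == 55 in the first band
  mutate(lat_grp = cut(lat, breaks = c(55, 60, 66), include.lowest = TRUE)) |>
  ggplot(aes(day_number)) +
  geom_histogram(color = "darkblue", fill = "white", bins = 366) +
  xlim(0, 400) +
  xlab("Day n°") +
  ylab("Count") +
  # One panel per combination of decade and latitude band
  facet_wrap(vars(decade, lat_grp))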
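
If one panel per individual degree of latitude is really wanted, as the question describes, a grid layout may be easier to read than wrapped panels. The following is only a sketch under some assumptions: lat holds whole-degree values (as in the head() output in the question), and df_demo is made-up data standing in for the real df. The names df_demo, p_grid and p_one are illustrative and do not come from the question.

```r
library(dplyr)
library(ggplot2)

# Made-up sample data with the same three columns as in the question
# (illustrative only, not the asker's data)
set.seed(1)
df_demo <- data.frame(
  day_number = sample(1:366, 300, replace = TRUE),
  decade     = sample(c("2000-2009", "2010-2019"), 300, replace = TRUE),
  lat        = sample(55:57, 300, replace = TRUE)
)

# One panel per latitude (rows) and decade (columns)
p_grid <- ggplot(df_demo, aes(x = day_number)) +
  geom_histogram(color = "darkblue", fill = "white", binwidth = 1) +
  xlab("Day number") +
  ylab("Count") +
  facet_grid(rows = vars(lat), cols = vars(decade))

# A single latitude/decade combination: filter the rows first, then plot
p_one <- df_demo |>
  filter(lat == 55, decade == "2010-2019") |>
  ggplot(aes(x = day_number)) +
  geom_histogram(color = "darkblue", fill = "white", binwidth = 1) +
  xlab("Day number") +
  ylab("Count")

print(p_grid)
```

The second plot shows what the group_by() attempt in the question was probably aiming for: aes() only maps columns to aesthetics, so selecting a subset has to happen on the data (here with filter()) before it reaches ggplot().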