Problem with grouping data for bar chart with multiple variables in ggplot2

Time:12-12

I have a dataframe, df:

df=data.frame("temp"=c(60.80,46.04,26.96,24.98),"humid"=c(93.79,53.33,50.34,54.65),"wind_speed"=c(40.27,39.12,14.96, 13.81), "date" =c("2013-01-01","2013-01-03","2013-02-01", "2013-02-02"))

df$date <- as.Date(df$date, "%Y-%m-%d")


  temp   humid    wind_speed      date

1 60.80  93.79     40.27          2013-01-01
2 46.04  53.33     39.12          2013-01-03
3 26.96  50.34     14.96          2013-02-01
4 24.98  54.65     13.81          2013-02-02

I have transformed it to look like this using this line:


df_mod <- cbind(df[4], stack(df[1:3]))

  metric      values          date

  temp        60.80          2013-01-01
  temp        46.04          2013-01-03
  temp        26.96          2013-02-01
  temp        24.98          2013-02-02
  humid       93.79          2013-01-01  
  humid       53.33          2013-01-03
  humid       50.34          2013-02-01
  humid       54.65          2013-02-02
  wind_speed  40.27          2013-01-01
  wind_speed  39.12          2013-01-03
  wind_speed  14.96          2013-02-01
  wind_speed  13.81          2013-02-02

then I have extracted the month with:

df_mod <- transform(df_mod, month = month(date, label = TRUE))

  metric      values         month

  temp        60.80          Jan
  temp        46.04          Jan
  temp        26.96          Feb
  temp        24.98          Feb
...

Now I am trying to build a chart similar to this one (image not available).

I want to have the mean values for the height of each bar. So I want to group by month and by variable, and then take the average value within each month.

I'm trying this code but it gives me errors.

df_mod %>%
  group_by(metric) %>%
  summarize(mean = mean(values)) %>%
  ggplot(aes(fill = metric, y = mean, x = month)) +
  geom_bar(position = "dodge", stat = "identity") +
  theme_bw() +
  labs(title = "Weather metrics",
       x = "", y = "values")

Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error in `f()`:
! Aesthetics must be valid data columns. Problematic aesthetic(s): x = month. 
Did you mistype the name of a data column or forget to add after_stat()?
Run `rlang::last_error()` to see where the error occurred.

I have also tried group_by(month, metric) and it says

 `summarise()` has grouped output by '.groups'. You can override using the `.groups` argument.

Can someone help me with this?

Answer:

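You need to group by both metric and month. If you group by metric alone, summarize() returns one mean per metric and drops every other column, including month. With no month column left in the data, ggplot2 resolves the name month to the month() function instead, which is why the message complains about an "object of type function" and then fails on x = month. The summarise() message you saw after group_by(month, metric) is informational, not an error; passing .groups = "drop" to summarize() silences it.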
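To see the column being dropped, here is a minimal sketch. It assumes the long-format data from the question, with the month column typed in by hand so that only dplyr is needed; the names by_metric and by_both are just illustrative.

```r
library(dplyr)

# Long-format data as in the question; month is entered by hand here
# (an assumption of this sketch) so that lubridate is not required.
df_mod <- data.frame(
  metric = rep(c("temp", "humid", "wind_speed"), each = 4),
  values = c(60.80, 46.04, 26.96, 24.98,
             93.79, 53.33, 50.34, 54.65,
             40.27, 39.12, 14.96, 13.81),
  month  = factor(rep(c("Jan", "Jan", "Feb", "Feb"), times = 3),
                  levels = c("Jan", "Feb"))
)

# Grouping by metric only: month is not a grouping column, so it is dropped.
by_metric <- df_mod %>%
  group_by(metric) %>%
  summarize(mean = mean(values))
names(by_metric)   # "metric" "mean"

# Grouping by both keeps month: one row per month/metric combination.
by_both <- df_mod %>%
  group_by(month, metric) %>%
  summarize(mean = mean(values), .groups = "drop")
names(by_both)     # "month" "metric" "mean"
nrow(by_both)      # 6
```

Only the second result still has a month column for ggplot2 to map to x.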

Note: I also switched to the tidyverse way to reshape your data.

library(tidyr)
library(ggplot2)
library(lubridate)
library(dplyr)

df_mod <- df %>%
  tidyr::pivot_longer(-date, names_to = "metric", values_to = "values") %>%
  mutate(month = month(date, label = TRUE))

df_mod %>%
  group_by(month, metric) %>%
  # .groups = "drop" returns an ungrouped result and silences the message
  summarize(mean = mean(values), .groups = "drop") %>%
  ggplot(aes(fill = metric, y = mean, x = month)) +
  geom_col(position = "dodge") +
  theme_bw() +
  labs(
    title = "Weather metrics",
    x = "", y = "values"
  )
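As an alternative, you could skip the group_by()/summarize() step and let ggplot2 compute the means while drawing. The sketch below is one way to do that with stat_summary(), not the only one. It assumes the same data as in the question and ggplot2 3.3.0 or later (where the argument is called fun); the object name p is only for illustration.

```r
library(ggplot2)
library(tidyr)
library(dplyr)
library(lubridate)

# Same data as in the question
df <- data.frame(
  temp       = c(60.80, 46.04, 26.96, 24.98),
  humid      = c(93.79, 53.33, 50.34, 54.65),
  wind_speed = c(40.27, 39.12, 14.96, 13.81),
  date       = as.Date(c("2013-01-01", "2013-01-03", "2013-02-01", "2013-02-02"))
)

# Reshape to long format and extract the month, as above
df_mod <- df %>%
  pivot_longer(-date, names_to = "metric", values_to = "values") %>%
  mutate(month = month(date, label = TRUE))

# stat_summary() applies mean() to values within each month/metric group,
# so the raw (unsummarised) data can be passed straight to ggplot()
p <- ggplot(df_mod, aes(x = month, y = values, fill = metric)) +
  stat_summary(fun = mean, geom = "col", position = "dodge") +
  theme_bw() +
  labs(title = "Weather metrics", x = "", y = "values")

p
```

The explicit summarize() version is easier to inspect, since you can print the means before plotting. The stat_summary() version is shorter when you only need the chart.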
