I am trying to get the daily mean and maximum values for each pollutant. For each of the monitoring stations, the daily mean and maximum values of each pollutant shall be calculated.
Here is a sample of my dataframe:
df <- data.frame (Station = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 10),
Date = c("31/12/2018", "31/12/2018", "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "6/2/2018", "4/3/2018"),
NO2 = c(40,55,52,58,76,98,12,14,23,54,53,76),
O3 = c(13,12,45,87,72,54,88,102,63,52,64,53))
This is the expected dataframe:
df2 <- data.frame (Station = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 10),
Date = c("31/12/2018", "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "4/3/2018"),
MAX_NO2 = c(55,52,58,76,98,12,14,23,54,76),
MEAN_NO2 = c(47.5,52,58,76,98,12,14,23,53.5,76),
MAX_O3 = c(13,45,87,72,54,88,102,63,64,53),
MEAN_O3 = c(12.5,45,87,72,54,88,102,63,58,53))
This is the code I found in S.O.. I tried that but an error occurred :
library(dplyr)
df %>%
group_by(Station,Date) %>%
mutate(max.O3 = max(O3), max.NO2 = max(NO2)) %>%
ungroup() %>%
mutate(max.O3 = case_when(O3 == max.O3 ~ max.O3, TRUE ~ NA_real_), max.NO2 = case_when(NO2 == max.NO2 ~ max.NO2, TRUE ~ NA_real_)) %>%
filter(!is.na(max.O3) | !is.na(max.NO2)) %>%
select(-O3,-NO2)
How can I get the desired output? I first thought of using a for loop for the iteration of values, but I don't know if this is the right thing to do. Something like:
for i in df$Station{
for date in df$date{
Max_NO2 = max(a)
Mean_NO2 = mean(a)
...
}
}
Thanks in advance!
CodePudding user response:
We can first convert to date format, then we can summarise
across
the 2 columns to get the mean and max for each group and column. Then, we can use the .names
argument to define the column names.
library(tidyverse)
df %>%
mutate(Date = as.Date(Date, "%d/%m/%Y")) %>%
group_by(Station, Date) %>%
summarise(across(NO2:O3, list(MAX = max, MEAN = mean), .names = "{.fn}_{.col}")) %>%
ungroup
Output
Station Date MAX_NO2 MEAN_NO2 MAX_O3 MEAN_O3
<dbl> <date> <dbl> <dbl> <dbl> <dbl>
1 1 2018-12-30 52 52 45 45
2 1 2018-12-31 55 47.5 13 12.5
3 2 2018-03-04 76 76 72 72
4 2 2018-03-16 98 98 54 54
5 2 2018-12-01 58 58 87 87
6 3 2018-02-05 14 14 102 102
7 3 2018-03-16 12 12 88 88
8 3 2018-12-31 23 23 63 63
9 4 2018-02-06 54 53.5 64 58
10 10 2018-03-04 76 76 53 53