I have a dataframe with daily meteorological data from a site and I want to summarize them in monthly averages but only if the number of NAs in each month is less than 10.
Data sample: exmpldf
DATE | TMAX |
---|---|
1955-06-01 | NA |
1955-06-02 | NA |
1955-06-03 | NA |
1955-06-04 | NA |
1955-06-05 | NA |
1955-06-06 | NA |
1955-06-07 | NA |
1955-06-08 | NA |
1955-06-09 | 20 |
1955-06-10 | 23 |
1955-06-11 | 23 |
1955-06-12 | 21.5 |
1955-06-13 | 24 |
1955-06-14 | 26.5 |
1955-06-15 | 27 |
1955-06-16 | 27 |
For the monthly aggregation I used mutate
from dplyr library to create a column "MONTH" and a column "YEAR"
exmpldf <- exmpldf %>%
mutate(month(DATE), year(DATE))
names(exmpldf) <- c("DATE", "TMAX", "MONTH", "YEAR")
DATE | TMAX | MONTH | YEAR |
---|---|---|---|
1955-06-01 | NA | 6 | 1955 |
1955-06-02 | NA | 6 | 1955 |
1955-06-03 | NA | 6 | 1955 |
1955-06-04 | NA | 6 | 1955 |
1955-06-05 | NA | 6 | 1955 |
1955-06-06 | NA | 6 | 1955 |
1955-06-07 | NA | 6 | 1955 |
1955-06-08 | NA | 6 | 1955 |
1955-06-09 | 20 | 6 | 1955 |
1955-06-10 | 23 | 6 | 1955 |
1955-06-11 | 23 | 6 | 1955 |
1955-06-12 | 21.5 | 6 | 1955 |
1955-06-13 | 24 | 6 | 1955 |
1955-06-14 | 26.5 | 6 | 1955 |
1955-06-15 | 27 | 6 | 1955 |
1955-06-16 | 27 | 6 | 1955 |
To create the monthly average I used summarize
, also from dplyr:
exmpldfmeanMonth <- exmpldf %>%
group_by(MONTH, YEAR) %>%
summarise(TMAX = mean(TMAX))
The problem is that there are many months that have at least 1 NA and others have 25 or 30 NAs, in any case, the mean result is NA. You could add na.rm = T
but then it calculates the mean even if there is only one data per month.
So I would like to generate a conditional that calculates the average only if there are 10 or less NAs per month, otherwise it should be considered as NA.
I tried without success:
exmpldfmeanMonth <- exmpldf %>%
group_by(MONTH, YEAR) %>%
summarise_if(is.na(exmpldf$TMAX)<=10, PRCP = mean(TMAX, na.rm = T), NA)
and:
exmpldfmeanMonth <- exmpldf %>%
group_by(MONTH, YEAR) %>%
summarise(PRCP = ifelse(is.na(exmpldf$TMAX)<=10, mean(TMAX, na.rm = T), NA))
Could you please guide me on how to solve this? Thank you very much in advance!
CodePudding user response:
Please, find one alternative based on your approach using the packages data.table
and lubridate
:
Reprex
- Code
library(data.table)
library(lubridate)
setDT(df1)[, DATE := as_date(DATE)][, c("month", "year") := c(lapply(.SD, month), lapply(.SD, year)), .SDcols = c("DATE")
][, .(PRCP = fifelse(sum(is.na(TMAX)) <= 10, mean(TMAX, na.rm = TRUE), NA_real_)), by = .(month, year)][]
- Case 1: NA <= 10
1.1 Your data:
df1 <- data.frame(DATE = c("1955-06-01", "1955-06-02", "1955-06-03", "1955-06-04",
"1955-06-05", "1955-06-06", "1955-06-07", "1955-06-08",
"1955-06-09", "1955-06-10", "1955-06-11", "1955-06-12",
"1955-06-13", "1955-06-14", "1955-06-15", "1955-06-16"),
TMAX = c(NA, NA, NA, NA, NA, NA, NA, NA, 20, 23, 23, 21.5, 24, 26.5,
27, 27))
2.2 Output:
#> month year PRCP
#> 1: 6 1955 24
- Case 2: NA > 10
2.1 Your data:
df1 <- data.frame(DATE = c("1955-06-01", "1955-06-02", "1955-06-03", "1955-06-04",
"1955-06-05", "1955-06-06", "1955-06-07", "1955-06-08",
"1955-06-09", "1955-06-10", "1955-06-11", "1955-06-12",
"1955-06-13", "1955-06-14", "1955-06-15", "1955-06-16"),
TMAX = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 21.5, 24, 26.5,
27, 27))
2.2 Output:
#> month year PRCP
#> 1: 6 1955 NA
Created on 2021-10-28 by the reprex package (v0.3.0)
CodePudding user response:
library(dplyr)
library(lubridate)
df %>%
mutate(month = month(DATE),
year = year(DATE)) %>%
group_by(month, year) %>%
summarize(prcp = if (sum(is.na(TMAX)) <= 10) mean(TMAX, na.rm = T) else NA,
.groups = "drop")
Alternatively, when you summarize
you could count the number of NA
and then add a mutate
statement to conditionally change prcp
:
df %>%
mutate(month = month(DATE),
year = year(DATE)) %>%
group_by(month, year) %>%
summarize(prcp = mean(TMAX, na.rm = T),
numna = sum(is.na(TMAX)), # count number of NA
.groups = "drop") %>%
mutate(prcp = ifelse(numna > 10, NA, prcp)) %>%
select(-numna)
Output
In the data you've shown there is only one month
and year
combination and that grouping has more than 10 NA
:
month year prcp
1 6 1955 NA