Home > Enterprise >  Summarize data between specified dates using dplyr R
Summarize data between specified dates using dplyr R

Time:08-27

I have weather data for summarize across different date intervals. Here is the weather data:

library(dplyr)

rainfall_data <- read.csv(text = "
date,rainfall_daily_mm
01/01/2019,0
01/02/2019,1
01/03/2019,3
01/04/2019,45
01/05/2019,0
01/06/2019,0
01/07/2019,0
01/08/2019,43
01/09/2019,5
01/10/2019,0
01/11/2019,55
01/12/2019,6
01/13/2019,0
01/14/2019,7
01/15/2019,0
01/16/2019,7
01/17/2019,8
01/18/2019,89
01/19/2019,65
01/20/2019,3
01/21/2019,0
01/22/2019,0
01/23/2019,2
01/24/2019,0
01/25/2019,0
01/26/2019,0
01/27/2019,0
01/28/2019,22
01/29/2019,3
01/30/2019,0
01/31/2019,0
") %>% 
  mutate(date = as.Date(date, format = "%d/%m/%Y"))

And here is the date intervals I need to get summaries of from the weather file:

intervals <- read.csv(text= "
treatment,initial,final
A,01/01/2019,01/05/2019
B,01/13/2019,01/20/2019
C,01/12/2019,01/26/2019
D,01/30/2019,01/31/2019
E,01/11/2019,01/23/2019
F,01/03/2019,01/19/2019
G,01/01/2019,01/24/2019
H,01/26/2019,01/28/2019
") %>%
  mutate(initial = as.Date(initial, format = "%d/%m/%Y"),
         final = as.Date(final, format = "%d/%m/%Y"))

The expected outcome is this one:

enter image description here

This is what I've tried based on a similar question:

summary_by_date_interval <- rainfall_data %>% 
  mutate(group = cumsum(grepl(intervals$initial|intervals$final, date))) %>%
  group_by(group) %>%
  summarise(rainfall = sum(rainfall_daily_mm))

And this is the error I got:

Error in `mutate()`:
! Problem while computing `group = cumsum(grepl(intervals$initial |
  intervals$final, date))`.
Caused by error in `Ops.Date()`:
! | not defined for "Date" objects
Run `rlang::last_error()` to see where the error occurred.

Any help will be really appreciated.

CodePudding user response:

First %d/%m/%Y need to be %m/%d/%Y (or you'll have wrong dates and many NA's).

Then you could e.g. use lubridates interval and %within%:

library(dplyr)
library(lubridate)

intervals |>
  group_by(treatment) |>
  mutate(test = sum(rainfall_data$rainfall_daily_mm[rainfall_data$date %within% interval(initial, final)])) |>
  ungroup()

Output:

# A tibble: 8 × 4
  treatment initial    final       test
  <chr>     <date>     <date>     <int>
1 A         2019-01-01 2019-01-05    49
2 B         2019-01-13 2019-01-20   179
3 C         2019-01-12 2019-01-26   187
4 D         2019-01-30 2019-01-31     0
5 E         2019-01-11 2019-01-23   242
6 F         2019-01-03 2019-01-19   333
7 G         2019-01-01 2019-01-24   339
8 H         2019-01-26 2019-01-28    22
  • Related