I have a dataframe (Reports_following_AC) where each row represents a report. This dataframe looks like this:
> head(Reports_following_AC)
Park Month Obs_con Coy_Season Number_AC Number_4w_AC
<chr> <date> <dbl> <dbl> <int> <int>
1 14st NE - Coventry 2019-06-14 1 2 8 0
2 14st NE - Coventry 2019-10-12 0 3 10 0
3 14st NE - Coventry 2019-10-13 0 3 10 0
4 14st NE - Coventry 2021-06-23 1 2 10 0
5 Airways Park 2020-07-05 0 2 3 0
6 Airways Park 2021-07-18 1 2 6 0
I would like to add a column to my Reports_following_AC dataframe, "Last_treatment", based on the "AC_code" column of the Reaction_per_park_per_day_3 dataframe (below). In my Reaction_per_park_per_day_3 dataframe, each row represents an AC event.
The Last_treatment column that would be added to the Reports_following_AC dataframe would represent the "AC_code" (treatment) of the last AC event prior to a report in a Park, if that AC event was done in the 4 weeks (28 days) prior to a report.
> head(Reaction_per_park_per_day_3)
# A tibble: 6 x 10
Park Date AC_code
<chr> <date> <dbl>
1 14st NE - Coventry 2019-06-05 6
2 14st NE - Coventry 2019-07-12 7
3 14st NE - Coventry 2019-10-05 1
4 14st NE - Coventry 2021-06-18 2
5 Airways Park 2020-06-26 1
6 Airways Park 2021-06-30 5
The resulting dataframe would therefore look like this:
Park Month Obs_con Coy_Season Number_AC Number_4w_AC Last_treatment
<chr> <date> <dbl> <dbl> <int> <int> <dbl>
1 14st NE - Coventry 2019-06-14 1 2 8 0 6
2 14st NE - Coventry 2019-10-12 0 3 10 0 1
3 14st NE - Coventry 2019-10-13 0 3 10 0 1
4 14st NE - Coventry 2021-06-23 1 2 10 0 NA
5 Airways Park 2020-07-05 0 2 3 0 1
6 Airways Park 2021-07-18 1 2 6 0 5
I tried the following code, but it's not quite working because instead of providing the AC_Code for the last AC event prior to the reports if within 30 days of the report, it provides the AC_code for all the AC events within 30 days of the report.
Reports_following_AC_1 <- Reports_following_AC %>%
left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code))) %>%
filter(Date <= Month ) %>%
group_by(Park, Month, Obs_con, Coy_Season) %>%
mutate(Last_treatment = if_else((Month - max(Date))<28, AC_code, as.character(NA))) %>%
distinct
> head(Reports_following_AC_1)
Park Month Obs_con Coy_Season Number_AC Number_4w_AC Date AC_code Last_treatment
<chr> <date> <dbl> <dbl> <int> <int> <date> <chr> <chr>
1 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 3 NA
2 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 4 NA
3 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 1 NA
4 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-01 4 NA
5 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-01 2 NA
6 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-04 1 NA
I'm ideally looking for a dplyr solution, but I'm open to other possibilities.
CodePudding user response:
you want to join with a selection of columns from Reaction_per_park_per_day_3 if i understand correctly? This should work:
Reports_following_AC_1 <- Reports_following_AC %>%
left_join(select(Reaction_per_park_per_day_3, c(Park,Month,AC_cod), by="Park" ) %>%
filter(Date <= Month ) %>%
group_by(Park, Month, Obs_con, Coy_Season) %>%
mutate(Last_treatment = if_else((Month - max(Date))<28, lag(AC_code), as.character(NA))) %>%
distinct
CodePudding user response:
I figured it out!
Reports_following_AC_1 <- Reports_following_AC %>%
left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code))) %>%
filter(Date < Month ) %>%
group_by(Park, Month, Obs_con, Coy_Season, Number_4w_AC) %>%
mutate(Last_treatment = last(if_else((Month - max(Date))<28, AC_code, as.character(NA)))) %>%
select(c(Park, Month, Obs_con, Coy_Season, Number_4w_AC, Last_treatment)) %>%
distinct