Home > Back-end >  R adding a column to one dataframe based on another dataframe and the date
R adding a column to one dataframe based on another dataframe and the date

Time:03-08

I have a dataframe (Reports_following_AC) where each row represents a report. This dataframe looks like this:

> head(Reports_following_AC)

 Park               Month      Obs_con Coy_Season Number_AC Number_4w_AC
  <chr>              <date>       <dbl>      <dbl>     <int>        <int>
1 14st NE - Coventry 2019-06-14       1          2         8            0
2 14st NE - Coventry 2019-10-12       0          3        10            0
3 14st NE - Coventry 2019-10-13       0          3        10            0
4 14st NE - Coventry 2021-06-23       1          2        10            0
5 Airways Park       2020-07-05       0          2         3            0
6 Airways Park       2021-07-18       1          2         6            0

I would like to add a column to my Reports_following_AC dataframe, "Last_treatment", based on the "AC_code" column of the Reaction_per_park_per_day_3 dataframe (below). In my Reaction_per_park_per_day_3 dataframe, each row represents an AC event.

The Last_treatment column that would be added to the Reports_following_AC dataframe would represent the "AC_code" (treatment) of the last AC event prior to a report in a Park, if that AC event was done in the 4 weeks (28 days) prior to a report.

> head(Reaction_per_park_per_day_3)
# A tibble: 6 x 10
Park                 Date      AC_code 
<chr>                <date>       <dbl>
1 14st NE - Coventry 2019-06-05       6 
2 14st NE - Coventry 2019-07-12       7
3 14st NE - Coventry 2019-10-05       1
4 14st NE - Coventry 2021-06-18       2 
5 Airways Park       2020-06-26       1
6 Airways Park       2021-06-30       5

The resulting dataframe would therefore look like this:

 Park               Month      Obs_con Coy_Season Number_AC Number_4w_AC  Last_treatment
  <chr>              <date>       <dbl>      <dbl>     <int>        <int>          <dbl>
1 14st NE - Coventry 2019-06-14       1          2         8            0              6 
2 14st NE - Coventry 2019-10-12       0          3        10            0              1  
3 14st NE - Coventry 2019-10-13       0          3        10            0              1
4 14st NE - Coventry 2021-06-23       1          2        10            0             NA
5 Airways Park       2020-07-05       0          2         3            0              1
6 Airways Park       2021-07-18       1          2         6            0              5

I tried the following code, but it's not quite working because instead of providing the AC_Code for the last AC event prior to the reports if within 30 days of the report, it provides the AC_code for all the AC events within 30 days of the report.

Reports_following_AC_1 <- Reports_following_AC %>%
  left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code))) %>%
              filter(Date <= Month ) %>%
              group_by(Park, Month, Obs_con, Coy_Season) %>%
              mutate(Last_treatment = if_else((Month - max(Date))<28, AC_code, as.character(NA))) %>%
              distinct

> head(Reports_following_AC_1)

  Park               Month      Obs_con Coy_Season Number_AC Number_4w_AC Date       AC_code Last_treatment
  <chr>              <date>       <dbl>      <dbl>     <int>        <int> <date>     <chr>   <chr>         
1 14st NE - Coventry 2019-06-14       1          2         8            0 2019-01-30 3       NA            
2 14st NE - Coventry 2019-06-14       1          2         8            0 2019-01-30 4       NA            
3 14st NE - Coventry 2019-06-14       1          2         8            0 2019-01-30 1       NA            
4 14st NE - Coventry 2019-06-14       1          2         8            0 2019-02-01 4       NA            
5 14st NE - Coventry 2019-06-14       1          2         8            0 2019-02-01 2       NA            
6 14st NE - Coventry 2019-06-14       1          2         8            0 2019-02-04 1       NA

I'm ideally looking for a dplyr solution, but I'm open to other possibilities.

CodePudding user response:

you want to join with a selection of columns from Reaction_per_park_per_day_3 if i understand correctly? This should work:

Reports_following_AC_1 <- Reports_following_AC %>%
  left_join(select(Reaction_per_park_per_day_3, c(Park,Month,AC_cod), by="Park" ) %>%
  filter(Date <= Month ) %>%
  group_by(Park, Month, Obs_con, Coy_Season) %>%
  mutate(Last_treatment = if_else((Month - max(Date))<28, lag(AC_code), as.character(NA))) %>%
  distinct

CodePudding user response:

I figured it out!

Reports_following_AC_1 <- Reports_following_AC %>%
  left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code))) %>%
              filter(Date < Month ) %>%
              group_by(Park, Month, Obs_con, Coy_Season, Number_4w_AC) %>%
              mutate(Last_treatment = last(if_else((Month - max(Date))<28, AC_code, as.character(NA)))) %>%
  select(c(Park, Month, Obs_con, Coy_Season, Number_4w_AC, Last_treatment)) %>%
              distinct
  •  Tags:  
  • r
  • Related