Home > Software engineering >  Determine the type of the last event
Determine the type of the last event

Time:03-04

I have a dataframe that currently looks like this:

head(All_years_hazing_3)

  Park                    Date       Coy_season AC_code Aggr_Code  Year
  <chr>                   <date>          <dbl> <chr>   <chr>     <dbl>
1 Crestmont               2018-09-25          3 3       1          2018
2 Crestmont               2018-09-26          3 4       0          2018
3 Crestmont               2019-10-20          3 3       0          2019
4 Nosehill Macewin        2018-09-26          3 3       0          2018
5 Nosehill Macewin        2018-09-27          3 3       1          2018
6 Country Hills G.C. - N~ 2018-09-28          3 1       0          2018

I would like to add a column (Last_AC_code) that would determine the last AC_code for each event (based on previous AC_code and Park), if events are less one month (31 days) apart. If events are more than a month apart, it would return N/A.

The resulting dataframe would therefore look like this:

  Park                    Date       Coy_season AC_code Aggr_Code  Year  Last_AC_code
  <chr>                   <date>          <dbl> <chr>   <chr>     <dbl>            
1 Crestmont               2018-09-25          3 3       1          2018            NA
2 Crestmont               2018-09-26          3 4       0          2018             3
3 Crestmont               2019-10-20          3 3       0          2019            NA 
4 Nosehill Macewin        2018-09-26          3 2       0          2018            NA 
5 Nosehill Macewin        2018-09-27          3 3       1          2018             2
6 Country Hills G.C. - N~ 2018-09-28          3 1       0          2018            NA   

I'm ideally looking for a dplyr solution, but am open to alternatives.

CodePudding user response:

Something like lag() should help

library(tidyverse)
dataF <- data.frame(
Park  = c("Crestmont", "Crestmont", "Crestmont", "Nosehill Macewin", "Nosehill Macewin", "Country Hills G.C."),
Date  = c("2018-09-25", "2018-09-26", "2019-10-20", "2018-09-26", "2018-09-27", "2018-09-28"),
AC_Code = c(3, 4, 3, 3, 3, 1)
)
  
dataF |> 
  group_by(Park) |>
  mutate(Last_AC_code = ifelse(as.Date(Date) - lag(as.Date(Date), 1) < 31, lag(AC_Code, 1), NA))
#> # A tibble: 6 × 4
#> # Groups:   Park [3]
#>   Park               Date       AC_Code Last_AC_code
#>   <chr>              <chr>        <dbl>        <dbl>
#> 1 Crestmont          2018-09-25       3           NA
#> 2 Crestmont          2018-09-26       4            3
#> 3 Crestmont          2019-10-20       3           NA
#> 4 Nosehill Macewin   2018-09-26       3           NA
#> 5 Nosehill Macewin   2018-09-27       3            3
#> 6 Country Hills G.C. 2018-09-28       1           NA

Created on 2022-03-03 by the reprex package (v2.0.1)

CodePudding user response:

Tidyverse

df %>%
  group_by(Park) %>% 
  mutate(last_ac_cod = if_else((Date - lag(Date))<31, lag(AC_code), as.character(NA)))

data.table

df[, new_ac_code:= fifelse((Date-lag(Date))<31, shift(AC_code), as.character(NA)), by=.(Park)]

Output:

 Park                    Date       Coy_season AC_code Aggr_Code  Year last_ac_cod
  <chr>                   <date>          <dbl> <chr>   <chr>     <dbl> <chr>      
1 Crestmont               2018-09-25          3 3       1          2018 NA         
2 Crestmont               2018-09-26          3 4       0          2018 3          
3 Crestmont               2019-10-20          3 3       0          2019 NA         
4 Nosehill Macewin        2018-09-26          3 2       0          2018 NA         
5 Nosehill Macewin        2018-09-27          3 3       1          2018 2          
6 Country Hills G.C. - N~ 2018-09-28          3 1       0          2018 NA       
  •  Tags:  
  • r
  • Related