I have a dataframe that currently looks like this:
head(All_years_hazing_3)
Park Date Coy_season AC_code Aggr_Code Year
<chr> <date> <dbl> <chr> <chr> <dbl>
1 Crestmont 2018-09-25 3 3 1 2018
2 Crestmont 2018-09-26 3 4 0 2018
3 Crestmont 2019-10-20 3 3 0 2019
4 Nosehill Macewin 2018-09-26 3 3 0 2018
5 Nosehill Macewin 2018-09-27 3 3 1 2018
6 Country Hills G.C. - N~ 2018-09-28 3 1 0 2018
I would like to add a column (Last_AC_code) that would determine the last AC_code for each event (based on previous AC_code and Park), if events are less one month (31 days) apart. If events are more than a month apart, it would return N/A.
The resulting dataframe would therefore look like this:
Park Date Coy_season AC_code Aggr_Code Year Last_AC_code
<chr> <date> <dbl> <chr> <chr> <dbl>
1 Crestmont 2018-09-25 3 3 1 2018 NA
2 Crestmont 2018-09-26 3 4 0 2018 3
3 Crestmont 2019-10-20 3 3 0 2019 NA
4 Nosehill Macewin 2018-09-26 3 2 0 2018 NA
5 Nosehill Macewin 2018-09-27 3 3 1 2018 2
6 Country Hills G.C. - N~ 2018-09-28 3 1 0 2018 NA
I'm ideally looking for a dplyr solution, but am open to alternatives.
CodePudding user response:
Something like lag()
should help
library(tidyverse)
dataF <- data.frame(
Park = c("Crestmont", "Crestmont", "Crestmont", "Nosehill Macewin", "Nosehill Macewin", "Country Hills G.C."),
Date = c("2018-09-25", "2018-09-26", "2019-10-20", "2018-09-26", "2018-09-27", "2018-09-28"),
AC_Code = c(3, 4, 3, 3, 3, 1)
)
dataF |>
group_by(Park) |>
mutate(Last_AC_code = ifelse(as.Date(Date) - lag(as.Date(Date), 1) < 31, lag(AC_Code, 1), NA))
#> # A tibble: 6 × 4
#> # Groups: Park [3]
#> Park Date AC_Code Last_AC_code
#> <chr> <chr> <dbl> <dbl>
#> 1 Crestmont 2018-09-25 3 NA
#> 2 Crestmont 2018-09-26 4 3
#> 3 Crestmont 2019-10-20 3 NA
#> 4 Nosehill Macewin 2018-09-26 3 NA
#> 5 Nosehill Macewin 2018-09-27 3 3
#> 6 Country Hills G.C. 2018-09-28 1 NA
Created on 2022-03-03 by the reprex package (v2.0.1)
CodePudding user response:
Tidyverse
df %>%
group_by(Park) %>%
mutate(last_ac_cod = if_else((Date - lag(Date))<31, lag(AC_code), as.character(NA)))
data.table
df[, new_ac_code:= fifelse((Date-lag(Date))<31, shift(AC_code), as.character(NA)), by=.(Park)]
Output:
Park Date Coy_season AC_code Aggr_Code Year last_ac_cod
<chr> <date> <dbl> <chr> <chr> <dbl> <chr>
1 Crestmont 2018-09-25 3 3 1 2018 NA
2 Crestmont 2018-09-26 3 4 0 2018 3
3 Crestmont 2019-10-20 3 3 0 2019 NA
4 Nosehill Macewin 2018-09-26 3 2 0 2018 NA
5 Nosehill Macewin 2018-09-27 3 3 1 2018 2
6 Country Hills G.C. - N~ 2018-09-28 3 1 0 2018 NA