Home > Software engineering >  Create flag columns from elements of list
Create flag columns from elements of list

Time:10-27

I have a list of dates and want to create a column in an existing data frame for every element of the list and in that column, I want a 1 if the date is between the start and end date.

input:
my_date = as.Date(c("2022-10-02","2022-10-31","2022-10-05"))
                  
df1 <- data.frame(ID = c(1, 2, 3),
                  start_date = as.Date(c("2022-10-01","2022-10-08","2022-10-01")),
                  end_date = as.Date(c("2022-10-22","2022-10-22","2022-10-08")))

output:
df1 <- data.frame(ID = c(1, 2, 3),
                  start_date = as.Date(c("2022-10-01","2022-10-08","2022-10-01")),
                  end_date = as.Date(c("2022-10-22","2022-10-22","2022-10-08")),
                  flag_1= c(1,0,1), flag_2=c(0,0,0), flag_3= c(1,0,1))

I could do it manually, but I have to many rows. I was looking into a solution with mutate but not sure how to implement.

CodePudding user response:

Do you want something like this? I assume you want to compare each element in my_date to the respective row in df1 and not every row.

library(tidyverse)

df1 |>
  mutate(flag = pmap_dbl(list(start_date, end_date, my_date),
                         ~` `(between(..3, ..1, ..2))) )
#>   ID start_date   end_date flag
#> 1  1 2022-10-01 2022-10-22    1
#> 2  2 2022-10-08 2022-10-22    0
#> 3  3 2022-10-01 2022-10-08    1

EDIT from clarification

df1 |>
  mutate(flag = map2(start_date, end_date, 
                     ~pmap_dbl(list(.x, .y, my_date),
                               ~` `(between(..3, ..1, ..2))))) |>
  unnest_wider(flag, names_sep = "_")
#> # A tibble: 3 x 6
#>      ID start_date end_date   flag_1 flag_2 flag_3
#>   <dbl> <date>     <date>      <dbl>  <dbl>  <dbl>
#> 1     1 2022-10-01 2022-10-22      1      0      1
#> 2     2 2022-10-08 2022-10-22      0      0      0
#> 3     3 2022-10-01 2022-10-08      1      0      1

CodePudding user response:

Using base:

cbind(df1, 
      lapply(setNames(my_date, paste0("flag_", seq_along(my_date))),
             function(i) as.integer(df1$start_date <= i & df1$end_date >= i)))

#   ID start_date   end_date flag_1 flag_2 flag_3
# 1  1 2022-10-01 2022-10-22      1      0      1
# 2  2 2022-10-08 2022-10-22      0      0      0
# 3  3 2022-10-01 2022-10-08      1      0      1
  • Related