Home > other >  How to give back true or false based on multiple criteria in R?
How to give back true or false based on multiple criteria in R?

Time:10-08

I have a data.table with entry and exit dates per individual and a text column indicating the reason for the exit. My data looks like this:

dt <- data.table (ID = c(1,2,3,4,5),
                  entry = c("01/01/2010", "01/02/2016", "01/05/2010", "01/09/2013", "01/01/2010"),
                  exit = c("31/12/2010", "01/01/2021", "30/09/2010", "31/12/2015", "30/09/2010"),
                  text = c("a", NA, "c", NA, "b"),
                  result_2010 = c(NA, NA, NA, NA,NA))

   ID    entry      exit     text    result_2010
1:  1 01/01/2010 31/12/2010    a          NA
2:  2 01/02/2016 01/01/2021 <NA>          NA
3:  3 01/05/2010 30/09/2010    c          NA
4:  4 01/09/2013 31/12/2015 <NA>          NA
5:  5 01/01/2010 30/09/2010    b          NA

In the column "result_2010" I would like to decide if the individual has left the company between 01.01.2010 and 31.12.2010, but only if in column "text" the individual has either "a" or "c". Otherwise the result should return "false".

The result should look something like this:

    ID   entry       exit   text    result_2010
1:  1 01/01/2010 31/12/2010    a        TRUE
2:  2 01/02/2016 01/01/2021 <NA>       FALSE
3:  3 01/05/2010 30/09/2010    c        TRUE
4:  4 01/09/2013 31/12/2015 <NA>       FALSE
5:  5 01/01/2010 30/09/2010    b       FALSE

Does anyone has an idea how I could do this?

CodePudding user response:

data.table

dt[, c("entry","exit") := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = c("entry","exit")]
dt[, result_2010 := text %in% c("a", "c") & between(exit, as.Date("2010-01-01"), as.Date("2010-12-31"))]
#       ID      entry       exit   text result_2010
#    <num>     <Date>     <Date> <char>      <lgcl>
# 1:     1 2010-01-01 2010-12-31      a        TRUE
# 2:     2 2016-02-01 2021-01-01   <NA>       FALSE
# 3:     3 2010-05-01 2010-09-30      c        TRUE
# 4:     4 2013-09-01 2015-12-31   <NA>       FALSE
# 5:     5 2010-01-01 2010-09-30      b       FALSE

(Effectively a data.table version of the other answer, and both benefit from the readability of data.table::between or dplyr::between.)

CodePudding user response:

We can convert the columns to Date class, and create a logical column based on the conditions in the OP's post

library(dplyr)
library(lubridate)
dt %>% 
   mutate(across(c(entry, exit), dmy)) %>% 
   mutate(result_2010 = entry >= as.Date('2010-01-01') & 
     exit <= as.Date("2010-12-31") & text %in% c("a", "c"))

-output

 ID      entry       exit text result_2010
1:  1 2010-01-01 2010-12-31    a        TRUE
2:  2 2016-02-01 2021-01-01 <NA>       FALSE
3:  3 2010-05-01 2010-09-30    c        TRUE
4:  4 2013-09-01 2015-12-31 <NA>       FALSE
5:  5 2010-01-01 2010-09-30    b       FALSE
  • Related