I have a data.table with entry and exit dates per individual and a text column indicating the reason for the exit. My data looks like this:
dt <- data.table (ID = c(1,2,3,4,5),
entry = c("01/01/2010", "01/02/2016", "01/05/2010", "01/09/2013", "01/01/2010"),
exit = c("31/12/2010", "01/01/2021", "30/09/2010", "31/12/2015", "30/09/2010"),
text = c("a", NA, "c", NA, "b"),
result_2010 = c(NA, NA, NA, NA,NA))
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a NA
2: 2 01/02/2016 01/01/2021 <NA> NA
3: 3 01/05/2010 30/09/2010 c NA
4: 4 01/09/2013 31/12/2015 <NA> NA
5: 5 01/01/2010 30/09/2010 b NA
In the column "result_2010" I would like to decide if the individual has left the company between 01.01.2010 and 31.12.2010, but only if in column "text" the individual has either "a" or "c". Otherwise the result should return "false".
The result should look something like this:
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a TRUE
2: 2 01/02/2016 01/01/2021 <NA> FALSE
3: 3 01/05/2010 30/09/2010 c TRUE
4: 4 01/09/2013 31/12/2015 <NA> FALSE
5: 5 01/01/2010 30/09/2010 b FALSE
Does anyone has an idea how I could do this?
CodePudding user response:
data.table
dt[, c("entry","exit") := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = c("entry","exit")]
dt[, result_2010 := text %in% c("a", "c") & between(exit, as.Date("2010-01-01"), as.Date("2010-12-31"))]
# ID entry exit text result_2010
# <num> <Date> <Date> <char> <lgcl>
# 1: 1 2010-01-01 2010-12-31 a TRUE
# 2: 2 2016-02-01 2021-01-01 <NA> FALSE
# 3: 3 2010-05-01 2010-09-30 c TRUE
# 4: 4 2013-09-01 2015-12-31 <NA> FALSE
# 5: 5 2010-01-01 2010-09-30 b FALSE
(Effectively a data.table
version of the other answer, and both benefit from the readability of data.table::between
or dplyr::between
.)
CodePudding user response:
We can convert the columns to Date
class, and create a logical column based on the conditions in the OP's post
library(dplyr)
library(lubridate)
dt %>%
mutate(across(c(entry, exit), dmy)) %>%
mutate(result_2010 = entry >= as.Date('2010-01-01') &
exit <= as.Date("2010-12-31") & text %in% c("a", "c"))
-output
ID entry exit text result_2010
1: 1 2010-01-01 2010-12-31 a TRUE
2: 2 2016-02-01 2021-01-01 <NA> FALSE
3: 3 2010-05-01 2010-09-30 c TRUE
4: 4 2013-09-01 2015-12-31 <NA> FALSE
5: 5 2010-01-01 2010-09-30 b FALSE