Home > Blockchain >  Find if any of the dates in a column in a long format data falls in an interval
Find if any of the dates in a column in a long format data falls in an interval

Time:10-17

I would like to create a new column test that equals to 1, if "any" of the dates in the visit column (per ID) falls between the [a,b) interval.

    ID      birth gender      visit          a          b
1  104 2011-05-28      M 2021-05-04 2016-05-28 2018-05-28
2  104 2011-05-28      M 2018-05-14 2016-05-28 2018-05-28
3  104 2011-05-28      M 2021-06-14 2016-05-28 2018-05-28
4  104 2011-05-28      M 2019-06-17 2016-05-28 2018-05-28
5  104 2011-05-28      M 2017-05-18 2016-05-28 2018-05-28
6  104 2011-05-28      M 2020-11-19 2016-05-28 2018-05-28
7  104 2011-05-28      M 2013-10-22 2016-05-28 2018-05-28
8  104 2011-05-28      M 2021-11-24 2016-05-28 2018-05-28
9  105 2013-04-14      W 2018-06-19 2018-04-14 2020-04-14
10 105 2013-04-14      W 2019-05-27 2018-04-14 2020-04-14
11 106 2010-10-08      M 2017-10-02 2015-10-08 2017-10-08
12 106 2010-10-08      M 2015-05-05 2015-10-08 2017-10-08
13 106 2010-10-08      M 2020-11-05 2015-10-08 2017-10-08
14 106 2010-10-08      M 2021-07-06 2015-10-08 2017-10-08
15 106 2010-10-08      M 2015-01-08 2015-10-08 2017-10-08
16 106 2010-10-08      M 2018-11-08 2015-10-08 2017-10-08
17 106 2010-10-08      M 2021-06-09 2015-10-08 2017-10-08
18 106 2010-10-08      M 2020-02-11 2015-10-08 2017-10-08
19 106 2010-10-08      M 2018-06-11 2015-10-08 2017-10-08
20 106 2010-10-08      M 2019-09-11 2015-10-08 2017-10-08
21 106 2010-10-08      M 2017-04-12 2015-10-08 2017-10-08
22 106 2010-10-08      M 2017-10-12 2015-10-08 2017-10-08
23 106 2010-10-08      M 2018-12-13 2015-10-08 2017-10-08
24 106 2010-10-08      M 2017-03-14 2015-10-08 2017-10-08
25 106 2010-10-08      M 2018-11-16 2015-10-08 2017-10-08
26 106 2010-10-08      M 2013-05-17 2015-10-08 2017-10-08
27 106 2010-10-08      M 2014-06-17 2015-10-08 2017-10-08
28 106 2010-10-08      M 2016-03-18 2015-10-08 2017-10-08
29 106 2010-10-08      M 2017-03-23 2015-10-08 2017-10-08
30 106 2010-10-08      M 2021-11-25 2015-10-08 2017-10-08
31 106 2010-10-08      M 2016-01-26 2015-10-08 2017-10-08
32 106 2010-10-08      M 2014-05-27 2015-10-08 2017-10-08
33 106 2010-10-08      M 2012-11-28 2015-10-08 2017-10-08
34 106 2010-10-08      M 2019-01-30 2015-10-08 2017-10-08
35 106 2010-10-08      M 2017-03-31 2015-10-08 2017-10-08
36 107 2016-06-30      M 2021-11-01 2021-06-30 2023-06-30
37 107 2016-06-30      M 2018-06-08 2021-06-30 2023-06-30
38 107 2016-06-30      M 2022-06-08 2021-06-30 2023-06-30
39 107 2016-06-30      M 2021-10-08 2021-06-30 2023-06-30
40 107 2016-06-30      M 2020-10-09 2021-06-30 2023-06-30
41 107 2016-06-30      M 2020-11-09 2021-06-30 2023-06-30
42 107 2016-06-30      M 2021-11-12 2021-06-30 2023-06-30
43 107 2016-06-30      M 2020-01-13 2021-06-30 2023-06-30
44 107 2016-06-30      M 2022-07-20 2021-06-30 2023-06-30
45 107 2016-06-30      M 2021-10-20 2021-06-30 2023-06-30
46 107 2016-06-30      M 2021-01-22 2021-06-30 2023-06-30
47 107 2016-06-30      M 2021-06-23 2021-06-30 2023-06-30
48 107 2016-06-30      M 2021-10-25 2021-06-30 2023-06-30
49 107 2016-06-30      M 2021-10-29 2021-06-30 2023-06-30
50 107 2016-06-30      M 2018-11-30 2021-06-30 2023-06-30

I tried:

df <- df%>%
group_by(ID)%>%
mutate(test = ifelse(df$visit %within% as.interval(df$a,df$b),1,0))

which is obviously wrong.

CodePudding user response:

For sake of readability, take the extra effort and define your interval manually. This is easy using ave.

transform(dat, test=as.numeric(ave(visit >= a & visit < b, ID, FUN=any)))
#    ID      visit          a          b test
# 1 104 3021-05-04 2016-05-28 2018-05-28    1
# 2 104 2018-05-14 2016-05-28 2018-05-28    1
# 3 104 2021-06-14 2016-05-28 2018-05-28    1
# 4 105 3019-06-17 2016-05-28 2018-05-28    0
# 5 105 3017-05-18 2016-05-28 2018-05-28    0
# 6 105 3020-11-19 2016-05-28 2018-05-28    0 

Data:

dat <- structure(list(ID = c(104L, 104L, 104L, 105L, 105L, 105L), visit = c("3021-05-04", 
"2018-05-14", "2021-06-14", "3019-06-17", "3017-05-18", "3020-11-19"
), a = c("2016-05-28", "2016-05-28", "2016-05-28", "2016-05-28", 
"2016-05-28", "2016-05-28"), b = c("2018-05-28", "2018-05-28", 
"2018-05-28", "2018-05-28", "2018-05-28", "2018-05-28")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

CodePudding user response:

You can use any >= <. coerces the logical vector to a numeric one with TRUE = 1 and FALSE = 0.

df %>% 
  group_by(ID) %>% 
  mutate(test =  any(visit >= a & visit < b))

Also, note that you should always provide a minimal and reproducible example. Here, you don't provide the code to reproduce your data set nor is it minimal. You could have chosen two rows per group and 2 groups for instance. Check here for more.

  • Related