I want to create a variable that returns "TRUE" for each observation that it makes every 30 days from the last existing observation that is "TRUE" for each ID.
The result is:
id date in
a 24/09/2020 TRUE
a 22/10/2020 FALSE
a 04/11/2020 TRUE
a 17/12/2020 TRUE
a 28/12/2020 FALSE
b 01/01/2020 TRUE
b 29/01/2020 FALSE
b 31/12/2020 TRUE
b 01/02/2020 TRUE
CodePudding user response:
Assuming your dataframe is called df
, just subtract the date
from its lag by group:
library(dyplr)
df %>%
mutate(
date = as.Date(date, format = "%d/%m/%Y")
) %>%
group_by(id) %>%
arrange(date, .by_group = TRUE) %>%
mutate(
lag_date = lag(date),
num_days = as.numeric(date - lag_date),
thirty_days = ifelse(num_days > 30, TRUE, FALSE)
) %>%
select(-lag_date)
Output:
# Groups: id [2]
id date `in` num_days thirty_days
<chr> <date> <lgl> <dbl> <lgl>
1 a 2020-09-24 TRUE NA NA
2 a 2020-10-22 FALSE 28 FALSE
3 a 2020-11-04 TRUE 13 FALSE
4 a 2020-12-17 TRUE 43 TRUE
5 a 2020-12-28 FALSE 11 FALSE
6 b 2020-01-01 TRUE NA NA
7 b 2020-01-29 FALSE 28 FALSE
8 b 2020-02-01 TRUE 3 FALSE
9 b 2020-12-31 TRUE 334 TRUE
Also it's not great to have a column called in
, that's a reserved word in R.
Edit: Fixed as realised data was not sorted by date.
CodePudding user response:
Using accumulate
:
library(tidyverse)
library(lubridate)
df %>%
group_by(id) %>%
mutate(date = dmy(date),
.in = accumulate(abs(diff(date)), .init = 30, .f = ~ ifelse(.x < 30, .x .y, .y)) >= 30)
id date in. .in
1 a 2020-09-24 TRUE TRUE
2 a 2020-10-22 FALSE FALSE
3 a 2020-11-04 TRUE TRUE
4 a 2020-12-17 TRUE TRUE
5 a 2020-12-28 FALSE FALSE
6 b 2020-01-01 TRUE TRUE
7 b 2020-01-29 FALSE FALSE
8 b 2020-12-31 TRUE TRUE
9 b 2020-02-01 TRUE TRUE
data
df <- read.table(header = T, text = "id date in
a 24/09/2020 TRUE
a 22/10/2020 FALSE
a 04/11/2020 TRUE
a 17/12/2020 TRUE
a 28/12/2020 FALSE
b 01/01/2020 TRUE
b 29/01/2020 FALSE
b 31/12/2020 TRUE
b 01/02/2020 TRUE")