Create a variable that group by number of dates in R-CodePudding

I want to create a variable that returns "TRUE" for each observation that it makes every 30 days from the last existing observation that is "TRUE" for each ID.

The result is:

id  date    in
a   24/09/2020  TRUE
a   22/10/2020  FALSE
a   04/11/2020  TRUE
a   17/12/2020  TRUE
a   28/12/2020  FALSE
b   01/01/2020  TRUE
b   29/01/2020  FALSE
b   31/12/2020  TRUE
b   01/02/2020  TRUE

CodePudding user response：

Assuming your dataframe is called df, just subtract the date from its lag by group:

library(dyplr)
df %>%
    mutate(
        date = as.Date(date, format = "%d/%m/%Y")
    ) %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(
        lag_date = lag(date),
        num_days = as.numeric(date - lag_date),
        thirty_days = ifelse(num_days > 30, TRUE, FALSE)
    ) %>%
    select(-lag_date)

Output:

# Groups:   id [2]
  id    date       `in`  num_days thirty_days
  <chr> <date>     <lgl>    <dbl> <lgl>
1 a     2020-09-24 TRUE        NA NA
2 a     2020-10-22 FALSE       28 FALSE
3 a     2020-11-04 TRUE        13 FALSE
4 a     2020-12-17 TRUE        43 TRUE
5 a     2020-12-28 FALSE       11 FALSE
6 b     2020-01-01 TRUE        NA NA
7 b     2020-01-29 FALSE       28 FALSE
8 b     2020-02-01 TRUE         3 FALSE
9 b     2020-12-31 TRUE       334 TRUE

Also it's not great to have a column called in, that's a reserved word in R.

Edit: Fixed as realised data was not sorted by date.

CodePudding user response：

Using accumulate:

library(tidyverse)
library(lubridate)

df %>% 
  group_by(id) %>% 
  mutate(date = dmy(date),
         .in = accumulate(abs(diff(date)), .init = 30, .f = ~ ifelse(.x < 30, .x   .y, .y)) >= 30)


  id       date   in.   .in
1  a 2020-09-24  TRUE  TRUE
2  a 2020-10-22 FALSE FALSE
3  a 2020-11-04  TRUE  TRUE
4  a 2020-12-17  TRUE  TRUE
5  a 2020-12-28 FALSE FALSE
6  b 2020-01-01  TRUE  TRUE
7  b 2020-01-29 FALSE FALSE
8  b 2020-12-31  TRUE  TRUE
9  b 2020-02-01  TRUE  TRUE

data

df <- read.table(header = T, text = "id  date    in
a   24/09/2020  TRUE
a   22/10/2020  FALSE
a   04/11/2020  TRUE
a   17/12/2020  TRUE
a   28/12/2020  FALSE
b   01/01/2020  TRUE
b   29/01/2020  FALSE
b   31/12/2020  TRUE
b   01/02/2020  TRUE")