Home > Software engineering >  R: How to flag observations within a certain timeframe in data.table?
R: How to flag observations within a certain timeframe in data.table?

Time:12-02

I'm working with a large data frame similar to the one below. I'd like to flag all observations that have an observation 30 days earlier by ID. I had originally been trying to do a fuzzyjoin to achieve this, but can't seem to nail down where I'm going wrong with {data.table}. Any tips?

library(tidyverse)
library(magrittr)
library(data.table)
df<-tibble(
  date=sample(seq(as.Date('1999/01/01'), as.Date('1999/06/01'), by="day"), 300,replace=T),
  id=sample(seq(1:3),300,replace=T),claim_id=1:300)

df%<>%data.table()
df_index<-df
df_readmit<-df
names(df_index)[c(1,3)]<-c("index_date","index_id")
names(df_readmit)[c(1,3)]<-c("readmit_date","readmit_id")

df_readmit[df_index,.(id,index_date,readmit_date,index_id,readmit_id),
           on=.(id,readmit_date>index_date),nomatch=0]

CodePudding user response:

If order can be changed, then I suggest we just look at the diff of the dates.

library(data.table)
setorder(df, date)
df[,.SD[c(TRUE, diff(date) > 30),], by = id]
#       id       date claim_id
#    <int>     <Date>    <int>
# 1:     1 1999-01-01      231
# 2:     2 1999-01-02      284
# 3:     3 1999-01-03       78

In this case, because 100 days spanning 6 months is very unlikely to have a 30-day span untouched, none of the sample data has such an occurrence. However, perhaps the method works for you with your real data.

  • Related