Home > Mobile >  In R, create a variable using counts of events in a date range
In R, create a variable using counts of events in a date range

Time:02-28

Background

I've got an R dataframe d:

d <- data.frame(ID = c("a","a","b","b", "c","c","c"),
                event = c(1,1,0,0,1,1,1),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2013-04-07","2014-07-14")),
                stringsAsFactors=FALSE)

As you can see, it's got 3 distinct people in the ID column, and they've either had or not had an event, along with a date their event status was recorded (event_date).

The Problem

I'd like to create a new variable / column, event_within_interval, which assigns 1 to all the cells of a given ID if that ID has 2 or more event=1 within 180 days of their first event=1.

Let me explain further: both ID=a and ID=c have 2 or more events each, but only ID=c has their second event within 180 days of their first (so here, the 4/7/2013 - 3/14/2013 = 24 days for ID=c).

The problem is that I'm not sure how to tell R this idea of "if the second happens within 180 days of the first event=1".

What I'd like

Here's what I'm looking for:

want <- data.frame(ID = c("a","a","b","b","c","c","c"),
                event = c(1,1,1,0,0,1,1),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2013-04-07","2014-07-14")),
                event_within_interval = c(0,0,0,0,1,1,1),
                stringsAsFactors=FALSE)

What I've tried

I've only got the beginnings of an attempt thus far:

d <- d %>% 
  mutate(event_within_interval = ID %in% if_else(d$event == 1, 1, 0))

But this doesn't give me what I'd like, as you can tell if you run the code.

I've set the thing up as an if_else, but I'm not sure where to go from here.

UPDATE: I've edited both reproducible examples (what I've got and what I want) to emphasize the fact that the desired date interval needs to be between the first event and the second event, not the first event and the last event. (A couple of users submitted examples using last, which worked for the previous iteration of the reproducible example but wouldn't have worked on the real dataset.)

CodePudding user response:

What about by packages lubridate and data.table?

library(data.table)
library(lubridate)

d <- data.frame(ID = c("a","a","b","b", "c","c"),
                event = c(1,1,0,0,1,1),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2013-04-07")),
                stringsAsFactors=FALSE)

d <-  data.table(d)

d <- d[, event_within_interval := 0]

timeInterval <- interval(start = "2013-03-14", end = "2013-04-07")

d <- d[event == 1 & event_date %within% timeInterval, event_within_interval := 1]

d
#    ID event event_date event_within_interval
# 1:  a     1 2011-01-01                     0
# 2:  a     1 2012-08-21                     0
# 3:  b     0 2011-12-23                     0
# 4:  b     0 2011-12-31                     0
# 5:  c     1 2013-03-14                     1
# 6:  c     1 2013-04-07                     1

CodePudding user response:

This is good fun.

Scenario 1

My approach would be to

  • group events by ID
  • Apply first condition check on two the span of days between current date and initial date
  • check if the sum of events is bigger or equal two: sum(event) >= 2
  • only if the two conditions are met I would return one for the event

For readability, I've returned values of conditions in the data as test_* variables.


  d %>%
    group_by(ID) %>%
    mutate(test_interval = event_date - min(event_date) < 180,
           test_sum_events = sum(event) >= 2,
           event_within_interval = if_else(test_interval & test_sum_events, 
                                           1, 0)) %>%
    ungroup()

Scenario 2

In this scenario, the data is sorted by event_date within ID and the difference between the first event and second event has to be under 180 days. Rest of events is ignored.

d %>%
    group_by(ID) %>%
    arrange(event_date) %>%
    mutate(
        # Check the difference between first event: min(event_date) and
        # second event: event_date[2]
        test_interval_first_two = event_date[2] - min(event_date) <= 180,
        test_sum_events = sum(event) >= 2,
        event_within_interval = if_else(
          test_interval_first_two & test_sum_events, 1, 0)
    ) %>%
    ungroup()

CodePudding user response:

You can first group_by the ID column, so that we can calculate days within the same ID. Then in the condition in the if_else statement, use condition with sum() > 1 AND day difference <= 180.

Here I assume there's only two "events" or rows per ID.

library(dplyr)

d %>% 
  group_by(ID) %>% 
  mutate(event_within_interval = if_else(sum(event) > 1 & last(event_date) - first(event_date) <= 180, 1L, 0L))

# A tibble: 6 x 4
# Groups:   ID [3]
  ID    event event_date event_within_interval
  <chr> <dbl> <date>                     <int>
1 a         1 2011-01-01                     0
2 a         1 2012-08-21                     0
3 b         0 2011-12-23                     0
4 b         0 2011-12-31                     0
5 c         1 2013-03-14                     1
6 c         1 2013-04-07                     1

CodePudding user response:

Here is how we could do it. In this example with an additional column interval to see the interval and then use an ifelse statement.

library(dpylr)

d %>% 
  group_by(ID) %>% 
  mutate(interval = last(event_date)- first(event_date),
         event_within_interval = ifelse(event == 1 &
                                          interval < 180, 1, 0))
  ID    event event_date interval event_within_interval
  <chr> <dbl> <date>     <drtn>                   <dbl>
1 a         1 2011-01-01 598 days                     0
2 a         1 2012-08-21 598 days                     0
3 b         0 2011-12-23   8 days                     0
4 b         0 2011-12-31   8 days                     0
5 c         1 2013-03-14  24 days                     1
6 c         1 2013-04-07  24 days                     1
  • Related