In R, what is the average frequency (count) of events per ID in Year N?

Time:06-05

Background

I've got this R dataframe, d. It looks like this:

d <- data.frame(ID = c("a","a","a","a","a","a","a","b","b","b","b"),
                treatment = c(0,1,0,0,0,1,0,1,0,0,0),
                event = c(0,0,1,1,1,1,1,0,1,1,1),
                service_date = as.Date(c("2011-01-01",   
                                         "2011-08-21",   
                                         "2011-12-23",   
                                         "2012-02-23",   
                                         "2013-09-14",   
                                         "2013-04-07",   
                                         "2014-10-14",   
                                         "2013-01-01",
                                         "2013-12-12",   
                                         "2014-06-17",
                                         "2015-09-29")), 
                stringsAsFactors=FALSE)

It's got two people in it (ID a and b) and some information about whether they received a treatment, whether they had an event, and a service_date for when either of those things happens.

The problem & what I'm looking for

My goal is to figure out how many event==1's people have on average in their n-th year after their first treatment==1. Here's the result I'd want, and how I would do it by hand for the first year after treatment:

  1. For each ID, find the first service_date where treatment equals 1. For ID=a, that's 2011-08-21.

  2. For that "date of first treatment", count forwards 365 days. For ID=a, that'd be roughly 2012-08-21 (strictly 2012-08-20, because 2012 is a leap year). This gives you an interval for "first year after first treatment".

  3. Within that interval, count/tally how many times event==1. For ID=a's first year (so between 2011-08-21 and 2012-08-21), that's 2 times: once on 2011-12-23 and another on 2012-02-23.

  4. Repeat steps 1, 2, and 3 for the other IDs (in this example it's only b) and get their count. For ID=b, this would only be one event: between 2013-01-01 and one year later on 2014-01-01, they only have one event, on 2013-12-12.

  5. Sum the counts and divide by the number of IDs to get an average. Here, that'd be (2 events + 1 event) / 2 people == 1.5 events, on average, in Year 1 after first treatment.
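The five steps above can be sketched in base R roughly like this. It is only one possible reading of the steps: it assumes the interval is half-open, [0, 365) days after first treatment, and that every ID has at least one treatment==1 row. The helper names (trt_day, first_trt, days_since, in_year1, counts, avg_year1) are made up for illustration.

```r
d <- data.frame(
  ID = c("a","a","a","a","a","a","a","b","b","b","b"),
  treatment = c(0,1,0,0,0,1,0,1,0,0,0),
  event = c(0,0,1,1,1,1,1,0,1,1,1),
  service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23", "2012-02-23",
                           "2013-09-14", "2013-04-07", "2014-10-14",
                           "2013-01-01", "2013-12-12", "2014-06-17", "2015-09-29")),
  stringsAsFactors = FALSE)

# Step 1: first treatment date per ID, as days since 1970-01-01,
# repeated on every row of that ID.
trt_day   <- ifelse(d$treatment == 1, as.numeric(d$service_date), NA)
first_trt <- ave(trt_day, d$ID, FUN = function(x) min(x, na.rm = TRUE))

# Step 2: days elapsed since first treatment (negative before treatment).
days_since <- as.numeric(d$service_date) - first_trt

# Step 3: flag events inside the assumed window [0, 365).
in_year1 <- d$event == 1 & days_since >= 0 & days_since < 365

# Step 4: count flagged events per ID; IDs with none get 0.
counts <- tapply(in_year1, d$ID, sum)

# Step 5: average over IDs.
avg_year1 <- mean(counts)
avg_year1
# [1] 1.5
```

Whether the end points of the window should be included is a judgment call; with this data it does not change the counts.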

Ideally I'd like to be able to modify the code to define a different interval after first treatment. Like year 2 could be "the time between first treatment + 365 and first treatment + 730".
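One way to make the interval adjustable is to wrap the calculation in a function that takes the year number. The sketch below is hypothetical: avg_events_in_year, n and year_len are names invented here, year n is assumed to mean [365 * (n - 1), 365 * n) days after first treatment, and every ID is assumed to have at least one treatment==1 row.

```r
d <- data.frame(
  ID = c("a","a","a","a","a","a","a","b","b","b","b"),
  treatment = c(0,1,0,0,0,1,0,1,0,0,0),
  event = c(0,0,1,1,1,1,1,0,1,1,1),
  service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23", "2012-02-23",
                           "2013-09-14", "2013-04-07", "2014-10-14",
                           "2013-01-01", "2013-12-12", "2014-06-17", "2015-09-29")),
  stringsAsFactors = FALSE)

# Average number of events per ID in the n-th year after first treatment.
# year_len is the assumed length of a "year" in days.
avg_events_in_year <- function(d, n, year_len = 365) {
  day       <- as.numeric(d$service_date)
  trt_day   <- ifelse(d$treatment == 1, day, NA)
  first_trt <- ave(trt_day, d$ID, FUN = function(x) min(x, na.rm = TRUE))
  days_since <- day - first_trt

  # Window for year n: lower bound included, upper bound excluded.
  lower <- (n - 1) * year_len
  upper <- n * year_len
  in_window <- d$event == 1 & days_since >= lower & days_since < upper

  # Per-ID counts (0 for IDs with no events in the window), then the mean.
  mean(tapply(in_window, d$ID, sum))
}

by_year <- sapply(1:4, function(n) avg_events_in_year(d, n))
by_year
# [1] 1.5 1.0 1.0 0.5
```

Note that the denominator is always the number of IDs in d, even for years that extend past an ID's last record; whether that is the right denominator depends on how follow-up time should be treated.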

What I've tried

I'm messing with some R code to try and do this. Conceptually, my approach consists of the following:

  1. First, to mutate a new column year_interval using the difftime function to define the interval in which R should be counting events for each ID.

  2. Next, to mutate another column interval_event_count that does the actual counting.

  3. Finish the operation using mean.
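For what it's worth, here is a rough sketch of how those three steps might look with dplyr (assuming the package is installed). The column names year_interval and interval_event_count come from the description above; first_trt, days_since and per_id are extra helper names made up for this example, and year 1 is again assumed to mean [0, 365) days after first treatment.

```r
library(dplyr)

d <- data.frame(
  ID = c("a","a","a","a","a","a","a","b","b","b","b"),
  treatment = c(0,1,0,0,0,1,0,1,0,0,0),
  event = c(0,0,1,1,1,1,1,0,1,1,1),
  service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23", "2012-02-23",
                           "2013-09-14", "2013-04-07", "2014-10-14",
                           "2013-01-01", "2013-12-12", "2014-06-17", "2015-09-29")),
  stringsAsFactors = FALSE)

per_id <- d %>%
  group_by(ID) %>%
  mutate(
    # Date of first treatment within each ID.
    first_trt  = min(service_date[treatment == 1]),
    # Step 1: days since first treatment via difftime, then the year index
    # (1 = first 365 days; 0 or negative = before first treatment).
    days_since = as.numeric(difftime(service_date, first_trt, units = "days")),
    year_interval = floor(days_since / 365) + 1
  ) %>%
  # Step 2: count events in the chosen year for each ID.
  summarise(interval_event_count = sum(event == 1 & year_interval == 1))

# Step 3: average over IDs.
mean(per_id$interval_event_count)
# [1] 1.5
```

Changing `year_interval == 1` to `== 2` would give the second year under the same assumptions.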

This is probably not the only valid approach, of course (it may not even be valid at all).
