Background
I've got this R dataframe, d. It looks like this:
d <- data.frame(ID = c("a","a","a","a","a","a","a","b","b","b","b"),
treatment = c(0,1,0,0,0,1,0,1,0,0,0),
event = c(0,0,1,1,1,1,1,0,1,1,1),
service_date = as.Date(c("2011-01-01",
"2011-08-21",
"2011-12-23",
"2012-02-23",
"2013-09-14",
"2013-04-07",
"2014-10-14",
"2013-01-01",
"2013-12-12",
"2014-06-17",
"2015-09-29")),
stringsAsFactors=FALSE)
It's got two people in it (IDs a and b) and some information about whether they received a treatment, whether they had an event, and a service_date for when either of those things happens.
The problem & what I'm looking for
My goal is to figure out how many events (event == 1) people have, on average, in the n-th year after their first treatment (treatment == 1). Here's the result I'd want, and how I would do it by hand for the first year after treatment:
1. For each ID, find the first service_date where treatment equals 1. For ID = a, that's 2011-08-21.
2. From that "date of first treatment", count forwards 365 days. For ID = a, that'd be 2012-08-21. This gives you an interval for "first year after first treatment".
3. Within that interval, count how many times event == 1. For a's first year (so between 2011-08-21 and 2012-08-21), that's 2 times: once on 2011-12-23 and again on 2012-02-23.
4. Repeat steps 1, 2, and 3 for the other IDs (in this example it's only b) and get their counts. For ID = b, this would be only one event: between 2013-01-01 and one year later on 2014-01-01, they have one event, on 2013-12-12.
5. Sum the counts and divide by the number of IDs to get an average. Here, that'd be (2 events + 1 event) / 2 people = 1.5 events, on average, in year 1 after first treatment.
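The five steps above translate fairly directly into base R. This is a minimal sketch, assuming every ID has at least one treatment == 1 row and treating "year 1" as the half-open window [first treatment, first treatment + 365 days); count_year1 is a made-up helper name. It takes min() of the treatment dates rather than the first matching row, because d isn't sorted by date (a's 2013-09-14 row comes before its 2013-04-07 row).

```r
# Example data from the question
d <- data.frame(ID = c("a","a","a","a","a","a","a","b","b","b","b"),
                treatment = c(0,1,0,0,0,1,0,1,0,0,0),
                event = c(0,0,1,1,1,1,1,0,1,1,1),
                service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23",
                                         "2012-02-23", "2013-09-14", "2013-04-07",
                                         "2014-10-14", "2013-01-01", "2013-12-12",
                                         "2014-06-17", "2015-09-29")),
                stringsAsFactors = FALSE)

# Steps 1-3 for one person's rows (g); hypothetical helper name
count_year1 <- function(g) {
  start <- min(g$service_date[g$treatment == 1])  # step 1: earliest treatment date
  end   <- start + 365                            # step 2: 365 days later
  # step 3: events on or after start and strictly before end (assumed boundaries)
  sum(g$event == 1 & g$service_date >= start & g$service_date < end)
}

# Step 4: apply to every ID; step 5: average across IDs
counts <- vapply(split(d, d$ID), count_year1, integer(1))
counts        # a = 2, b = 1
mean(counts)  # 1.5
```

One detail: because 2012 is a leap year, start + 365 for a is 2012-08-20 rather than 2012-08-21. That doesn't change any counts here, but seq(start, by = "year", length.out = 2)[2] would give calendar-year boundaries if those are preferred.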
Ideally I'd like to be able to modify the code to define a different interval after first treatment. For example, year 2 could be "the time between first treatment + 365 days and first treatment + 730 days".
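To make the window configurable, one option is to pass it in as day offsets from each person's first treatment. A sketch, assuming 365-day years and half-open windows [from_day, to_day) so a boundary day is never counted in two consecutive years; mean_events_in_window is a hypothetical name, not an existing function.

```r
# Example data from the question
d <- data.frame(ID = c("a","a","a","a","a","a","a","b","b","b","b"),
                treatment = c(0,1,0,0,0,1,0,1,0,0,0),
                event = c(0,0,1,1,1,1,1,0,1,1,1),
                service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23",
                                         "2012-02-23", "2013-09-14", "2013-04-07",
                                         "2014-10-14", "2013-01-01", "2013-12-12",
                                         "2014-06-17", "2015-09-29")),
                stringsAsFactors = FALSE)

# Hypothetical helper: average number of events per ID in the window
# [first treatment + from_day, first treatment + to_day)
mean_events_in_window <- function(df, from_day, to_day) {
  counts <- vapply(split(df, df$ID), function(g) {
    start <- min(g$service_date[g$treatment == 1])  # earliest treatment date
    days  <- as.numeric(g$service_date - start)     # days since first treatment
    sum(g$event == 1 & days >= from_day & days < to_day)
  }, integer(1))
  mean(counts)  # every ID counts in the denominator, even with 0 events
}

mean_events_in_window(d, 0, 365)    # year 1: 1.5
mean_events_in_window(d, 365, 730)  # year 2: 1

# Year n as [365 * (n - 1), 365 * n)
yearly <- sapply(1:4, function(n) mean_events_in_window(d, 365 * (n - 1), 365 * n))
yearly  # 1.5 1.0 1.0 0.5
```

Everyone stays in the denominator even when their count is 0 (b in year 4), which matches step 5. Whether someone whose records stop before the window (b's last row is 2015-09-29, about 1001 days after first treatment) should be excluded instead is a separate design decision.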
What I've tried
I'm messing with some R code to try and do this. Conceptually, my approach consists of the following:
1. First, mutate a new column year_interval, using the difftime function, to define the interval in which R should count events for each ID.
2. Next, mutate another column interval_event_count that does the actual counting.
3. Finish the operation using mean.
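Here is a base R sketch of that column-wise plan, kept package-free so it runs as-is. It assumes year_interval is meant as a year index (days since first treatment divided by 365, floored, plus 1) and uses ave() where the plan uses grouped mutate(); first_tx, tx_days and per_id are made-up helper names.

```r
# Example data from the question
d <- data.frame(ID = c("a","a","a","a","a","a","a","b","b","b","b"),
                treatment = c(0,1,0,0,0,1,0,1,0,0,0),
                event = c(0,0,1,1,1,1,1,0,1,1,1),
                service_date = as.Date(c("2011-01-01", "2011-08-21", "2011-12-23",
                                         "2012-02-23", "2013-09-14", "2013-04-07",
                                         "2014-10-14", "2013-01-01", "2013-12-12",
                                         "2014-06-17", "2015-09-29")),
                stringsAsFactors = FALSE)

n <- 1  # which post-treatment year to summarise

# first_tx: earliest treatment date per ID, repeated on every row of that ID
tx_days <- as.numeric(d$service_date)  # days since 1970-01-01
tx_days[d$treatment != 1] <- NA        # keep only treatment rows
d$first_tx <- as.Date(ave(tx_days, d$ID, FUN = function(x) min(x, na.rm = TRUE)),
                      origin = "1970-01-01")

# year_interval: post-treatment year of each row
# (1 = days 0-364, 2 = days 365-729, ...; rows before first treatment get 0 or less)
d$year_interval <- floor(as.numeric(difftime(d$service_date, d$first_tx,
                                             units = "days")) / 365) + 1

# interval_event_count: events in year n for that ID, repeated on each of its rows
d$interval_event_count <- ave(as.integer(d$event == 1 & d$year_interval == n),
                              d$ID, FUN = sum)

# One value per ID, then average across IDs
per_id <- d$interval_event_count[!duplicated(d$ID)]
mean(per_id)  # 1.5
```

With dplyr, the same idea would be group_by(ID) plus mutate() for the two columns, then reducing to one row per ID before taking mean(); changing n selects a different post-treatment year.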
This is probably not the only valid approach, of course (it may not even be valid at all