How do you calculate the mean of the outcome, grouped by individual, according to the time of event?

I'm doing an event-study project, and I want to calculate the average outcome in the year the event occurs.

Suppose we have 3 individuals. The event occurs to individual 1 in 2019, to individual 2 in 2020, and to individual 3 in 2017.

The outcome of individual 1 in their event year (2019) is 1, and the outcome of individual 2 in their event year (2020) is 0. For individual 3, the event predates the first survey year, so we exclude individual 3 when calculating the average. The average probability should therefore be 0.5 in this case.

How can I do this in R?

Thank you!

Here's the artificial data:

ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
year <- c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020)
outcome <- c(1, 1, 0, 0, 1, 0, 0, 0, 1)
event_year <- c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
df <- data.frame(ID, year, outcome, event_year)
df

> df
  ID year outcome event_year
1  1 2018       1       2019
2  1 2019       1       2019
3  1 2020       0       2019
4  1 2021       0       2019
5  2 2019       1       2020
6  2 2020       0       2020
7  2 2021       0       2020
8  3 2018       0       2017
9  3 2020       1       2017

CodePudding user response:

If I understand your question correctly, you can group by year like this:

library(dplyr)
df %>%
  group_by(year) %>%
  summarise(mean = mean(outcome))

Output:

# A tibble: 4 × 2
   year  mean
  <dbl> <dbl>
1  2018 0.5  
2  2019 1    
3  2020 0.333
4  2021 0
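If the goal is instead the mean outcome in each individual's own event year, as the question's worked example suggests, one possible sketch is to keep only the rows where year equals event_year before averaging. This assumes the sample data from the question; the name event_mean is introduced here purely for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Keep only the observation taken in each individual's event year.
# Individual 3 has no survey row for 2017, so it drops out here.
event_mean <- df %>%
  filter(year == event_year) %>%
  summarise(mean = mean(outcome))

event_mean
```

On the sample data this leaves one row each for individuals 1 and 2, so the mean is 0.5, matching the value expected in the question.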

CodePudding user response:

In base R, we can do:

aggregate(outcome ~ year, df, mean)
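
Since this is an event study, it may be more useful to aggregate by time relative to the event rather than by calendar year. The following is a minimal base R sketch, assuming the sample data from the question; rel_year and rel_means are hypothetical names introduced here, not part of the original data.

```r
# Sample data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# rel_year (hypothetical helper column): years since the event, 0 = event year
df$rel_year <- df$year - df$event_year

# Mean outcome for each relative year
rel_means <- aggregate(outcome ~ rel_year, df, mean)
rel_means

# Mean outcome in the event year itself
rel_means$outcome[rel_means$rel_year == 0]
```

The row with rel_year equal to 0 corresponds to the average asked for in the question. Individual 3 has no observation at rel_year 0, so it does not contribute to that row.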