How do you calculate the mean of the outcome, grouped by individual, according to the time of event?

Time:06-20

I'm doing an event-study project, and I want to calculate the average outcome.

Suppose we have 3 individuals. The event occurs to individual 1 in 2019, to individual 2 in 2020, and to individual 3 in 2017.

The outcome of individual 1 in 2019 (their event year) is 1, and the outcome of individual 2 in 2020 (their event year) is 0. The event for individual 3 predates the first survey year, so individual 3 is excluded when calculating the average. The average probability in the event year should be 0.5 in this case.

How can I do this in R?

Thank you!

Here's the artificial data:

ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
year <- c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020)
outcome <- c(1, 1, 0, 0, 1, 0, 0, 0, 1)
event_year <- c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
df <- data.frame(ID, year, outcome, event_year)
df

> df
  ID year outcome event_year
1  1 2018       1       2019
2  1 2019       1       2019
3  1 2020       0       2019
4  1 2021       0       2019
5  2 2019       1       2020
6  2 2020       0       2020
7  2 2021       0       2020
8  3 2018       0       2017
9  3 2020       1       2017

CodePudding user response:

If I understand your question correctly, you can group by year with group_by() and take the mean of the outcome:

library(dplyr)
df %>%
  group_by(year) %>%
  summarise(mean = mean(outcome))

Output:

# A tibble: 4 × 2
   year  mean
  <dbl> <dbl>
1  2018 0.5  
2  2019 1    
3  2020 0.333
4  2021 0 
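
The per-year mean above does not use event_year. As a minimal sketch of another reading of the question, assuming the goal is the mean outcome in each individual's own event year, one could keep only the rows where year equals event_year before averaging. The names at_event, event_mean and mean_outcome are illustrative and not part of the original post.

```r
library(dplyr)

# Same artificial data as in the question.
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Keep only the row observed in each individual's event year.
# Individual 3 has no row for 2017, so it drops out without an extra filter.
at_event <- df %>%
  filter(year == event_year)

# Average the outcome over the individuals that remain (IDs 1 and 2).
event_mean <- at_event %>%
  summarise(mean_outcome = mean(outcome))

event_mean
```

With the data from the question, mean_outcome is 0.5, which matches the value the question expects.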

CodePudding user response:

In base R, the same per-year mean can be computed with aggregate():

aggregate(outcome ~ year, df, mean)
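
If the mean is wanted by time relative to the event rather than by calendar year, a base R sketch along the same lines might look like the following. It assumes that "the event predates the survey year" means event_year is earlier than the individual's first observed year. The names rel_time, first_year, kept and by_rel_time are illustrative.

```r
# Same artificial data as in the question.
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Years since the event: 0 is the event year, -1 the year before, and so on.
df$rel_time <- df$year - df$event_year

# First survey year observed for each individual, repeated on every row.
first_year <- ave(df$year, df$ID, FUN = min)

# Assumption: drop individuals whose event predates their first observation
# (individual 3 in this data).
kept <- df[df$event_year >= first_year, ]

# Mean outcome at each relative time, over the remaining individuals.
by_rel_time <- aggregate(outcome ~ rel_time, data = kept, FUN = mean)
by_rel_time
```

The row with rel_time equal to 0 is the event-year average, 0.5 for this data.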