I'm working on an event-study project, and I want to calculate the average outcome in the year the event occurs.
Suppose we have 3 individuals. The event occurs to individual 1 in 2019, to individual 2 in 2020, and to individual 3 in 2017.
The outcome of individual 1 in 2019 is 1, and the outcome of individual 2 in 2020 is 0. For individual 3, the event (2017) predates the first survey year (2018), so we exclude individual 3 when calculating the average. The average outcome should therefore be 0.5 in this case.
How can I do this in R?
Thank you!
Here's the artificial data:
ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
year <- c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020)
outcome <- c(1, 1, 0, 0, 1, 0, 0, 0, 1)
event_year <- c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
df <- data.frame(ID, year, outcome, event_year)
df
> df
  ID year outcome event_year
1  1 2018       1       2019
2  1 2019       1       2019
3  1 2020       0       2019
4  1 2021       0       2019
5  2 2019       1       2020
6  2 2020       0       2020
7  2 2021       0       2020
8  3 2018       0       2017
9  3 2020       1       2017
CodePudding user response:
If I understand your question correctly, you can group by year like this:
library(dplyr)
df %>%
group_by(year) %>%
summarise(mean = mean(outcome))
Output:
# A tibble: 4 × 2
   year  mean
  <dbl> <dbl>
1  2018 0.5
2  2019 1
3  2020 0.333
4  2021 0
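If the goal is instead the average outcome in each individual's event year (the 0.5 described in the question), one possible sketch is to keep only the rows where year equals event_year before averaging. This assumes an individual should be dropped whenever no row exists for their event year; the names event_rows and event_avg are just illustrative.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Keep only the row observed in each individual's event year.
# Individual 3 has no row for 2017, so it drops out here (assumed to be the intent).
event_rows <- df %>%
  filter(year == event_year)

# Average outcome across the remaining individuals
event_avg <- event_rows %>%
  summarise(mean = mean(outcome))

event_avg
```

With the example data, only individual 1 (2019, outcome 1) and individual 2 (2020, outcome 0) remain, so the mean is 0.5.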
CodePudding user response:
In base R, we may do:
aggregate(outcome ~ year, df, mean)
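A variation that may be closer to an event-study setup is to average by time relative to the event rather than by calendar year. The sketch below is only one way to do it, and the column name rel_year and the object by_rel are made up for illustration. Relative year 0 is the event year itself, so individuals with no observation in their event year simply do not contribute to that row.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Years since the event (hypothetical helper column): 0 is the event year
df$rel_year <- df$year - df$event_year

# Mean outcome for each relative year
by_rel <- aggregate(outcome ~ rel_year, df, mean)
by_rel

# Average outcome in the event year only
mean_at_event <- by_rel$outcome[by_rel$rel_year == 0]
mean_at_event
```

For the example data, the rel_year == 0 row averages individual 1 in 2019 and individual 2 in 2020, which gives the 0.5 from the question.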