I have panel survey data in which each row records an individual, their interview date, and their labor market status during that period. The panel is unbalanced: some individuals appear more often than others (e.g., because they stopped responding to the survey organizers). Data were collected on individuals before and after some of them were randomly given a cash assistance benefit.
I would like to know whether some individuals stopped responding to our survey specifically after they received the cash benefit (the treatment date is 2019-09-03). In other words, I am interested in testing the probability of leaving the survey relative to the "date" variable, but I am not sure how to do that.
Here is a data example. Some individuals, like Cartman, who received the treatment in September 2019, stopped responding to the survey in the following years, so their labor market status is recorded as "N/A". Other individuals in the control group, like Mackey, who did not receive the treatment, continued responding to the survey in the following years.
individual  date        labor_status  cash_benefit
Kenny       2018-09-02  unemployed    0
Kenny       2019-09-03  unemployed    1
Kenny       2020-09-07  employed      1
Kenny       2021-09-13  employed      1
Cartman     2018-09-03  unemployed    0
Cartman     2019-09-06  unemployed    1
Cartman     2020-09-08  N/A           1
Cartman     2021-09-08  N/A           1
Mackey      2018-09-03  employed      0
Mackey      2019-09-04  unemployed    0
Mackey      2020-09-08  employed      0
Mackey      2021-09-13  employed      0
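For reproducibility, the example could be loaded into R roughly as follows. This is a minimal sketch under a few assumptions that are not stated above: the data frame is called `dat`, the dates are plain ISO dates, and "N/A" should be read as a true missing value (`NA`).

```r
# Rebuild the example table; "N/A" is converted to NA on import (assumption).
dat <- read.table(
  text = "
individual date       labor_status cash_benefit
Kenny      2018-09-02 unemployed   0
Kenny      2019-09-03 unemployed   1
Kenny      2020-09-07 employed     1
Kenny      2021-09-13 employed     1
Cartman    2018-09-03 unemployed   0
Cartman    2019-09-06 unemployed   1
Cartman    2020-09-08 N/A          1
Cartman    2021-09-08 N/A          1
Mackey     2018-09-03 employed     0
Mackey     2019-09-04 unemployed   0
Mackey     2020-09-08 employed     0
Mackey     2021-09-13 employed     0
",
  header = TRUE,
  na.strings = "N/A",
  stringsAsFactors = FALSE
)

# Parse the interview dates so that date functions can be applied to them.
dat$date <- as.Date(dat$date)

str(dat)
```

Storing non-response as `NA` rather than as the string "N/A" matters because functions such as `is.na()` only detect the former.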
Answer:
If you want to test this statistically, you should ask on Cross Validated. But if you just want the probability of dropout after 2019, conditional on receiving the benefit:
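library(dplyr)
library(lubridate)

# Assumes `date` is a Date column and non-response in `labor_status` is stored as NA.
dat %>%
  group_by(individual) %>%
  summarize(
    # TRUE if the individual ever received the cash benefit
    benefit = any(cash_benefit == 1),
    # TRUE if every row after 2019 has a missing labor status
    dropout_after_2019 = all(
      year(date) < 2019 |
        (year(date) == 2019 & !is.na(labor_status)) |
        is.na(labor_status)
    )
  ) %>%
  # Share of dropouts within the benefit and no-benefit groups
  group_by(benefit) %>%
  summarize(p_dropout_after_2019 = mean(dropout_after_2019))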
# A tibble: 2 × 2
  benefit p_dropout_after_2019
  <lgl>                  <dbl>
1 FALSE                    0
2 TRUE                     0.5
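A complementary, purely descriptive view is the share of missing responses per survey year within each group. The sketch below is not a statistical test. It rebuilds the example data inline so that it runs on its own, and the names `treated`, `wave`, and `nonresponse_by_wave` are illustrative choices, not part of the original data.

```r
library(dplyr)
library(lubridate)

# Hypothetical reconstruction of the example data, with "N/A" stored as NA.
dat <- tibble(
  individual = rep(c("Kenny", "Cartman", "Mackey"), each = 4),
  date = as.Date(c(
    "2018-09-02", "2019-09-03", "2020-09-07", "2021-09-13",
    "2018-09-03", "2019-09-06", "2020-09-08", "2021-09-08",
    "2018-09-03", "2019-09-04", "2020-09-08", "2021-09-13"
  )),
  labor_status = c(
    "unemployed", "unemployed", "employed", "employed",
    "unemployed", "unemployed", NA, NA,
    "employed", "unemployed", "employed", "employed"
  ),
  cash_benefit = c(0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0)
)

nonresponse_by_wave <- dat %>%
  group_by(individual) %>%
  # Flag individuals who ever received the benefit
  mutate(treated = any(cash_benefit == 1)) %>%
  ungroup() %>%
  # One row per group and survey year, with the share of missing responses
  group_by(treated, wave = year(date)) %>%
  summarize(p_missing = mean(is.na(labor_status)), .groups = "drop")

nonresponse_by_wave
```

If the share of missing responses rises in the treated group only in the years after 2019, that is consistent with attrition following treatment, although a formal test would still be needed to rule out chance.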