Slowly improving on my R skills. I'm analyzing survey data with up to 10 waves. Certain benefits were cut off and only exist in waves 1-3 (out of 10). I want those who received the benefit in all of the first 3 waves to be my treatment group, with an identifier for the individual in the waves beyond 1-3 as well.
I can make a dummy variable that signifies if an individual has received the benefit in waves 1-3, but I need to find a way to create a treatment variable.
In effect, I want every unique ID that meets this condition to carry a dummy identifier in all waves, even the waves after the benefit was cut.
Simplified example:
wave <- c("wave1","wave2","wave3","wave4","wave1","wave2","wave3","wave4")
personal_ID <- c(101,101,101,101,201,201,201,201)
benefit_dummy_by_wave <- c(1,1,1,0,1,1,0,0)
df <- data.frame(personal_ID, benefit_dummy_by_wave, wave)
  personal_ID benefit_dummy_by_wave  wave
1         101                     1 wave1
2         101                     1 wave2
3         101                     1 wave3
4         101                     0 wave4
5         201                     1 wave1
6         201                     1 wave2
7         201                     0 wave3
8         201                     0 wave4
I want something like this:
  personal_ID benefit_dummy_by_wave  wave benefit_1_through_3
1         101                     1 wave1                   1
2         101                     1 wave2                   1
3         101                     1 wave3                   1
4         101                     0 wave4                   1
5         201                     1 wave1                   0
6         201                     1 wave2                   0
7         201                     0 wave3                   0
8         201                     0 wave4                   0
Really appreciate the help if anyone has any ideas. Really glad that this community exists!
CodePudding user response:
Not sure if I understood your question right. You could use dplyr:
library(dplyr)
df %>%
  group_by(personal_ID) %>%
  arrange(personal_ID, wave) %>%
  # the running total must reach 3 at wave3, i.e. benefit received in waves 1, 2 and 3
  mutate(benefit_1_through_3 = sum(wave == "wave3" & cumsum(benefit_dummy_by_wave) == 3)) %>%
  ungroup()
This returns
# A tibble: 8 x 4
  personal_ID benefit_dummy_by_wave wave  benefit_1_through_3
        <dbl>                 <dbl> <chr>               <int>
1         101                     1 wave1                   1
2         101                     1 wave2                   1
3         101                     1 wave3                   1
4         101                     0 wave4                   1
5         201                     1 wave1                   0
6         201                     1 wave2                   0
7         201                     0 wave3                   0
8         201                     0 wave4                   0
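One caveat when scaling this up to all 10 waves: wave labels sorted as text put "wave10" before "wave2", so anything that depends on row order can become fragile. Below is a minimal sketch that does not rely on ordering. It assumes the labels always look like "waveN" and that each person has at most one row per wave; `wave_num` is a hypothetical helper column, not something from the question.

```r
library(dplyr)

# Example data from the question
wave <- c("wave1", "wave2", "wave3", "wave4", "wave1", "wave2", "wave3", "wave4")
personal_ID <- c(101, 101, 101, 101, 201, 201, 201, 201)
benefit_dummy_by_wave <- c(1, 1, 1, 0, 1, 1, 0, 0)
df <- data.frame(personal_ID, benefit_dummy_by_wave, wave)

result <- df %>%
  # wave_num is a hypothetical helper: "wave10" becomes 10 (assumes "waveN" labels)
  mutate(wave_num = as.integer(sub("^wave", "", wave))) %>%
  group_by(personal_ID) %>%
  # 1 only if the person has three benefit rows within waves 1-3;
  # missing benefit values are treated as "not received" here (an assumption)
  mutate(benefit_1_through_3 = as.integer(
    sum(benefit_dummy_by_wave[wave_num <= 3] == 1, na.rm = TRUE) == 3
  )) %>%
  ungroup() %>%
  select(-wave_num)
```

Because the condition counts benefit rows inside waves 1-3 rather than reading a cumulative sum at wave 3, a person who is missing one of the first three waves ends up with 0.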
CodePudding user response:
library(tidyverse)
wave <- c(1,2,3,4,1,2,3,4)
personal_ID <- c(101,101,101,101,201,201,201,201)
benefit_dummy_by_wave <- c(1,1,1,0,1,1,0,0)
data <- tibble(wave, personal_ID, benefit_dummy_by_wave)
data
#> # A tibble: 8 x 3
#>    wave personal_ID benefit_dummy_by_wave
#>   <dbl>       <dbl>                 <dbl>
#> 1     1         101                     1
#> 2     2         101                     1
#> 3     3         101                     1
#> 4     4         101                     0
#> 5     1         201                     1
#> 6     2         201                     1
#> 7     3         201                     0
#> 8     4         201                     0
benefit_1_through_3_data <-
  data %>%
  group_by(personal_ID) %>%
  pivot_wider(names_from = wave, values_from = benefit_dummy_by_wave, names_prefix = "wave") %>%
  mutate(benefit_1_through_3 = as.integer(
    wave1 == 1 & wave2 == 1 & wave3 == 1
  )) %>%
  select(personal_ID, benefit_1_through_3)
benefit_1_through_3_data
#> # A tibble: 2 x 2
#> # Groups:   personal_ID [2]
#>   personal_ID benefit_1_through_3
#>         <dbl>               <int>
#> 1         101                   1
#> 2         201                   0
data %>% left_join(benefit_1_through_3_data)
#> Joining, by = "personal_ID"
#> # A tibble: 8 x 4
#>    wave personal_ID benefit_dummy_by_wave benefit_1_through_3
#>   <dbl>       <dbl>                 <dbl>               <int>
#> 1     1         101                     1                   1
#> 2     2         101                     1                   1
#> 3     3         101                     1                   1
#> 4     4         101                     0                   1
#> 5     1         201                     1                   0
#> 6     2         201                     1                   0
#> 7     3         201                     0                   0
#> 8     4         201                     0                   0
Created on 2021-10-18 by the reprex package (v2.0.1)
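The same "build a per-person lookup, then join it back" idea can probably be written without reshaping. This is only a sketch under a few assumptions: waves are numeric as in the data above, each person has at most one row per wave, and `treated` is a hypothetical name for the lookup table.

```r
library(dplyr)

# Same example data, with numeric waves
data <- tibble(
  wave = c(1, 2, 3, 4, 1, 2, 3, 4),
  personal_ID = c(101, 101, 101, 101, 201, 201, 201, 201),
  benefit_dummy_by_wave = c(1, 1, 1, 0, 1, 1, 0, 0)
)

# One row per person: 1 if all three early waves are present and all show the benefit
treated <- data %>%
  filter(wave %in% 1:3) %>%
  group_by(personal_ID) %>%
  summarise(
    benefit_1_through_3 = as.integer(n() == 3 && all(benefit_dummy_by_wave == 1))
  )

# Attach the flag to every wave of each person; naming the key avoids the join message
result <- data %>% left_join(treated, by = "personal_ID")
```

Note that a person who only appears in waves after 3 has no row in `treated`, so they would get NA after the join. Whether that should count as 0 depends on the study design.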
CodePudding user response:
Here's another way -
library(dplyr)
check_waves <- paste0('wave', 1:3)
df %>%
  group_by(personal_ID) %>%
  mutate(benefit_1_through_3 = as.integer(all(
    check_waves %in% wave &
      benefit_dummy_by_wave[match(check_waves, wave)] == 1
  ))) %>%
  ungroup()
#  personal_ID benefit_dummy_by_wave wave  benefit_1_through_3
#        <dbl>                 <dbl> <chr>               <int>
#1         101                     1 wave1                   1
#2         101                     1 wave2                   1
#3         101                     1 wave3                   1
#4         101                     0 wave4                   1
#5         201                     1 wave1                   0
#6         201                     1 wave2                   0
#7         201                     0 wave3                   0
#8         201                     0 wave4                   0
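If you would rather not load any packages, roughly the same check can be done in base R with `ave()`. This is a sketch that assumes each person has at most one row per wave; `got_early` is a hypothetical intermediate vector.

```r
# Example data from the question
wave <- c("wave1", "wave2", "wave3", "wave4", "wave1", "wave2", "wave3", "wave4")
personal_ID <- c(101, 101, 101, 101, 201, 201, 201, 201)
benefit_dummy_by_wave <- c(1, 1, 1, 0, 1, 1, 0, 0)
df <- data.frame(personal_ID, benefit_dummy_by_wave, wave)

check_waves <- paste0("wave", 1:3)

# 1 for rows that are in waves 1-3 and show the benefit, 0 otherwise
got_early <- as.integer(df$wave %in% check_waves & df$benefit_dummy_by_wave == 1)

# ave() repeats each person's total on all of their rows;
# the person is treated when that total equals the number of waves checked
df$benefit_1_through_3 <- as.integer(
  ave(got_early, df$personal_ID, FUN = sum) == length(check_waves)
)
```

`ave()` returns a vector the same length as its input, which is why the flag lands on every wave of a person, including the waves after the benefit was cut.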