In R, can I make a treatment identifier in all waves for unique ID's which meet a criteria iden

I'm slowly improving my R skills. I'm analyzing survey data with up to 10 waves. Certain benefits were cut off and only exist in waves 1-3 (out of 10). I want those who received the benefit in all of the first 3 waves to be my treatment group, with an identifier for the individual in the waves beyond 1-3 as well.

I can make a dummy variable that signifies if an individual has received the benefit in waves 1-3, but I need to find a way to create a treatment indicator.

In effect, I want each unique ID that meets this condition to have a dummy identifier in all waves, even the waves after the benefit was cut.

Simplified example:

wave <- c("wave1","wave2","wave3","wave4","wave1","wave2","wave3","wave4")
personal_ID <- c(101,101,101,101,201,201,201,201)
benefit_dummy_by_wave <- c(1,1,1,0,1,1,0,0)

df <- data.frame(personal_ID, benefit_dummy_by_wave, wave)


  personal_ID benefit_dummy_by_wave  wave
1         101                     1 wave1
2         101                     1 wave2
3         101                     1 wave3
4         101                     0 wave4
5         201                     1 wave1
6         201                     1 wave2
7         201                     0 wave3
8         201                     0 wave4

I want something like this:

  personal_ID benefit_dummy_by_wave  wave benefit_1_through_3
1         101                     1 wave1                   1
2         101                     1 wave2                   1
3         101                     1 wave3                   1
4         101                     0 wave4                   1
5         201                     1 wave1                   0
6         201                     1 wave2                   0
7         201                     0 wave3                   0
8         201                     0 wave4                   0

Really appreciate the help if anyone has any ideas. Really glad that this community exists!

CodePudding user response：

Not sure if I understood your question right. You could use dplyr:

library(dplyr)

df %>% 
  group_by(personal_ID) %>% 
  arrange(personal_ID, wave) %>% 
  mutate(benefit_1_through_3 = sum(wave == "wave3" & cumsum(benefit_dummy_by_wave) == 3)) %>% 
  ungroup()

This returns

# A tibble: 8 x 4
  personal_ID benefit_dummy_by_wave wave  benefit_1_through_3
        <dbl>                 <dbl> <chr>               <int>
1         101                     1 wave1                   1
2         101                     1 wave2                   1
3         101                     1 wave3                   1
4         101                     0 wave4                   1
5         201                     1 wave1                   0
6         201                     1 wave2                   0
7         201                     0 wave3                   0
8         201                     0 wave4                   0
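One thing to watch with the real data: the approach above relies on the rows being sorted by wave, and a character column sorts "wave10" before "wave2". A minimal sketch of a variant that does not depend on row order is below. It assumes one row per person per wave and the column names from the question; the helper name first_three is only illustrative.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  personal_ID = c(101, 101, 101, 101, 201, 201, 201, 201),
  benefit_dummy_by_wave = c(1, 1, 1, 0, 1, 1, 0, 0),
  wave = c("wave1", "wave2", "wave3", "wave4",
           "wave1", "wave2", "wave3", "wave4")
)

# Waves in which the benefit existed (hypothetical helper name)
first_three <- c("wave1", "wave2", "wave3")

result <- df %>%
  group_by(personal_ID) %>%
  mutate(
    # 1 only if the person received the benefit in every one of waves 1-3;
    # na.rm = TRUE treats a missing dummy as "not received" (an assumption)
    benefit_1_through_3 = as.integer(
      sum(benefit_dummy_by_wave[wave %in% first_three] == 1, na.rm = TRUE) ==
        length(first_three)
    )
  ) %>%
  ungroup()

result
```

Because the count is restricted to the three named waves, later waves cannot affect the flag regardless of how the rows are ordered.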

CodePudding user response：

library(tidyverse)

wave <- c(1,2,3,4,1,2,3,4)
personal_ID <- c(101,101,101,101,201,201,201,201)
benefit_dummy_by_wave <- c(1,1,1,0,1,1,0,0)
data <- tibble(wave, personal_ID, benefit_dummy_by_wave)
data
#> # A tibble: 8 x 3
#>    wave personal_ID benefit_dummy_by_wave
#>   <dbl>       <dbl>                 <dbl>
#> 1     1         101                     1
#> 2     2         101                     1
#> 3     3         101                     1
#> 4     4         101                     0
#> 5     1         201                     1
#> 6     2         201                     1
#> 7     3         201                     0
#> 8     4         201                     0

benefit_1_through_3_data <-
  data %>%
  group_by(personal_ID) %>%
  pivot_wider(names_from = wave, values_from = benefit_dummy_by_wave, names_prefix = "wave") %>%
  mutate(benefit_1_through_3 = as.integer(
    wave1 == 1 & wave2 == 1 & wave3 == 1
  )) %>%
  select(personal_ID, benefit_1_through_3)
benefit_1_through_3_data
#> # A tibble: 2 x 2
#> # Groups:   personal_ID [2]
#>   personal_ID benefit_1_through_3
#>         <dbl>               <int>
#> 1         101                   1
#> 2         201                   0

data %>% left_join(benefit_1_through_3_data)
#> Joining, by = "personal_ID"
#> # A tibble: 8 x 4
#>    wave personal_ID benefit_dummy_by_wave benefit_1_through_3
#>   <dbl>       <dbl>                 <dbl>               <int>
#> 1     1         101                     1                   1
#> 2     2         101                     1                   1
#> 3     3         101                     1                   1
#> 4     4         101                     0                   1
#> 5     1         201                     1                   0
#> 6     2         201                     1                   0
#> 7     3         201                     0                   0
#> 8     4         201                     0                   0

Created on 2021-10-18 by the reprex package (v2.0.1)
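If reshaping to wide format feels heavy with 10 waves, the per-person flag could also be built with filter() and summarise() before joining it back. This is only a sketch under the same assumptions as above (numeric wave, one row per person per wave); the object names treated and joined are made up for illustration.

```r
library(dplyr)

data <- tibble(
  wave = c(1, 2, 3, 4, 1, 2, 3, 4),
  personal_ID = c(101, 101, 101, 101, 201, 201, 201, 201),
  benefit_dummy_by_wave = c(1, 1, 1, 0, 1, 1, 0, 0)
)

# One row per person: 1 if all three early waves are present and equal to 1
treated <- data %>%
  filter(wave %in% 1:3) %>%
  group_by(personal_ID) %>%
  summarise(
    benefit_1_through_3 = as.integer(n() == 3 && all(benefit_dummy_by_wave == 1)),
    .groups = "drop"
  )

# Naming the key explicitly avoids the "Joining, by = ..." message
joined <- data %>% left_join(treated, by = "personal_ID")
joined
```

Note that anyone with no rows at all in waves 1-3 ends up with NA after the join, which may or may not be what you want for a control group.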

CodePudding user response：

Here's another way -

library(dplyr)

check_waves <- paste0('wave', 1:3)

df %>%
  group_by(personal_ID) %>%
  mutate(benefit_1_through_3 = as.integer(all(check_waves %in% wave & 
                            benefit_dummy_by_wave[match(check_waves, wave)] == 1))) %>%
  ungroup()

#  personal_ID benefit_dummy_by_wave wave  benefit_1_through_3
#        <dbl>                 <dbl> <chr>               <int>
#1         101                     1 wave1                   1
#2         101                     1 wave2                   1
#3         101                     1 wave3                   1
#4         101                     0 wave4                   1
#5         201                     1 wave1                   0
#6         201                     1 wave2                   0
#7         201                     0 wave3                   0
#8         201                     0 wave4                   0
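For completeness, the same idea can probably be done without dplyr. The sketch below uses base R's tapply() to count benefit receipts in waves 1-3 per person and then flags every row of the treated IDs. The intermediate names (in_window, counts, treated_ids) are illustrative, and it again assumes one row per person per wave.

```r
df <- data.frame(
  personal_ID = c(101, 101, 101, 101, 201, 201, 201, 201),
  benefit_dummy_by_wave = c(1, 1, 1, 0, 1, 1, 0, 0),
  wave = c("wave1", "wave2", "wave3", "wave4",
           "wave1", "wave2", "wave3", "wave4")
)

check_waves <- paste0("wave", 1:3)

# Rows that belong to the waves in which the benefit existed
in_window <- df$wave %in% check_waves

# Number of waves (within 1-3) in which each person received the benefit
counts <- tapply(
  df$benefit_dummy_by_wave[in_window] == 1,
  df$personal_ID[in_window],
  sum
)

# IDs that received the benefit in all three waves
treated_ids <- names(counts)[as.vector(counts == length(check_waves))]

# Flag every row (all waves) belonging to a treated ID
df$benefit_1_through_3 <- as.integer(as.character(df$personal_ID) %in% treated_ids)
df
```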