How do I command ifelse() to create a dummy variable that interacts with 3 other variables?-CodePudding

I am trying to solve a problem. The question I want to ask ifelse() can be translated as "Did individuals from our survey state witness an Event within the last two years?"

I've made an example table for the sake of clarity.

#Creating example table
tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
                  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
                  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))

I want to use mutate() to create a new variable called "two_years" and that fulfills the conditions of an ifelse() in which:

If individuals from [state_code] witnessed any [event] within [year], [year-1], [year-2], the dummy equals 1. Hence, if [event]'s sum > 0, the condition is fulfilled.
The observation unit is the individual. In my own database, there are multiple individuals within the years and states. But, in this case, each unique individual does not matter for the ifelse() condition. It should be applied to all individuals within the [state_code] that were surveyed in the [year].

CodePudding user response：

I think this is what you're looking for. It should generalise to the situation you describe in which you have more than one individual per year. Apologies if I've misunderstood the question.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
                  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
                  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))

tab %>%
  group_by(state_code, year) %>%
  summarise(event = sum(event),
            .groups = "drop_last") %>%
  mutate(dummy = ifelse(event == 1 |
                          lag(event, 1L, order_by = year, default = 0) == 1 |
                          lag(event, 2L, order_by = year, default = 0) == 1,
                        1,
                        0)
  ) %>%
  ungroup() %>%
  select(-event) %>%
  right_join(tab, 
             by = c("state_code", "year"))
#> # A tibble: 10 × 5
#>    state_code  year dummy individual event
#>    <chr>      <dbl> <dbl> <chr>      <dbl>
#>  1 1200        1992     0 Person 1       0
#>  2 1200        1993     0 Person 2       0
#>  3 1200        1994     1 Person 3       1
#>  4 1200        1995     1 Person 4       0
#>  5 1200        1996     1 Person 5       0
#>  6 1201        1992     0 Person 6       0
#>  7 1201        1993     0 Person 7       0
#>  8 1201        1994     0 Person 8       0
#>  9 1201        1995     0 Person 9       0
#> 10 1201        1995     0 Person 10      0

^{Created on 2022-03-15 by the reprex package (v2.0.1)}

CodePudding user response：

Here is one approach. The first two years are not really defined, but if you want those to be zero, just change shift(event,0:2) to shift(event,0:2, fill=0)

inner_join(
  tab,
  tab %>%
    group_by(state_code,year) %>%
    summarize(event=sum(event,na.rm=T)) %>% 
    mutate(two_years = as.integer(Reduce(` `,data.table::shift(event,0:2))>0)) %>% 
    select(!event),
  by=c("state_code","year")
)

   state_code individual event year two_years
1        1200   Person 1     0 1992        NA
2        1200   Person 2     0 1993        NA
3        1200   Person 3     1 1994         1
4        1200   Person 4     0 1995         1
5        1200   Person 5     0 1996         1
6        1201   Person 6     0 1992        NA
7        1201   Person 7     0 1993        NA
8        1201   Person 8     0 1994         0
9        1201   Person 9     0 1995         0
10       1201  Person 10     0 1995         0