I am trying to solve a problem. The question I want to ask ifelse() can be translated as "Did individuals from our survey state witness an Event within the last two years?"
I've made an example table for the sake of clarity.
#Creating example table
tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))
I want to use mutate() to create a new variable called "two_years" and that fulfills the conditions of an ifelse() in which:
If individuals from [state_code] witnessed any [event] within [year], [year-1], [year-2], the dummy equals 1. Hence, if [event]'s sum > 0, the condition is fulfilled.
The observation unit is the individual. In my own database, there are multiple individuals within the years and states. But, in this case, each unique individual does not matter for the ifelse() condition. It should be applied to all individuals within the [state_code] that were surveyed in the [year].
CodePudding user response:
I think this is what you're looking for. It should generalise to the situation you describe in which you have more than one individual per year. Apologies if I've misunderstood the question.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))
tab %>%
group_by(state_code, year) %>%
summarise(event = sum(event),
.groups = "drop_last") %>%
mutate(dummy = ifelse(event == 1 |
lag(event, 1L, order_by = year, default = 0) == 1 |
lag(event, 2L, order_by = year, default = 0) == 1,
1,
0)
) %>%
ungroup() %>%
select(-event) %>%
right_join(tab,
by = c("state_code", "year"))
#> # A tibble: 10 × 5
#> state_code year dummy individual event
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 1200 1992 0 Person 1 0
#> 2 1200 1993 0 Person 2 0
#> 3 1200 1994 1 Person 3 1
#> 4 1200 1995 1 Person 4 0
#> 5 1200 1996 1 Person 5 0
#> 6 1201 1992 0 Person 6 0
#> 7 1201 1993 0 Person 7 0
#> 8 1201 1994 0 Person 8 0
#> 9 1201 1995 0 Person 9 0
#> 10 1201 1995 0 Person 10 0
Created on 2022-03-15 by the reprex package (v2.0.1)
CodePudding user response:
Here is one approach. The first two years are not really defined, but if you want those to be zero, just change shift(event,0:2)
to shift(event,0:2, fill=0)
inner_join(
tab,
tab %>%
group_by(state_code,year) %>%
summarize(event=sum(event,na.rm=T)) %>%
mutate(two_years = as.integer(Reduce(` `,data.table::shift(event,0:2))>0)) %>%
select(!event),
by=c("state_code","year")
)
state_code individual event year two_years
1 1200 Person 1 0 1992 NA
2 1200 Person 2 0 1993 NA
3 1200 Person 3 1 1994 1
4 1200 Person 4 0 1995 1
5 1200 Person 5 0 1996 1
6 1201 Person 6 0 1992 NA
7 1201 Person 7 0 1993 NA
8 1201 Person 8 0 1994 0
9 1201 Person 9 0 1995 0
10 1201 Person 10 0 1995 0