I have the following data:
df<-structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2), day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), x1 = c(15, 15, 15.2, 15.2,
15.3, 15.2, 15.3, 15, 15, 15.2, 15.3, 12, 12.1, 12.3, 12.2, 12,
12.4, 12.5, 12.4, 12.6, 12.7), x2 = c(1, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -21L))
And I want to generate a variable that indicates a change from 1 to 0 in x2, but only if the following 4 rows remain 0 (by ID). As in the first occurrence of a change in x2 from 1 to 0 for at least 4 days. To generate the variable in this data:
df2<-structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2), day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), x1 = c(15, 15, 15.2, 15.2,
15.3, 15.2, 15.3, 15, 15, 15.2, 15.3, 12, 12.1, 12.3, 12.2, 12,
12.4, 12.5, 12.4, 12.6, 12.7), x2 = c(1, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1), x3 = c(0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -21L))
Where x3 gets a value of 1 from the first occurrence of when x2 stops for at least 4 days, regardless of re-occurrence
I imagine there is a way to use lag or lead functions in dplyr, but I am not sure how to program the 'at least 4 days' condition. Any suggestions?
CodePudding user response:
We can use zoo::rollapply
for a rolling-window calculation.
fun <- function(z) (length(z) == 6 && z[1] == 1 && z[2] == 0 && all(z[-(1:2)] == 0))
df %>%
group_by(ID) %>%
mutate(x3a = cummax(zoo::rollapply(lead(x2), 6, fun, fill = 0))) %>%
ungroup()
# # A tibble: 21 x 6
# ID day x1 x2 x3 x3a
# <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 1 15 1 0 0
# 2 1 2 15 1 0 0
# 3 1 3 15.2 0 0 0
# 4 1 4 15.2 0 0 0
# 5 1 5 15.3 0 0 0
# 6 1 6 15.2 1 0 0
# 7 1 7 15.3 0 1 1
# 8 1 8 15 0 1 1
# 9 1 9 15 0 1 1
# 10 1 10 15.2 0 1 1
# # ... with 11 more rows
CodePudding user response:
A tidyverse solution could (also) look as follows:
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
mutate(grp = cumsum(x2)) %>%
group_by(ID, grp) %>%
mutate(fourOrMore = n() > 4,
x3 = lag(fourOrMore),
x3 = replace_na(x3, 0)) %>%
ungroup() %>%
select(- c("grp", "fourOrMore"))
# # A tibble: 21 × 5
# ID day x1 x2 x3
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 1 15 1 0
# 2 1 2 15 1 0
# 3 1 3 15.2 0 0
# 4 1 4 15.2 0 0
# 5 1 5 15.3 0 0
# 6 1 6 15.2 1 0
# 7 1 7 15.3 0 1
# 8 1 8 15 0 1
# 9 1 9 15 0 1
# 10 1 10 15.2 0 1
# # … with 11 more rows