I am trying to create a new variable (v2) based on a pattern of numerical responses to another variable (v1). The dataset I am working with is in long format and ordered by visit. I have tried grouping by the 'id' variable and using various combinations of 'summarise' in dplyr, but cannot seem to figure this out. Below is an example of what I would like to achieve.
     id visit    v1    v2
  <dbl> <int> <dbl> <int>
1 10001     1     0     1
2 10001     2     0     1
3 10002     1     0     2
4 10002     2     1     2
5 10003     1     1     3
6 10003     2     0     3
A v2 value of 1 should reflect a response pattern of 0/0 across the two visits (as for id 10001), 2 reflects a pattern of 0/1 (id 10002), 3 reflects 1/0 (id 10003), and so on.
Thank you in advance for the help!
CodePudding user response:
Another way is:
library(dplyr)

dat %>%
  group_by(id) %>%
  mutate(v2 = c("00" = 1, "01" = 2, "10" = 3, "11" = 4)[paste(v1, collapse = "")])
# A tibble: 6 x 4
# Groups:   id [3]
     id visit    v1    v2
  <int> <int> <int> <dbl>
1 10001     1     0     1
2 10001     2     0     1
3 10002     1     0     2
4 10002     2     1     2
5 10003     1     1     3
6 10003     2     0     3
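The lookup above depends on the rows being in visit order within each id, since the pasted pattern is built in row order. A minimal sketch of the same idea with an explicit sort, assuming dplyr is installed; the shuffled copy of the data and the name `pattern_codes` are made up for illustration:

```r
library(dplyr)

# Same toy data as in the question, but with the rows deliberately shuffled.
dat <- data.frame(
  id    = c(10002L, 10001L, 10003L, 10001L, 10003L, 10002L),
  visit = c(2L, 1L, 1L, 2L, 2L, 1L),
  v1    = c(1L, 0L, 1L, 0L, 0L, 0L)
)

# Hypothetical lookup table: concatenated pattern -> code.
pattern_codes <- c("00" = 1L, "01" = 2L, "10" = 3L, "11" = 4L)

res <- dat %>%
  arrange(id, visit) %>%                 # make the visit order explicit
  group_by(id) %>%
  mutate(v2 = unname(pattern_codes[paste(v1, collapse = "")])) %>%
  ungroup()

res$v2
```

A pattern that is not in the lookup table (for example an id with three visits) indexes to NA rather than raising an error, so it is worth checking for NA in v2 afterwards.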
CodePudding user response:
Assumption:
- within an id, we always have exactly 2 rows
base R
ave(dat$v1, dat$id, FUN = function(z) {
  if (length(z) != 2) return(NA_integer_)
  switch(paste(z, collapse = ""),
         "00" = 1L,
         "01" = 2L,
         "10" = 3L,
         "11" = 4L,
         NA_integer_)
})
# [1] 1 1 2 2 3 3
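Since the four codes just enumerate the 0/1 patterns in order, the switch could arguably be replaced by arithmetic. This is only a sketch under the same two-rows-per-id assumption, with rows already ordered by visit; the helper name `pattern_code` is made up for illustration:

```r
dat <- data.frame(
  id    = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
  visit = c(1L, 2L, 1L, 2L, 1L, 2L),
  v1    = c(0L, 0L, 0L, 1L, 1L, 0L)
)

# Hypothetical helper: read the two responses as a 2-digit binary number
# and add 1, so 00 -> 1, 01 -> 2, 10 -> 3, 11 -> 4.
# Groups without exactly two 0/1 values get NA.
pattern_code <- function(z) {
  if (length(z) != 2 || !all(z %in% c(0L, 1L))) return(NA_integer_)
  as.integer(1L + 2L * z[1] + z[2])
}

v2 <- ave(dat$v1, dat$id, FUN = pattern_code)
v2
```

The explicit switch is easier to read and to change if the codes are not meant to follow this ordering; the arithmetic form only pays off if the mapping really is "binary value plus one".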
dplyr
library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(v2 = if (n() != 2) NA_integer_ else case_when(
    all(v1 == c(0L, 0L)) ~ 1L,
    all(v1 == c(0L, 1L)) ~ 2L,
    all(v1 == c(1L, 0L)) ~ 3L,
    all(v1 == c(1L, 1L)) ~ 4L,
    TRUE ~ NA_integer_)
  ) %>%
  ungroup()
# # A tibble: 6 x 4
#      id visit    v1    v2
#   <int> <int> <int> <int>
# 1 10001     1     0     1
# 2 10001     2     0     1
# 3 10002     1     0     2
# 4 10002     2     1     2
# 5 10003     1     1     3
# 6 10003     2     0     3
Data
dat <- structure(list(
  id = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
  visit = c(1L, 2L, 1L, 2L, 1L, 2L),
  v1 = c(0L, 0L, 0L, 1L, 1L, 0L),
  v2 = c(1L, 1L, 2L, 2L, 3L, 3L)),
  class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))