Create new variable based on numeric pattern in R

Time:11-02

I am trying to create a new variable (v2) that encodes the pattern of numeric responses in another variable (v1). The dataset is in long format and ordered by visit. I have tried grouping by the 'id' variable and using various combinations of 'summarise' in dplyr, but cannot figure this out. Below is an example of what I would like to achieve.

    id     visit    v1     v2
   <dbl>   <int>  <dbl>  <int>
 1 10001     1      0      1
 2 10001     2      0      1
 3 10002     1      0      2
 4 10002     2      1      2
 5 10003     1      1      3
 6 10003     2      0      3

A v2 value of 1 should reflect a response pattern of 0/0 across the two visits (as for id 10001), 2 reflects a response pattern of 0/1 (id 10002), 3 reflects 1/0 (id 10003), and so on.

Thank you in advance for the help!

CodePudding user response:

Another way is:

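library(dplyr)

# Collapse each id's v1 values into a pattern string ("00", "01", ...),
# then look that string up in a named vector of codes.
dat %>%
    group_by(id) %>%
    mutate(v2 = c("00" = 1, "01" = 2, "10" = 3, "11" = 4)[paste(v1, collapse = "")])
# # A tibble: 6 x 4
# # Groups:   id [3]
#      id visit    v1    v2
#   <int> <int> <int> <dbl>
# 1 10001     1     0     1
# 2 10001     2     0     1
# 3 10002     1     0     2
# 4 10002     2     1     2
# 5 10003     1     1     3
# 6 10003     2     0     3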

CodePudding user response:

Assumption:

  • within an id, we always have exactly 2 rows
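
Before relying on this assumption, it may be worth checking it. The following is only a minimal sketch in base R: it rebuilds the sample data from the "Data" section (without v2), counts the rows per id, and sorts by id and visit so that the pasted pattern reads "first visit, second visit". The names `rows_per_id` and `all_pairs` are illustrative and do not come from the original answer.

```r
# Sample data copied from the "Data" section of this answer (v2 omitted)
dat <- data.frame(
  id    = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
  visit = c(1L, 2L, 1L, 2L, 1L, 2L),
  v1    = c(0L, 0L, 0L, 1L, 1L, 0L)
)

# Number of rows per id; the assumption holds when every count is 2
rows_per_id <- table(dat$id)
all_pairs <- all(rows_per_id == 2L)

# Sort by id, then visit, so v1 is pasted in visit order within each id
dat <- dat[order(dat$id, dat$visit), ]
```

If `all_pairs` is FALSE, both solutions below return NA for the ids that do not have exactly two rows, so those ids are flagged rather than silently miscoded.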

base R

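# ave() applies FUN to v1 within each id and returns a vector aligned with the rows of dat
ave(dat$v1, dat$id, FUN = function(z) {
  # ids that do not have exactly two visits get NA
  if (length(z) != 2) return(NA_integer_)
  switch(paste(z, collapse = ""),
    "00" = 1L,
    "01" = 2L,
    "10" = 3L,
    "11" = 4L,
    NA_integer_)  # any other pattern also maps to NA
})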
# [1] 1 1 2 2 3 3

dplyr

library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(v2 = if (n() != 2) NA_integer_ else case_when(
    all(v1 == c(0L, 0L)) ~ 1L, 
    all(v1 == c(0L, 1L)) ~ 2L, 
    all(v1 == c(1L, 0L)) ~ 3L, 
    all(v1 == c(1L, 1L)) ~ 4L, 
    TRUE ~ NA_integer_)
  ) %>%
  ungroup()
# # A tibble: 6 x 4
#      id visit    v1    v2
#   <int> <int> <int> <int>
# 1 10001     1     0     1
# 2 10001     2     0     1
# 3 10002     1     0     2
# 4 10002     2     1     2
# 5 10003     1     1     3
# 6 10003     2     0     3

Data

dat <- structure(
  list(
    id = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
    visit = c(1L, 2L, 1L, 2L, 1L, 2L),
    v1 = c(0L, 0L, 0L, 1L, 1L, 0L),
    v2 = c(1L, 1L, 2L, 2L, 3L, 3L)
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6")
)