Find 2 out of 3 conditions per ID-CodePudding

I have the following dataframe:

df <-read.table(header=TRUE, text="id code
                1 A
                1 B
                1 C
                2 A
                2 A
                2 A
                3 A
                3 B
                3 A")

Per id, I would love to find those individuals that have at least 2 conditions, namely:

conditionA = "A"
conditionB = "B"
conditionC = "C"

and create a new colum with "index", 1 if there are two or more conditions met and 0 otherwise:

df_output <-read.table(header=TRUE, text="id code index
                1 A 1
                1 B 1
                1 C 1
                2 A 0
                2 A 0
                2 A 0
                3 A 1
                3 B 1
                3 A 1")

So far I have tried the following:

df_output = df %>% 
     group_by(id) %>%
     mutate(index = ifelse(grepl(conditionA|conditionB|conditionC, code), 1, 0))

and as you can see I am struggling to get the threshold count into the code. Any help would be more than welcome.

Thanks!

CodePudding user response：

You can create a vector of conditions, and then use %in% and sum to count the number of occurrences in each group. Use (or ifelse) to convert logical into 1 and 0:

conditions = c("A", "B", "C")

df %>% 
  group_by(id) %>% 
  mutate(index =  (sum(unique(code) %in% conditions) >= 2))

     id code  index
1     1 A         1
2     1 B         1
3     1 C         1
4     2 A         0
5     2 A         0
6     2 A         0
7     3 A         1
8     3 B         1
9     3 A         1

CodePudding user response：

You could use n_distinct(), which is a faster and more concise equivalent of length(unique(x)).

df %>%
  group_by(id) %>%
  mutate(index =  (n_distinct(code) >= 2)) %>%
  ungroup()

# # A tibble: 9 × 3
#      id code  index
#   <int> <chr> <int>
# 1     1 A         1
# 2     1 B         1
# 3     1 C         1
# 4     2 A         0
# 5     2 A         0
# 6     2 A         0
# 7     3 A         1
# 8     3 B         1
# 9     3 A         1

CodePudding user response：

You can check conditions using intersect() function and check whether resulting list is of minimal (eg- 2) length.

conditions = c('A', 'B', 'C')

df_output2 = 
  df %>% 
  group_by(id) %>%
  mutate(index = as.integer(length(intersect(code, conditions)) >= 2))