R return ID based on multiple conditions

Time:07-30

I have the following dataset:

ID = c('A','A','B','B','B','C','C','D','D','D')
B = c(1,1,1,1,2,1,2,1,2,1)
Condition1 = c(1,0,1,0,1,1,0,0,1,1)
Condition2 = c(0,1,0,1,0,0,1,1,0,0)
data2 <- data.frame(ID,B,Condition1,Condition2)

   ID B Condition1 Condition2
1   A 1          1          0
2   A 1          0          1
3   B 1          1          0
4   B 1          0          1
5   B 2          1          0
6   C 1          1          0
7   C 2          0          1
8   D 1          0          1
9   D 2          1          0
10  D 1          1          0

I want to get the IDs that meet the following condition on B, Condition1, and Condition2:

B[Condition1 ==1] != B[Condition2 ==1]

The desired output is the subset of rows whose ID satisfies the criterion above. In this case only C satisfies it:

   ID B Condition1 Condition2
    C 1          1          0
    C 2          0          1

I tried:

data2 %>% group_by(ID) %>%
  filter((B[Condition1 ==1]) != (B[Condition2 ==1]))

But this only works when each ID has exactly one row per condition. For example, this data runs without error (no ID satisfies the criterion):

   ID B Condition1 Condition2
1   A 1          1          0
2   A 1          0          1
3   B 1          1          0
4   B 1          0          1

However, with an additional row for ID 'B',

   ID B Condition1 Condition2
1   A 1          1          0
2   A 1          0          1
3   B 1          1          0
4   B 1          0          1
5   B 2          1          0

it throws an error:

Error in `filter()`:
! Problem while computing `..1 = (B[Condition1 == 1]) != (B[Condition2 == 1])`.
x Input `..1` must be of size 3 or 1, not size 2.
ℹ The error occurred in group 2: ID = "B".

How do I write the condition statement to fix this problem? Thanks!

Answer:

We may need to wrap the comparison in all(), and use ! with %in% instead of !=, because the two subsets can differ in length and filter() expects a single TRUE/FALSE per group:

library(dplyr)
data2 %>% 
   group_by(ID) %>% 
   # keep a group only if no B value from its Condition1 rows appears among its Condition2 rows
   filter(all(!B[Condition1 == 1] %in% B[Condition2 == 1])) %>%
   ungroup()

Output:

# A tibble: 2 × 4
  ID        B Condition1 Condition2
  <chr> <dbl>      <dbl>      <dbl>
1 C         1          1          0
2 C         2          0          1
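
To see why the original != version fails while the all()/%in% version works, here is a rough base R sketch that checks each group by hand. The helper names cmp_length and keep_group are made up for illustration and are not part of dplyr; the data is the example from the question.

```r
# Rebuild the example data from the question.
data2 <- data.frame(
  ID = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
  B = c(1, 1, 1, 1, 2, 1, 2, 1, 2, 1),
  Condition1 = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 1),
  Condition2 = c(0, 1, 0, 1, 0, 0, 1, 1, 0, 0)
)

# Hypothetical helper: length of the original `!=` comparison for one group.
cmp_length <- function(d) {
  length(d$B[d$Condition1 == 1] != d$B[d$Condition2 == 1])
}

# Hypothetical helper: the per-group test from the answer, always length 1.
keep_group <- function(d) {
  all(!d$B[d$Condition1 == 1] %in% d$B[d$Condition2 == 1])
}

groups <- split(data2, data2$ID)

# Groups B and D have 3 rows but a length-2 comparison, which filter() rejects.
cmp_lengths <- sapply(groups, cmp_length)
print(cmp_lengths)

# One TRUE/FALSE per group, regardless of how many rows match each condition.
keep <- sapply(groups, keep_group)
print(keep)

# Base R equivalent of the grouped filter: keep rows whose ID passed the test.
result <- data2[data2$ID %in% names(keep)[keep], ]
print(result)
```

One thing to keep in mind: all() returns TRUE for an empty vector, so an ID with no Condition1 == 1 rows would also be kept by this test. If that is not wanted, an extra check such as any(Condition1 == 1) could be added to the filter.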