Home > database >  Find groups that have a set of rows specified by the values in two columns
Find groups that have a set of rows specified by the values in two columns

Time:11-16

Here is my sample data frame. The actual data frame has a lot more groups and 9 conditions in each group.

df <- data.frame(
          Group = c('A','A','B','B','B','C','C','C','D','D','D','D'),
          Condition = c('cond2', 'cond3','cond1','cond2','cond3','cond1','cond2','cond3', 'cond1','cond2','cond3','cond4'),
          Value = c(0,0,0,1,0,0,0,1,0,1,0,0)
        )
> df
   Group Condition Value
1      A     cond2     0
2      A     cond3     0
3      B     cond1     0
4      B     cond2     1
5      B     cond3     0
6      C     cond1     0
7      C     cond2     0
8      C     cond3     1
9      D     cond1     0
10     D     cond2     1
11     D     cond3     0
12     D     cond4     0

Question I: I'd like to pick group B, in which cond1 == 0, cond2 == 1, and cond3 == 0.

The desired output:

  Group Condition Value
1     B     cond1     0
2     B     cond2     1
3     B     cond3     0

Question II: how to select all the groups that meet cond1 == 0, cond2 == 1, that is, group A and group B in my example dataframe.

CodePudding user response:

In Base R:

v <- c('cond1==0', 'cond2==1','cond3==0')

subset(df, ave(paste(Condition, Value, sep = '==')%in% v, Group, FUN = all))

  Group Condition Value
3     B     cond1     0
4     B     cond2     1
5     B     cond3     0

CodePudding user response:

Question 1

Getting the answer for the first question is simple, given that filtering for Group B gives you the exact output you want already.

#### Load Library ####
library(dplyr)

#### Filter ####
df %>% 
  filter(Group == "B")

As seen here:

  Group Condition Value
1     B     cond1     0
2     B     cond2     1
3     B     cond3     0

Question 2

For the second question, your logic doesn't entirely make sense because Groups B, C, and D meet the conditions you are trying to find. A doesn't ever meet the assumption of either filter. As an example, if you unite the conditions and values, then filter for the conditions you mentioned:

#### Unite and Filter ####
df %>% 
  unite("Merged_Group",
        Condition,
        Value) %>% 
  filter(Merged_Group %in% c("cond1_0",
                             "cond2_1"))

You will get this output:

 Group Merged_Group
1     B      cond1_0
2     B      cond2_1
3     C      cond1_0
4     D      cond1_0
5     D      cond2_1

Edit

Going off what you said in the comments, this is a way to be more explicit about Question 1.

df %>% 
  unite("Merged_Group",
      Condition,
      Value) %>% 
  filter(Merged_Group %in% c("cond1_0",
                             "cond2_1",
                             "cond3_0"),
         Group == "B")

Giving you this:

 Group Merged_Group
1     B      cond1_0
2     B      cond2_1
3     B      cond3_0
  • Related