Home > Blockchain >  Finding rows based on two different values in another column in dplyr
Finding rows based on two different values in another column in dplyr

Time:11-03

I have a dataset where I have two columns of ID and Category. An ID may have more than one unique category. My goal is to identify and count those IDs that have two categories: a & b

This is a snapshot of the dataset:

ID <- c(1,2,1,3,1,2,3,4,5,4,5)
category <- c("a", "a", "b", "a", "c", "a", "b", "b", "c", "d", "c")

   ID category
1   1        a
2   2        a
3   1        b
4   3        a
5   1        c
6   2        a
7   3        b
8   4        b
9   5        c
10  4        d
11  5        c

In this example, IDs 1 & 3 have categories a & b. So, I need to identify such IDs and also count the total number of such IDs. I truly appreciate your help in advance.

CodePudding user response:

We may do

library(dplyr)
df1 %>%
   group_by(ID) %>% 
   filter(all(c("a", "b") %in% category)) %>%
   ungroup %>% 
   distinct(ID)

-output

# A tibble: 2 × 1
     ID
  <dbl>
1     1
2     3

CodePudding user response:

Not sure what the expected output should look like, but here you have the distinct IDs that have categories a and b:

data.frame(ID, category) %>% 
  group_by(ID) %>% 
  filter(any(category == "a") & any(category == "b")) %>% 
  distinct(ID)

#     ID
#1     1
#2     3
  • Related