I have a dataset where I have two columns of ID
and Category
.
An ID
may have more than one unique category. My goal is to identify and count those IDs that have two categories: a & b
This is a snapshot of the dataset:
ID <- c(1,2,1,3,1,2,3,4,5,4,5)
category <- c("a", "a", "b", "a", "c", "a", "b", "b", "c", "d", "c")
ID category
1 1 a
2 2 a
3 1 b
4 3 a
5 1 c
6 2 a
7 3 b
8 4 b
9 5 c
10 4 d
11 5 c
In this example, IDs 1 & 3
have categories a & b
.
So, I need to identify such IDs and also count the total number of such IDs.
I truly appreciate your help in advance.
CodePudding user response:
We may do
library(dplyr)
df1 %>%
group_by(ID) %>%
filter(all(c("a", "b") %in% category)) %>%
ungroup %>%
distinct(ID)
-output
# A tibble: 2 × 1
ID
<dbl>
1 1
2 3
CodePudding user response:
Not sure what the expected output should look like, but here you have the distinct
ID
s that have categories a and b:
data.frame(ID, category) %>%
group_by(ID) %>%
filter(any(category == "a") & any(category == "b")) %>%
distinct(ID)
# ID
#1 1
#2 3