I have a data frame where a unique ID is given to each unique instance where there is a string in either title.1
or title.2
. Each ID is coded with one or more names. See below:
title.1 | title.2 | name | ID |
---|---|---|---|
A | A1 | fruit | 1 |
A | A1 | fruit | 1 |
B1 | fruit | 2 | |
B | fruit, vegetable | 3 | |
C | C1 | vegetable, poultry, grain | 4 |
C | C1 | vegetable, poultry | 4 |
C | C1 | vegetable, poultry | 4 |
D1 | poultry | 5 | |
D1 | vegetable | 5 |
I need to identify which IDs have the same name across rows and which do not. To do this, I'd like to group by ID and test to see if all name
values are the same across all rows with that ID. Then, I'd like to append a new column with a boolean value indicating which IDs meet this condition and which do not. The output should look like this:
title.1 | title.2 | name | ID | names.equal |
---|---|---|---|---|
A | A1 | fruit | 1 | TRUE |
A | A1 | fruit | 1 | TRUE |
B1 | fruit | 2 | TRUE | |
B | fruit, vegetable | 3 | TRUE | |
C | C1 | vegetable, poultry, grain | 4 | FALSE |
C | C1 | vegetable, poultry | 4 | FALSE |
C | C1 | vegetable, poultry | 4 | FALSE |
D1 | poultry | 5 | FALSE | |
D1 | vegetable | 5 | FALSE |
CodePudding user response:
We may use n_distinct
on name to get the unique count and create logical with the count after grouping by ID
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(names.equal = n_distinct(name) == 1) %>%
ungroup
-output
# A tibble: 9 × 5
title.1 title.2 name ID names.equal
<chr> <chr> <chr> <int> <lgl>
1 A A1 fruit 1 TRUE
2 A A1 fruit 1 TRUE
3 <NA> B1 fruit 2 TRUE
4 B <NA> fruit, vegetable 3 TRUE
5 C C1 vegetable, poultry, grain 4 FALSE
6 C C1 vegetable, poultry 4 FALSE
7 C C1 vegetable, poultry 4 FALSE
8 <NA> D1 poultry 5 FALSE
9 <NA> D1 vegetable 5 FALSE
data
df1 <- structure(list(title.1 = c("A", "A", NA, "B", "C", "C", "C",
NA, NA), title.2 = c("A1", "A1", "B1", NA, "C1", "C1", "C1",
"D1", "D1"), name = c("fruit", "fruit", "fruit", "fruit, vegetable",
"vegetable, poultry, grain", "vegetable, poultry", "vegetable, poultry",
"poultry", "vegetable"), ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L,
5L)), class = "data.frame", row.names = c(NA, -9L))