Check if all rows are equal by group ID and return boolean value

I have a data frame where a unique ID is given to each unique instance where there is a string in either title.1 or title.2. Each ID is coded with one or more names. See below:

title.1	title.2	name	ID
A	A1	fruit	1
A	A1	fruit	1
	B1	fruit	2
B		fruit, vegetable	3
C	C1	vegetable, poultry, grain	4
C	C1	vegetable, poultry	4
C	C1	vegetable, poultry	4
	D1	poultry	5
	D1	vegetable	5

I need to identify which IDs have the same name across rows and which do not. To do this, I'd like to group by ID and test to see if all name values are the same across all rows with that ID. Then, I'd like to append a new column with a boolean value indicating which IDs meet this condition and which do not. The output should look like this:

title.1	title.2	name	ID	names.equal
A	A1	fruit	1	TRUE
A	A1	fruit	1	TRUE
	B1	fruit	2	TRUE
B		fruit, vegetable	3	TRUE
C	C1	vegetable, poultry, grain	4	FALSE
C	C1	vegetable, poultry	4	FALSE
C	C1	vegetable, poultry	4	FALSE
	D1	poultry	5	FALSE
	D1	vegetable	5	FALSE

CodePudding user response:

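After grouping by ID, we can use n_distinct() on name to count the distinct values within each group. Comparing that count to 1 gives a logical column that is TRUE only when every row in the group has the same name.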

library(dplyr)

df1 %>%
   group_by(ID) %>%
   # TRUE when all rows in the group share a single distinct name string
   mutate(names.equal = n_distinct(name) == 1) %>%
   ungroup()

Output:

# A tibble: 9 × 5
  title.1 title.2 name                         ID names.equal
  <chr>   <chr>   <chr>                     <int> <lgl>      
1 A       A1      fruit                         1 TRUE       
2 A       A1      fruit                         1 TRUE       
3 <NA>    B1      fruit                         2 TRUE       
4 B       <NA>    fruit, vegetable              3 TRUE       
5 C       C1      vegetable, poultry, grain     4 FALSE      
6 C       C1      vegetable, poultry            4 FALSE      
7 C       C1      vegetable, poultry            4 FALSE      
8 <NA>    D1      poultry                       5 FALSE      
9 <NA>    D1      vegetable                     5 FALSE
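
If dplyr is not available, the same check can probably be done in base R. The following is only a sketch, not part of the original answer: it rebuilds just the name and ID columns of df1 so that it runs on its own, and the intermediate variable same_by_id is a name introduced here for illustration.

```r
# Minimal copy of the two columns needed for the check
df1 <- data.frame(
  name = c("fruit", "fruit", "fruit", "fruit, vegetable",
           "vegetable, poultry, grain", "vegetable, poultry",
           "vegetable, poultry", "poultry", "vegetable"),
  ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L, 5L)
)

# One logical per ID: TRUE when the group has exactly one distinct name
same_by_id <- tapply(df1$name, df1$ID, function(x) length(unique(x)) == 1)

# Look up each row's ID in the per-group result and drop names/dims
df1$names.equal <- as.vector(same_by_id[as.character(df1$ID)])

print(df1)
```

The lookup by as.character(df1$ID) relies on tapply() naming its result by group value, so it should also work when the IDs are not consecutive.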

Data:

df1 <- structure(list(title.1 = c("A", "A", NA, "B", "C", "C", "C", 
NA, NA), title.2 = c("A1", "A1", "B1", NA, "C1", "C1", "C1", 
"D1", "D1"), name = c("fruit", "fruit", "fruit", "fruit, vegetable", 
"vegetable, poultry, grain", "vegetable, poultry", "vegetable, poultry", 
"poultry", "vegetable"), ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L, 
5L)), class = "data.frame", row.names = c(NA, -9L))
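
One caveat: counting distinct values compares the name strings exactly, so "vegetable, poultry" and "poultry, vegetable" would count as two different names. The question does not say whether item order matters. If it should not, one possible approach is to normalise each string before comparing. The helper normalize_names below is hypothetical (it is not from the original answer) and assumes the items are separated by commas and that name contains no missing values.

```r
# Hypothetical helper: split on commas, trim whitespace, sort the items,
# and re-join them, so that item order within a cell no longer matters.
# Missing values are not handled in this sketch.
normalize_names <- function(x) {
  vapply(strsplit(x, ","), function(parts) {
    paste(sort(trimws(parts)), collapse = ", ")
  }, character(1))
}

names_a <- c("vegetable, poultry", "poultry, vegetable")

# Exact string comparison treats the two cells as different
exact_equal <- length(unique(names_a)) == 1

# After normalisation the two cells compare as equal
normalized_equal <- length(unique(normalize_names(names_a))) == 1

print(c(exact = exact_equal, normalized = normalized_equal))
```

In the dplyr pipeline, this would amount to replacing n_distinct(name) with n_distinct(normalize_names(name)).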