Home > Blockchain >  Check if all rows are equal by group ID and return boolean value
Check if all rows are equal by group ID and return boolean value

Time:01-13

I have a data frame where a unique ID is given to each unique instance where there is a string in either title.1 or title.2. Each ID is coded with one or more names. See below:

title.1 title.2 name ID
A A1 fruit 1
A A1 fruit 1
B1 fruit 2
B fruit, vegetable 3
C C1 vegetable, poultry, grain 4
C C1 vegetable, poultry 4
C C1 vegetable, poultry 4
D1 poultry 5
D1 vegetable 5

I need to identify which IDs have the same name across rows and which do not. To do this, I'd like to group by ID and test to see if all name values are the same across all rows with that ID. Then, I'd like to append a new column with a boolean value indicating which IDs meet this condition and which do not. The output should look like this:

title.1 title.2 name ID names.equal
A A1 fruit 1 TRUE
A A1 fruit 1 TRUE
B1 fruit 2 TRUE
B fruit, vegetable 3 TRUE
C C1 vegetable, poultry, grain 4 FALSE
C C1 vegetable, poultry 4 FALSE
C C1 vegetable, poultry 4 FALSE
D1 poultry 5 FALSE
D1 vegetable 5 FALSE

CodePudding user response:

We may use n_distinct on name to get the unique count and create logical with the count after grouping by ID

library(dplyr)
df1 %>%
   group_by(ID) %>%
   mutate(names.equal = n_distinct(name) == 1) %>%
   ungroup

-output

# A tibble: 9 × 5
  title.1 title.2 name                         ID names.equal
  <chr>   <chr>   <chr>                     <int> <lgl>      
1 A       A1      fruit                         1 TRUE       
2 A       A1      fruit                         1 TRUE       
3 <NA>    B1      fruit                         2 TRUE       
4 B       <NA>    fruit, vegetable              3 TRUE       
5 C       C1      vegetable, poultry, grain     4 FALSE      
6 C       C1      vegetable, poultry            4 FALSE      
7 C       C1      vegetable, poultry            4 FALSE      
8 <NA>    D1      poultry                       5 FALSE      
9 <NA>    D1      vegetable                     5 FALSE     

data

df1 <- structure(list(title.1 = c("A", "A", NA, "B", "C", "C", "C", 
NA, NA), title.2 = c("A1", "A1", "B1", NA, "C1", "C1", "C1", 
"D1", "D1"), name = c("fruit", "fruit", "fruit", "fruit, vegetable", 
"vegetable, poultry, grain", "vegetable, poultry", "vegetable, poultry", 
"poultry", "vegetable"), ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L, 
5L)), class = "data.frame", row.names = c(NA, -9L))
  • Related