Home > Software design >  How to find which groups have identical values in a specific column using dplyr?
How to find which groups have identical values in a specific column using dplyr?

Time:10-28

I have a table that contains letters and numbers:

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6))

I would like to first group data by "letter" and then check if there are any two groups that have identical values in number column. These would be groups with letters "a" and "b" in the "letter" column.

The result would look like this

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6),
              duplicated = c (rep (TRUE, 6), rep (FALSE, 3)) )

is there a way to do this elegantly in dplyr?

CodePudding user response:

You can use cur_group() to subset xx and test for containment:

library(dplyr)

xx %>% 
  group_by(letter) %>% 
  mutate(duplicated = number %in% xx$number[!xx$letter %in% cur_group()]) %>%
  ungroup()
  1. cur_group() represents the value of the current grouping variable (eg a, b, c).
  2. We use the value of cur_group() to subset the vector xx$number for those that are not part of the current group (!xx$letter %in% cur_group()).
  3. Lastly, we check if the current group number is in that subset (number %in% ...).

Output

  letter number duplicated
  <chr>   <dbl> <lgl>     
1 a           1 TRUE      
2 a           2 TRUE      
3 a           3 TRUE      
4 b           1 TRUE      
5 b           2 TRUE      
6 b           3 TRUE      
7 c           4 FALSE     
8 c           5 FALSE     
9 c           6 FALSE     

CodePudding user response:

You may try

xx %>%
  distinct() %>%
  group_by(number) %>%
  mutate(n = n()) %>%
  mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
  select(-n)

      letter number duplicated
      <chr>   <dbl> <lgl>     
    1 a           1 TRUE      
    2 a           2 TRUE      
    3 a           3 TRUE      
    4 b           1 TRUE      
    5 b           2 TRUE      
    6 b           3 TRUE      
    7 c           4 FALSE     
    8 c           5 FALSE     
    9 c           6 FALSE 

distinct is for if there are some duplicated inside of gruoup,

xx <- tibble (letter = c (rep ("a", 4), rep ("b", 3), rep ("c", 3)),
              number = c (1,1, 2, 3, 1, 2, 3, 4, 4, 6))
   letter number
   <chr>   <dbl>
 1 a           1
 2 a           1
 3 a           2
 4 a           3
 5 b           1
 6 b           2
 7 b           3
 8 c           4
 9 c           4
10 c           6
    xxx <- xx %>%
      distinct() %>%
      group_by(number) %>%
      mutate(n = n()) %>%
      mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
      select(-n)

    xx %>% 
      full_join(xxx, by = c("letter", "number"))

   letter number duplicated
   <chr>   <dbl> <lgl>     
 1 a           1 TRUE      
 2 a           1 TRUE      
 3 a           2 TRUE      
 4 a           3 TRUE      
 5 b           1 TRUE      
 6 b           2 TRUE      
 7 b           3 TRUE      
 8 c           4 FALSE     
 9 c           4 FALSE     
10 c           6 FALSE  

Thanks to @LMc

xxx <- xx %>%
  distinct() %>%
  add_count(number) %>%
  mutate(duplicated = n> 1) %>%
  select(-n)
xxx
xx %>% 
  full_join(xxx, by = c("letter", "number"))
  • Related