Filter rows where values in one column are shared by different categories in another, using dplyr-CodePudding

I have a data frame:

col1    col2    col3

a        1      n1
a        1      n2
a        2      n3  
a        2      n4
b        2      n5
b        3      n6
c        4      n7
c        5      n8
c        6      n9

And I want to return all rows in which the value in col2 is shared by two or more categories in col1, i.e:

a   2   n3
a   2   n4
b   2   n5

This seems like such a simple problem, but I've been pulling my hair out trying to find a solution that works. Been playing about with combinations of filter, duplicate in dplyr etc. to no avail. Much of the trouble comes from there being multiple duplicates in col2 I don't want to filter out (as they're the same in col1).

data %>% group_by(col1) %>% filter(???)

Any help very much appreciated!

CodePudding user response：

Does this work? Group on col2 and look for cases with > 1 value for col1.

library(dplyr)

data %>% 
  group_by(col2) %>% 
  filter(n_distinct(col1) > 1) %>%
  ungroup()

Result:

# A tibble: 3 × 3
  col1   col2 col3 
  <chr> <int> <chr>
1 a         2 n3   
2 a         2 n4   
3 b         2 n5

CodePudding user response：

Using base R

subset(data, col2 %in% names(which(lengths(with(unique(data[1:2]), 
    split(col1, col2))) > 1)))
  col1 col2 col3
3    a    2   n3
4    a    2   n4
5    b    2   n5