I have a data frame:
col1 col2 col3
a 1 n1
a 1 n2
a 2 n3
a 2 n4
b 2 n5
b 3 n6
c 4 n7
c 5 n8
c 6 n9
And I want to return all rows in which the value in col2 is shared by two or more categories in col1, i.e:
a 2 n3
a 2 n4
b 2 n5
This seems like such a simple problem, but I've been pulling my hair out trying to find a solution that works. Been playing about with combinations of filter, duplicate in dplyr etc. to no avail. Much of the trouble comes from there being multiple duplicates in col2 I don't want to filter out (as they're the same in col1).
data %>% group_by(col1) %>% filter(???)
Any help very much appreciated!
CodePudding user response:
Does this work? Group on col2
and look for cases with > 1 value for col1
.
library(dplyr)
data %>%
group_by(col2) %>%
filter(n_distinct(col1) > 1) %>%
ungroup()
Result:
# A tibble: 3 × 3
col1 col2 col3
<chr> <int> <chr>
1 a 2 n3
2 a 2 n4
3 b 2 n5
CodePudding user response:
Using base R
subset(data, col2 %in% names(which(lengths(with(unique(data[1:2]),
split(col1, col2))) > 1)))
col1 col2 col3
3 a 2 n3
4 a 2 n4
5 b 2 n5