Sample data:
structure(list(name_1 = c("Kevin", "Tom", "Laura", "Julie"),
name_2 = c("Tom", "Kevin", "Julie", "Laura"), value = c(10,
10, 20, 20)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
# A tibble: 4 × 3
name_1 name_2 value
<chr> <chr> <dbl>
1 Kevin Tom 10
2 Tom Kevin 10
3 Laura Julie 20
4 Julie Laura 20
How can I filter unique combination of the columns name_1
and name_2
without subsetting every other row? A tidyverse/dplyr method is prefered.
The expected output is as follows:
# A tibble: 2 × 3
name_1 name_2 value
<chr> <chr> <dbl>
1 Kevin Tom 10
2 Laura Julie 20
CodePudding user response:
We can sort the names (by row, using pmin
/pmax
) and then distinct
based on the sorted values. In this case, I removed the sorted columns but you can choose better names and drop the originals if you prefer.
library(dplyr)
quux %>%
mutate(a=pmin(name_1, name_2), b=pmax(name_1, name_2)) %>%
distinct(a, b, .keep_all = TRUE) %>%
select(-a, -b)
# # A tibble: 2 x 3
# name_1 name_2 value
# <chr> <chr> <dbl>
# 1 Kevin Tom 10
# 2 Laura Julie 20
Data
quux <- structure(list(name_1 = c("Kevin", "Tom", "Laura", "Julie"), name_2 = c("Tom", "Kevin", "Julie", "Laura"), value = c(10, 10, 20, 20)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
CodePudding user response:
Are you looking for something like this?
df[which(duplicated(apply(df[-3], 1, function(i) toString(sort(i))))),]
# A tibble: 2 × 3
name_1 name_2 value
<chr> <chr> <dbl>
1 Tom Kevin 10
2 Julie Laura 20