How can I remove duplicate rows from a data frame when the data is the same, but in a different order


I have the following data frame:

df <- data.frame(node1=c("a","b", "c","d"), node2=c("b","a","f","g"),value=c(2,2,5,7))

I want to remove rows where node1 and node2 contain the same letters (regardless of order) and the same value, so the resulting df should look like:

df <- data.frame(node1=c("a","c","d"), node2=c("b","f","g"), value=c(2,5,7))

Please help, thank you.

CodePudding user response:

Here is another option using the dplyr and textshape packages.


  1. First, create two new columns: one that pastes node1 and value together, and one that pastes node2 and value together.

  2. Then, use unique_pairs() to find the distinct combinations of the two new columns, regardless of order.

  3. Drop the helper columns.


library(dplyr)
library(textshape)

df %>% 
  mutate(nv1 = paste0(node1, value),
         nv2 = paste0(node2, value)) %>% 
  unique_pairs("nv1", "nv2") %>% 
  select(-nv1, -nv2)


   node1 node2 value
 1     a     b     2
 3     c     f     5
 4     d     g     7

P.S.: "nv1" and "nv2" stand for "new value based on node1" and "new value based on node2". The names are arbitrary.
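
For comparison, the same idea (an order-independent key per row) can be sketched in base R without extra packages. This is not part of the answer above; it is a minimal sketch that assumes node1 and node2 are character columns (hence the explicit stringsAsFactors = FALSE), and the helper names lo, hi and result are made up for illustration.

```r
# Example data from the question.
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7),
                 stringsAsFactors = FALSE)

# pmin()/pmax() pick the alphabetically smaller/larger node in each row,
# so ("a", "b") and ("b", "a") both become lo = "a", hi = "b".
lo <- pmin(df$node1, df$node2)
hi <- pmax(df$node1, df$node2)

# duplicated() flags rows whose (lo, hi, value) triple appeared earlier;
# negating it keeps the first occurrence of each triple.
result <- df[!duplicated(data.frame(lo, hi, value = df$value)), ]
print(result)
```

Because the first occurrence is kept, the original row names (1, 3, 4) are preserved, as in the output above.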

CodePudding user response:

This uses an environment to keep track of the node1, node2, and value combinations that have already been seen:


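library(dplyr)

df |> filter({
    # Environment used as a hash map: node-pair key -> values already seen
    M <- new.env()

    # Order-independent key: sort each (node1, node2) pair, then join with a tab
    key <- purrr::map2(node1, node2, c) |>
        purrr::map(sort) |>
        purrr::map(~ paste(.x, collapse = "\t"))

    purrr::map2_lgl(key, value, ~{
        if (is.null(M[[.x]])) {
            M[[.x]] <<- list()
        }
        # Keep the row only if this value is new for this node pair
        res <- ! .y %in% M[[.x]]
        M[[.x]] <<- union(M[[.x]], .y)
        res
    })
})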

# A tibble: 3 x 3
  node1 node2 value
  <chr> <chr> <dbl>
1 a     b         2
2 c     f         5
3 d     g         7
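
To make the bookkeeping explicit, the "seen" tracking can also be written as a plain loop in base R. This is only a sketch of the same technique, not the answer's code: it folds the value into the key instead of storing a list of values per node pair, and the names seen, keep and result are hypothetical.

```r
# Example data from the question.
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7),
                 stringsAsFactors = FALSE)

seen <- new.env()             # environment used as a hash set of keys
keep <- logical(nrow(df))     # TRUE for rows that should be kept

for (i in seq_len(nrow(df))) {
  # Sort the two nodes so their order does not matter, then append the value.
  nodes <- sort(c(df$node1[i], df$node2[i]))
  key <- paste(c(nodes, df$value[i]), collapse = "\t")

  # Keep the row only the first time its key appears, then record the key.
  keep[i] <- !exists(key, envir = seen, inherits = FALSE)
  assign(key, TRUE, envir = seen)
}

result <- df[keep, ]
print(result)
```

An environment is used here because it is modified by reference, so the loop can record keys without copying a growing vector on every iteration.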