How can I remove duplicate rows from a data frame when the data is the same, but in a different order

Time:02-11

I have the following data frame:

df <- data.frame(node1=c("a","b", "c","d"), node2=c("b","a","f","g"),value=c(2,2,5,7))

I want to remove rows where node1 and node2 contain the same letters (regardless of order) and the value is the same, so the resulting df should look like:

df <- data.frame(node1=c("a","c","d"), node2=c("b","f","g"), value=c(2,5,7))

Please help, thank you.

CodePudding user response:

Here is another option using the dplyr and textshape packages.

Packages

library(dplyr)
library(textshape)

  1. First, create two new variables: one that pastes together "node1" and "value", and one that pastes together "node2" and "value".

  2. Then, use unique_pairs() to find the distinct combinations between the new columns you created, regardless of the order.

  3. Delete the unnecessary columns.

Solution

df %>% 
  mutate(nv1 = paste0(node1, value),
         nv2 = paste0(node2, value)) %>% 
  unique_pairs("nv1", "nv2") %>% 
  select(-nv1, -nv2)

Output

   node1 node2 value
 1     a     b     2
 3     c     f     5
 4     d     g     7

P.S. "nv1" and "nv2" stand for "new value based on node1" and "new value based on node2". The names are arbitrary; any others would work.
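
For comparison, here is a minimal base R sketch of the same idea without the extra packages. It is not part of the answer above. It assumes the node columns are character vectors (hence the explicit stringsAsFactors = FALSE), and the names lo, hi and result are only illustrative. Each pair is put into a fixed order with pmin() and pmax(), and duplicated() then flags repeated (pair, value) combinations.

```r
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7),
                 stringsAsFactors = FALSE)

# Order-independent representation of each pair:
# the alphabetically smaller node goes in lo, the larger in hi.
lo <- pmin(df$node1, df$node2)
hi <- pmax(df$node1, df$node2)

# Keep only the first row for each (lo, hi, value) combination.
result <- df[!duplicated(data.frame(lo, hi, value = df$value)), ]
result
```

Because the value is compared as its own column instead of being pasted onto the node name, a node "a1" with value 2 cannot be confused with a node "a" with value 12.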

CodePudding user response:

This uses an environment to keep track of already seen node1, node2 and value combinations:

library(dplyr)
library(purrr)

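df |> filter({
    # Environment used as a hash map: sorted node pair -> values already seen
    M <- new.env()

    # Order-independent key for each row: the two nodes, sorted and tab-joined
    key <- purrr::map2(node1, node2, c) |>
        purrr::map(sort) |>
        purrr::map(~ paste(.x, collapse = "\t"))

    # TRUE for the first occurrence of each (key, value) combination
    purrr::map2_lgl(key, value, ~{
        if (is.null(M[[.x]])) {
            M[[.x]] <<- list()
        }
        res <- ! .y %in% M[[.x]]
        M[[.x]] <<- union(M[[.x]], .y)
        res
    })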
})

Output

# A tibble: 3 x 3
  node1 node2 value
  <chr> <chr> <dbl>
1 a     b         2
2 c     f         5
3 d     g         7
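
As a rough cross-check of what the code above computes, the sketch below builds the same sorted, tab-joined key with base R and lets duplicated() do the bookkeeping that the environment M does by hand. This is only an illustration; the names is_repeat and deduped are made up for the example, and the node columns are assumed to be character vectors.

```r
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7),
                 stringsAsFactors = FALSE)

# Same kind of key as in the answer: the two nodes of each row,
# sorted and joined by a tab, so "a","b" and "b","a" give the same key.
key <- mapply(function(x, y) paste(sort(c(x, y)), collapse = "\t"),
              df$node1, df$node2, USE.NAMES = FALSE)

# A row is a repeat if its (key, value) combination appeared in an earlier row.
is_repeat <- duplicated(data.frame(key, value = df$value))

deduped <- df[!is_repeat, ]
deduped
```

For the sample data the first two rows share a key and a value, so only the second row is flagged as a repeat.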