I have the following data frame:
df <- data.frame(node1=c("a","b", "c","d"), node2=c("b","a","f","g"),value=c(2,2,5,7))
I want to remove rows where node1 and node2 contain the same letters (regardless of order) and have the same value, so the resulting df should look like this:
df <- data.frame(node1=c("a","c","d"), node2=c("b","f","g"), value=c(2,5,7))
Please help, thank you.
CodePudding user response:
Here is another option using the dplyr and textshape packages.
Packages
library(dplyr)
library(textshape)
First, create two new variables: one that pastes node1 and value together, and one that pastes node2 and value together.
Then use unique_pairs() to keep only the distinct combinations of the two new columns, regardless of order. Finally, delete the helper columns, which are no longer needed.
Solution
df %>%
  mutate(nv1 = paste0(node1, value),
         nv2 = paste0(node2, value)) %>%
  unique_pairs("nv1", "nv2") %>%
  select(-nv1, -nv2)
Output
node1 node2 value
1 a b 2
3 c f 5
4 d g 7
PS: "nv1" and "nv2" stand for "new value based on node1" and "new value based on node2". The names are arbitrary; any other names would work.
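To see why comparing the two pasted columns works, it can help to look at the intermediate values. The following is a minimal base R sketch, assuming the df from the question and R 4.0 or later (so the node columns are character, not factor). The name keys is only illustrative.

```r
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7))

# Build the same helper columns as in the solution above
keys <- transform(df,
                  nv1 = paste0(node1, value),
                  nv2 = paste0(node2, value))

# Rows 1 and 2 hold the same pair of keys in opposite order,
# which is what an order-independent pair comparison can detect
print(keys[, c("nv1", "nv2")])
```

One caveat: because paste0() uses no separator, a node named "a1" with value 2 and a node named "a" with value 12 would both become "a12". If node names can end in digits, pasting with a separator, for example paste(node1, value, sep = "_"), is probably safer.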
CodePudding user response:
This uses an environment to keep track of already seen node1, node2 and value combinations:
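library(dplyr)
library(purrr)
df |> filter({
  # M maps each order-independent node key to the values already seen for it
  M <- new.env()
  # Sort each (node1, node2) pair so that a/b and b/a produce the same key
  key <- purrr::map2(node1, node2, c) |>
    purrr::map(sort) |>
    purrr::map(~ paste(.x, collapse = "\t"))
  # Keep a row only the first time its value appears under its key
  purrr::map2_lgl(key, value, ~ {
    if (is.null(M[[.x]])) {
      M[[.x]] <<- list()
    }
    res <- !.y %in% M[[.x]]
    M[[.x]] <<- union(M[[.x]], .y)
    res
  })
})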
# A tibble: 3 x 3
node1 node2 value
<chr> <chr> <dbl>
1 a b 2
2 c f 5
3 d g 7
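The same idea, an order-independent key combined with value, can also be sketched in base R without any packages. This is not the method used in the answer above, only an alternative under the assumption that the node columns are character vectors (the default in R 4.0 or later). The names lo, hi and result are illustrative.

```r
df <- data.frame(node1 = c("a", "b", "c", "d"),
                 node2 = c("b", "a", "f", "g"),
                 value = c(2, 2, 5, 7))

# Order-independent key: the alphabetically smaller node always goes in lo
lo <- pmin(df$node1, df$node2)
hi <- pmax(df$node1, df$node2)

# Keep only the first row of each (lo, hi, value) combination
result <- df[!duplicated(data.frame(lo, hi, value = df$value)), ]
print(result)
```

Like the environment-based version, this keeps the first occurrence of each combination and drops later ones, so the a/b row survives and the b/a row is removed.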