Data frame de-duplication

Time:04-25

I have a data frame in which some rows differ only in the order of V1 and V2 ("A"-"B" versus "B"-"A"); such rows have the same Value.

library(tibble)

df <- tibble(
  V1 = c("A", "C", "B", "D"),
  V2 = c("B", "D", "A", "C"),
  Value = c(1, 2, 1, 2)
)
  V1    V2    Value
  <chr> <chr> <dbl>
1 A     B         1
2 C     D         2
3 B     A         1
4 D     C         2

I want to keep only one row from each such pair (for example, drop either row 1 or row 3), so that the result looks like this:

  V1 V2 Value
1  A  B     1
2  C  D     2

How can I remove those repetitive rows?

CodePudding user response:

# Sort the values within each row, then keep the first row for each sorted form
df[!duplicated(t(apply(df, 1, sort))), ]
  V1 V2 Value
1  A  B     1
2  C  D     2

or even:

# Build an order-independent key from the row-wise max and min of V1 and V2
df[!duplicated(cbind(pmax(df$V1, df$V2), pmin(df$V1, df$V2))), ]
  V1 V2 Value
1  A  B     1
2  C  D     2
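
One thing to be aware of: `apply(df, 1, sort)` sorts every column of the row together, so Value is mixed into the key as a string. A minimal sketch that restricts the sort to V1 and V2 might look like the following. It rebuilds the example as a plain data frame, and the names `key` and `result` are introduced here only for illustration. It also assumes that rows sharing the same unordered pair should be treated as duplicates regardless of Value.

```r
# Example data from the question, as a base R data frame.
df <- data.frame(
  V1 = c("A", "C", "B", "D"),
  V2 = c("B", "D", "A", "C"),
  Value = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Sort only the two key columns within each row, so ("B", "A") becomes ("A", "B").
# apply() returns one column per input row, hence the transpose.
key <- t(apply(df[, c("V1", "V2")], 1, sort))

# duplicated() on a matrix compares whole rows; keep the first row per key.
result <- df[!duplicated(key), ]
result
```

If rows with the same pair but a different Value should be kept, Value could be added to the key, for example with `cbind(key, df$Value)`.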

CodePudding user response:

An option with the tidyverse:

library(dplyr)
library(stringr)
library(purrr)
df %>%
  filter(!duplicated(pmap_chr(
    across(V1:V2),
    ~ str_c(sort(c(...)), collapse = "")
  )))
# A tibble: 2 × 3
  V1    V2    Value
  <chr> <chr> <dbl>
1 A     B         1
2 C     D         2
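
A vectorised alternative that avoids the row-wise `pmap_chr()` call is sketched below, under the assumption that there are exactly two key columns. It uses `pmin()` and `pmax()` to put each pair into a fixed order and then `distinct()` to drop repeats. The helper columns `lo` and `hi` and the name `result` are hypothetical and exist only for this example.

```r
library(dplyr)
library(tibble)

# Example data from the question.
df <- tibble(
  V1 = c("A", "C", "B", "D"),
  V2 = c("B", "D", "A", "C"),
  Value = c(1, 2, 1, 2)
)

result <- df %>%
  # lo/hi hold each pair in a fixed order, so A-B and B-A look the same.
  mutate(lo = pmin(V1, V2), hi = pmax(V1, V2)) %>%
  # Keep the first row for each (lo, hi, Value) combination.
  distinct(lo, hi, Value, .keep_all = TRUE) %>%
  # Drop the helper columns again.
  select(-lo, -hi)

result
```

Because Value is part of the `distinct()` call, two rows with the same pair but different values are both kept; removing `Value` from that call would treat them as duplicates instead.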