tidy way to remove duplicates per row

Time:06-14

I've seen various base R solutions for removing row-wise duplicates, e.g. "R - find all duplicates in row and replace".

However, I'm wondering if there's a more tidy way. I tried several approaches using across, or a combination of rowwise with c_across, but can't get any of them to work.

df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(1, 3, 4, 5),
                 z = c(2, 3, 5, 6))

Expected output:

  x  y  z
1 1 NA  2
2 2  3 NA
3 3  4  5
4 4  5  6

My ideas so far (not working):

df |> 
  mutate(apply(across(everything()), 1, function(x) replace(x, duplicated(x), NA)))

df |> 
  mutate(apply(across(everything()), 1, function(x) {x[duplicated(x)] <- NA}))
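A likely reason both attempts fail: `apply()` over rows (`MARGIN = 1`) returns one column per input row, so the result comes back transposed, and in the second attempt the anonymous function returns the assigned value (`NA`) rather than the modified `x`. The following is a minimal sketch of how the first idea could be repaired, assuming it is acceptable to transpose the result and convert it back to a data frame so that `mutate()` can overwrite the existing columns. The name `fixed` is introduced here only for illustration.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(1, 3, 4, 5),
                 z = c(2, 3, 5, 6))

fixed <- df |>
  mutate(
    # apply() returns a 3 x 4 matrix here (one column per original row),
    # so transpose it back to 4 x 3 and turn it into a data frame;
    # an unnamed data frame inside mutate() replaces the matching columns.
    as.data.frame(t(apply(
      across(everything()), 1,
      function(x) replace(x, duplicated(x), NA)
    )))
  )

print(fixed)
```

Note that `apply()` coerces its input to a matrix, so this sketch only suits data frames whose columns share a single type, as in the example.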

I got part of the way by creating a list column that flags which positions in each row are duplicates (though it also triggers the ugly warning about the usual "new names" problem). I'm unsure how to proceed from there, if that's even a promising route; I guess it requires some form of purrr magic?

df |> 
  rowwise() |> 
  mutate(test = list(duplicated(c_across(everything())))) |> 
  unnest_wider(test)

# A tibble: 4 × 6
      x     y     z ...1  ...2  ...3 
  <dbl> <dbl> <dbl> <lgl> <lgl> <lgl>
1     1     1     2 FALSE TRUE  FALSE
2     2     3     3 FALSE FALSE TRUE 
3     3     4     5 FALSE FALSE FALSE
4     4     5     6 FALSE FALSE FALSE

CodePudding user response:

Maybe you want something like this:

library(dplyr)
df %>%
  rowwise() %>% 
  do(data.frame(replace(., duplicated(unlist(.)), NA)))

Output:

# A tibble: 4 × 3
# Rowwise: 
      x     y     z
  <dbl> <dbl> <dbl>
1     1    NA     2
2     2     3    NA
3     3     4     5
4     4     5     6
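Since `do()` is marked as superseded in the dplyr documentation, a reshape-based alternative may be worth considering. This is only a sketch and not part of the answer above: it assumes tidyr is available, and the helper column `row` and the name `result` are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(1, 3, 4, 5),
                 z = c(2, 3, 5, 6))

result <- df |>
  mutate(row = row_number()) |>   # remember which row each value came from
  pivot_longer(-row) |>           # one line per (row, column) pair
  group_by(row) |>
  # within each original row, blank out every repeated value
  mutate(value = replace(value, duplicated(value), NA)) |>
  ungroup() |>
  pivot_wider() |>                # back to the original wide layout
  select(-row)

print(result)
```

The trade-off is an extra round trip through long format, but the duplicate check becomes an ordinary grouped `mutate()` and the result is a plain ungrouped tibble rather than a rowwise one.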

CodePudding user response:

Just for completeness: after a bit of trial and error, I also got the same result as @Quinten's answer, just in a much, much uglier way!

df |>
    rowwise() |> 
    mutate(pos = list(which(duplicated(c_across(everything()))))) |> 
    mutate(across(-pos, ~ ifelse(which(names(df) == cur_column()) %in% unlist(pos), NA, .))) |> 
    select(-pos)