R: How to remove duplicated entry across columns within each row

Time:06-25

I have a data frame that looks like the following. Within each row, I would like to remove the duplicated entries in columns X1 to Xn, so that each code appears only once per row.

df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))

The desired output would look something like this:

   ID    X1    X2    X3
1 100 C23.2 A19.2  <NA>
2 101 C23.2  <NA>  <NA>
3 102 A79.1  <NA>  <NA>

Answer:

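Using pmap_dfr from purrr. This replaces each repeated code with NA in place; the remaining codes are not shifted to the left: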

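library(dplyr)
library(purrr)
df %>%
  select(-ID) %>%                                         # check only the code columns
  pmap_dfr(~ c(...) %>% replace(., duplicated(.), NA)) %>% # per row: repeated codes -> NA
  bind_cols(select(df, ID), .)                            # put the ID column back in front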

Output:

   ID    X1   X2    X3
1 100 C23.2 <NA> A19.2
2 101 C23.2 <NA>  <NA>
3 102 A79.1 <NA>  <NA>
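If you want the left-shifted layout from the question while staying with pmap_dfr, one possible sketch builds each output row by hand instead of calling replace(). It assumes reasonably recent dplyr and purrr are installed; the anonymous function is an illustration, not part of the answer above.

```r
library(dplyr)
library(purrr)

df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))

# Illustrative variant (not from the original answer): compact each row to the left.
res <- df %>%
  select(-ID) %>%
  pmap_dfr(function(...) {
    x <- c(...)                          # named character vector for one row
    keep <- x[!duplicated(x)]            # first occurrence of each value
    out <- rep(NA_character_, length(x)) # start from an all-NA row
    out[seq_along(keep)] <- keep         # fill from the left
    names(out) <- names(x)               # keep X1, X2, X3 as column names
    out
  }) %>%
  bind_cols(select(df, ID), .)           # put the ID column back in front

print(res)
```

Compared with replace(), this moves A19.2 into X2 for ID 100, which matches the output asked for in the question.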

Answer:

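In base R, use apply to loop over the rows, keep only the non-duplicated elements, and pad each row back to its original length with NA. The remaining codes end up shifted to the left, as in the desired output.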

# apply() returns one column per row, hence t(); the \(x) lambda needs R >= 4.1.0
df[-1] <- t(apply(df[-1], 1, \(x) `length<-`(x[!duplicated(x)], length(x))))

Output:

> df
   ID    X1    X2   X3
1 100 C23.2 A19.2 <NA>
2 101 C23.2  <NA> <NA>
3 102 A79.1  <NA> <NA>
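One behaviour of this approach worth knowing: duplicated() treats NA as an ordinary value, so an NA that appears before a code in the same row is kept, and the code lands in X2 instead of X1. A sketch of a variant that drops NA before deduplicating is below; the helper compact_unique() and the extra row with ID 103 are hypothetical, added only to show the difference.

```r
# Question's data plus a hypothetical row "103" whose first code is missing.
df <- data.frame(ID = c("100", "101", "102", "103"),
                 X1 = c("C23.2", "C23.2", "A79.1", NA),
                 X2 = c("C23.2", NA, "A79.1", "B20.0"),
                 X3 = c("A19.2", NA, "A79.1", "B20.0"))

# Hypothetical helper: unique non-NA codes, left-aligned, padded with NA.
compact_unique <- function(x) {
  vals <- unique(x[!is.na(x)])          # drop NA first, then deduplicate
  out <- rep(NA_character_, length(x))  # all-NA row of the original width
  out[seq_along(vals)] <- vals          # fill from the left
  out
}

# apply() returns one column per input row, so transpose back with t().
df[-1] <- t(apply(df[-1], 1, compact_unique))
df
```

For the question's three rows this gives the same result as the apply() answer; only row 103 differs, where B20.0 moves into X1.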