I have a data frame that looks like the following. Within each row, I would like to remove duplicate entries across the columns X1 to Xn.
df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))
The output would look something like this
   ID    X1    X2   X3
1 100 C23.2 A19.2 <NA>
2 101 C23.2  <NA> <NA>
3 102 A79.1  <NA> <NA>
CodePudding user response:
Using pmap_dfr from purrr:
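library(dplyr)
library(purrr)
df %>%
  select(-ID) %>%                                           # check only the X columns for repeats
  pmap_dfr(~ c(...) %>% replace(., duplicated(.), NA)) %>%  # blank repeated values within each row
  bind_cols(select(df, ID), .)                              # put ID back in front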
Output:
   ID    X1   X2    X3
1 100 C23.2 <NA> A19.2
2 101 C23.2 <NA>  <NA>
3 102 A79.1 <NA>  <NA>
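Note that this output keeps each surviving value in its original column, so the row for ID 100 has a gap in X2 rather than the left-packed layout asked for in the question. The sketch below shows the difference on a single row; the names `x`, `dup`, `holes` and `shifted` are only illustrative, and the reordering step is one possible way to close the gap, not part of the answer above.

```r
# Row for ID 100, values taken from the question
x <- c("C23.2", "C23.2", "A19.2")

# duplicated() flags every repeat of an earlier value
dup <- duplicated(x)

# replace() blanks the repeats but leaves everything in place
holes <- replace(x, dup, NA)

# Moving the NAs to the end packs the remaining values to the left;
# order() leaves tied elements in their original order
shifted <- holes[order(is.na(holes))]

print(holes)
print(shifted)
```

If the gaps are acceptable, the reordering step can simply be skipped.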
CodePudding user response:
In base R, use apply to loop over the rows, keep the non-duplicated elements, and pad the result back to the original length:
# \(x) is the lambda shorthand available from R 4.1; `length<-` pads the de-duplicated row with NA
df[-1] <- t(apply(df[-1], 1, \(x) `length<-`(x[!duplicated(x)], length(x))))
Output:
> df
   ID    X1    X2   X3
1 100 C23.2 A19.2 <NA>
2 101 C23.2  <NA> <NA>
3 102 A79.1  <NA> <NA>
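For readability, the same base R idea can be split into a named helper. This is only a sketch: `dedupe_row` and `cols` are hypothetical names, and it assumes every column except ID should be checked for duplicates.

```r
# Hypothetical helper: drop within-row duplicates and shift the rest left
dedupe_row <- function(x) {
  kept <- x[!duplicated(x)]   # first occurrence of each value (the first NA counts too)
  length(kept) <- length(x)   # pad with NA back to the original width
  kept
}

# Example data from the question
df <- data.frame(ID = c("100", "101", "102"),
                 X1 = c("C23.2", "C23.2", "A79.1"),
                 X2 = c("C23.2", NA, "A79.1"),
                 X3 = c("A19.2", NA, "A79.1"))

# Assumption: all columns other than ID hold the codes to de-duplicate
cols <- setdiff(names(df), "ID")

# apply() returns one column per input row, so transpose before assigning back
df[cols] <- t(apply(df[cols], 1, dedupe_row))
print(df)
```

Selecting the columns by name rather than with df[-1] keeps the code working if ID is not the first column.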