Home > front end >  How can you convert duplicates across multiple columns to be NA in R?
How can you convert duplicates across multiple columns to be NA in R?

Time:03-29

I have a dataset that I want to convert any duplicates across columns to be NA. I've found answers to help with just looking for duplicates in one column, and I've found ways to remove duplicates entirely (e.g., distinct()). Instead, I have this data:

library(dpylr)
test <- tibble(job = c(1:6), 
               name = c("j", "j", "j", "c", "c", "c"),
               id = c(1, 1, 2, 1, 5, 1))

And want this result:

library(dpylr)

answer <- tibble(job = c(1:6), 
                 id = c("j", NA, "j", "c", NA, "c"),
                  name = c(1, NA, 2, 1, NA, 5))

And I've tried a solution like this using duplicated(), but it fails:

#Attempted solution
library(dpylr)
test %>%
  mutate_at(vars(id, name), ~case_when(
    duplicated(id, name) ~ NA,
    TRUE ~ .
  ))

I'd prefer to use tidy solutions, but I can be flexible as long as the answer can be piped.

CodePudding user response:

We could create a helper and then identify duplicates and replace them with NA in an ifelse statement using across:

library(dplyr)
test %>% 
  mutate(helper = paste(id, name)) %>% 
  mutate(across(c(name, id), ~ifelse(duplicated(helper), NA, .)), .keep="unused")

    job name     id
  <int> <chr> <dbl>
1     1 j         1
2     2 NA       NA
3     3 j         2
4     4 c         1
5     5 c         5
6     6 NA       NA

CodePudding user response:

If we want to convert to NA, create a column that includes all the columns with paste or unite and then mutate with across

library(dplyr)
library(tidyr)
test %>% 
  unite(full_nm, -job, remove = FALSE) %>% 
  mutate(across(-c(job, full_nm), ~ replace(.x, duplicated(full_nm), NA))) %>%
  select(-full_nm)

-output

# A tibble: 6 × 3
    job name     id
  <int> <chr> <dbl>
1     1 j         1
2     2 <NA>     NA
3     3 j         2
4     4 c         1
5     5 c         5
6     6 <NA>     NA

  • Related