I have a dataset that I want to convert any duplicates across columns to be NA. I've found answers to help with just looking for duplicates in one column, and I've found ways to remove duplicates entirely (e.g., distinct()). Instead, I have this data:
library(dpylr)
test <- tibble(job = c(1:6),
name = c("j", "j", "j", "c", "c", "c"),
id = c(1, 1, 2, 1, 5, 1))
And want this result:
library(dpylr)
answer <- tibble(job = c(1:6),
id = c("j", NA, "j", "c", NA, "c"),
name = c(1, NA, 2, 1, NA, 5))
And I've tried a solution like this using duplicated(), but it fails:
#Attempted solution
library(dpylr)
test %>%
mutate_at(vars(id, name), ~case_when(
duplicated(id, name) ~ NA,
TRUE ~ .
))
I'd prefer to use tidy solutions, but I can be flexible as long as the answer can be piped.
CodePudding user response:
We could create a helper
and then identify duplicates and replace them with NA
in an ifelse
statement using across
:
library(dplyr)
test %>%
mutate(helper = paste(id, name)) %>%
mutate(across(c(name, id), ~ifelse(duplicated(helper), NA, .)), .keep="unused")
job name id
<int> <chr> <dbl>
1 1 j 1
2 2 NA NA
3 3 j 2
4 4 c 1
5 5 c 5
6 6 NA NA
CodePudding user response:
If we want to convert to NA
, create a column that includes all the columns with paste
or unite
and then mutate
with across
library(dplyr)
library(tidyr)
test %>%
unite(full_nm, -job, remove = FALSE) %>%
mutate(across(-c(job, full_nm), ~ replace(.x, duplicated(full_nm), NA))) %>%
select(-full_nm)
-output
# A tibble: 6 × 3
job name id
<int> <chr> <dbl>
1 1 j 1
2 2 <NA> NA
3 3 j 2
4 4 c 1
5 5 c 5
6 6 <NA> NA