How to replace distinct values in a column if the value in another column is a duplicate in R?

Time:10-15

I want to replace distinct values in the 'Grade' column with NA if the values in the 'ID' column are duplicates.

This is my data frame currently:

ID            Name            Grade
1001          Mary            10
1002          John            9
1002          John            10
1003          James           12
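
For anyone who wants to reproduce this, here is a minimal sketch of the table above as a data frame. The name df and the integer/character column types are assumptions, chosen to match what the answers below use and print.

```r
# Assumed construction of the example data; df and the column types are
# not given in the question itself.
df <- data.frame(
  ID    = c(1001L, 1002L, 1002L, 1003L),
  Name  = c("Mary", "John", "John", "James"),
  Grade = c(10L, 9L, 10L, 12L),
  stringsAsFactors = FALSE
)
```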

And this is what I want the data frame to look like:

ID            Name            Grade
1001          Mary            10
1002          John            NA
1002          John            NA
1003          James           12

How would I go about accomplishing this?

Thanks!

CodePudding user response:

You may try

library(dplyr)

df %>%
  group_by(ID) %>%
  # n() is the group size: set Grade to NA for any ID that appears more than once
  mutate(Grade = ifelse(n() > 1, NA, Grade)) %>%
  ungroup()

     ID Name  Grade
  <int> <chr> <int>
1  1001 Mary     10
2  1002 John     NA
3  1002 John     NA
4  1003 James    12

CodePudding user response:

Here are a couple of base R options:

  1. Using duplicated.
df$Grade[duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)] <- NA
df

#    ID  Name Grade
#1 1001  Mary    10
#2 1002  John    NA
#3 1002  John    NA
#4 1003 James    12

  2. Using table.
df$Grade[df$ID %in% names(Filter(function(x) x > 1, table(df$ID)))] <- NA
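
The table one-liner can be easier to follow when split into steps. This is only a sketch: the data frame is rebuilt here under the assumption that it matches the question, and counts and dup_ids are illustrative names that do not appear in the original answer.

```r
# Assumed example data, matching the question.
df <- data.frame(
  ID    = c(1001L, 1002L, 1002L, 1003L),
  Name  = c("Mary", "John", "John", "James"),
  Grade = c(10L, 9L, 10L, 12L)
)

counts  <- table(df$ID)               # how many times each ID occurs
dup_ids <- names(counts[counts > 1])  # IDs seen more than once, as character strings

# %in% coerces the integer IDs to character before matching against dup_ids
df$Grade[df$ID %in% dup_ids] <- NA
df
```

Note that names() returns character strings, so the match relies on %in% comparing the IDs as text.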

You can also do option 1 with dplyr:

library(dplyr)

df <- df %>%
  mutate(Grade = replace(Grade,
                         duplicated(ID) | duplicated(ID, fromLast = TRUE),
                         NA))
df
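
Both duplicated-based versions need two calls because duplicated() on its own only flags the second and later occurrences of a value, not the first one. A small sketch on the ID values from the question (the variable names are just for illustration):

```r
ids <- c(1001L, 1002L, 1002L, 1003L)

first_pass <- duplicated(ids)                   # flags repeats scanning from the start
last_pass  <- duplicated(ids, fromLast = TRUE)  # flags repeats scanning from the end
all_dups   <- first_pass | last_pass            # TRUE for every row whose ID is repeated

first_pass  # FALSE FALSE  TRUE FALSE
last_pass   # FALSE  TRUE FALSE FALSE
all_dups    # FALSE  TRUE  TRUE FALSE
```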