r - check if value is unique in dataframe column

Time:10-03

I am creating a new column (Flag) in my data frame that marks each value in another column (Number) as unique or duplicate. I tried the duplicated() function, but it flags only the repeated occurrences and leaves the first occurrence marked as not a duplicate (see Number 100001 below):

Number    Flag
100000   unique
100001   unique
100002   unique
100001   duplicate
100003   unique

How can I make my if-else statement flag a value as duplicate whenever it occurs more than once in the column, as below:

Number    Flag
100000   unique
100001   duplicate
100002   unique
100001   duplicate
100003   unique

CodePudding user response:

One way to do this is to group_by(Number) and check whether each group contains more than one observation, i.e. n() > 1.

We can use that inside an ifelse statement to create said Flag.

library(dplyr)

dat <- tibble(Number = c(
        100000,
        100001, 
        100002, 
        100001, 
        100003))

dat %>% 
  group_by(Number) %>% 
  mutate(Flag = ifelse(n() > 1,
                       "duplicate",
                       "unique")) %>% 
  ungroup()

#> # A tibble: 5 × 2
#>   Number Flag     
#>    <dbl> <chr>    
#> 1 100000 unique   
#> 2 100001 duplicate
#> 3 100002 unique   
#> 4 100001 duplicate
#> 5 100003 unique

Created on 2022-10-02 by the reprex package (v0.3.0)
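If dplyr is not available, the same per-group count can probably be sketched in base R with ave(). This is only an illustration of the grouping idea above, not part of the original answer; the use of a plain data.frame and the name counts are assumptions made for the example.

```r
# Illustrative base R sketch; `counts` is a hypothetical helper name.
dat <- data.frame(Number = c(100000, 100001, 100002, 100001, 100003))

# For every row, count how many rows share the same Number.
counts <- ave(seq_along(dat$Number), dat$Number, FUN = length)

# A value is a duplicate when its group has more than one row.
dat$Flag <- ifelse(counts > 1, "duplicate", "unique")
print(dat)
```

Here ave() plays the role of group_by() plus n(): it returns one count per row, so no ungrouping step is needed.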

CodePudding user response:

The unique() function is helpful for building the set of distinct values; you can then use an if...else statement to identify the repeated values.
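The answer gives no code, so the following is one possible reading of it rather than the author's own method: collect the distinct values that repeat, then flag every row whose Number is in that set. The name dup_values is hypothetical, and duplicated() is used alongside unique() because unique() alone does not say which values repeat.

```r
# Illustrative sketch; `dup_values` is a hypothetical name.
dat <- data.frame(Number = c(100000, 100001, 100002, 100001, 100003))

# Distinct values that occur more than once in the column.
dup_values <- unique(dat$Number[duplicated(dat$Number)])

# Flag every row whose value is in that set, including its first occurrence.
dat$Flag <- ifelse(dat$Number %in% dup_values, "duplicate", "unique")
print(dat)
```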

CodePudding user response:

Here is a base R approach. You can use duplicated(), but also include a second duplicated() call that checks for duplicates in reverse (using fromLast = TRUE).

dat$Flag <- ifelse(
  duplicated(dat$Number) | duplicated(dat$Number, fromLast = TRUE),
  "duplicate",
  "unique"
)
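To see why combining the two calls works, the snippet below splits them into separate steps on the sample data from the question. It is a self-contained sketch; the intermediate names forward and backward are introduced only for illustration.

```r
# Illustrative sketch; `forward` and `backward` are hypothetical names.
dat <- data.frame(Number = c(100000, 100001, 100002, 100001, 100003))

# Scanning top to bottom flags only the later occurrence (row 4).
forward <- duplicated(dat$Number)

# Scanning bottom to top flags only the earlier occurrence (row 2).
backward <- duplicated(dat$Number, fromLast = TRUE)

# A row is a duplicate if either scan flagged it.
dat$Flag <- ifelse(forward | backward, "duplicate", "unique")
print(dat)
```

Each scan misses one end of a repeated run, so the logical OR of the two covers every occurrence of a repeated value.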