I am creating a new column (Flag) in my dataframe that identifies unique or duplicate numbers based on another column (Number). I have attempted to use the duplicated function, but it only flags the repeated occurrences as duplicates and leaves the first occurrence marked as unique (see Number 100001 below):
Number Flag
100000 unique
100001 unique
100002 unique
100001 duplicate
100003 unique
How do I let my if-else statement identify a value as duplicate if it occurs multiple times in the column, as below:
Number Flag
100000 unique
100001 duplicate
100002 unique
100001 duplicate
100003 unique
CodePudding user response:
One way to do this is to group_by(Number) and check, for each group, whether there is more than one observation: n() > 1. We can use that inside an ifelse statement to create the Flag column.
library(dplyr)
dat <- tibble(Number = c(100000, 100001, 100002, 100001, 100003))

dat %>%
  # one group per distinct Number; n() is the group size
  group_by(Number) %>%
  mutate(Flag = ifelse(n() > 1, "duplicate", "unique")) %>%
  ungroup()
#> # A tibble: 5 × 2
#> Number Flag
#> <dbl> <chr>
#> 1 100000 unique
#> 2 100001 duplicate
#> 3 100002 unique
#> 4 100001 duplicate
#> 5 100003 unique
Created on 2022-10-02 by the reprex package (v0.3.0)
CodePudding user response:
The unique() function is helpful for constructing the set of unique values; you can then use an if...else statement to identify the repeated values.
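The answer above gives no code, so here is a minimal sketch of one way it could be read in base R. It assumes the idea is to collect the values that occur more than once (via unique() on the repeated entries) and then flag every row whose Number is in that set. The name `repeated` is illustrative and not from the answer, and the vectorised ifelse() stands in for the if...else statement.

```r
dat <- data.frame(Number = c(100000, 100001, 100002, 100001, 100003))

# Distinct values that appear more than once in the column
# (`repeated` is a hypothetical helper name).
repeated <- unique(dat$Number[duplicated(dat$Number)])

# Flag every row whose Number is one of the repeated values,
# including the first occurrence.
dat$Flag <- ifelse(dat$Number %in% repeated, "duplicate", "unique")
```

Because the membership test uses %in%, the first occurrence of 100001 is flagged along with the later one.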
CodePudding user response:
Here is a base R approach. You can use duplicated, but also include a second duplicated call in which duplicates are considered in reverse (using fromLast = TRUE).
dat$Flag <- ifelse(
  duplicated(dat$Number) | duplicated(dat$Number, fromLast = TRUE),
  "duplicate",
  "unique"
)
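To see why combining the two calls works, it may help to look at each logical vector on its own. The sketch below uses the same five numbers as the question; the variable names `x`, `forward`, `backward`, and `either` are illustrative only.

```r
x <- c(100000, 100001, 100002, 100001, 100003)

# Scanning from the start, only the later occurrence is marked.
forward <- duplicated(x)
# FALSE FALSE FALSE  TRUE FALSE

# Scanning from the end, only the earlier occurrence is marked.
backward <- duplicated(x, fromLast = TRUE)
# FALSE  TRUE FALSE FALSE FALSE

# The element-wise OR marks every occurrence of a repeated value.
either <- forward | backward
# FALSE  TRUE FALSE  TRUE FALSE
```

Each scan leaves exactly one occurrence of a repeated value unmarked, and the two scans leave different ones, so the OR covers all of them.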