Remove rows that have a duplicate in R


I have

a <- c(rep("A", 3), rep("B", 3), rep("C", 2), rep("D", 1))
b <- c(1, 1, 2, 4, 1, 1, 2, 2, 5)
df <- data.frame(a, b)

Based on df$a, I would like to return only the rows whose value occurs exactly once in df$a (no duplicates); in this example that would be the single row D 5.

I have tried duplicated(), !duplicated() and unique(), but none of them outputs what I need.
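For reference (this sketch is mine, not from the question): `duplicated()` only flags the second and later occurrences of a value, so `!duplicated()` keeps one row per value instead of dropping every value that repeats. Combining a forward and a backward pass flags all occurrences of repeated values:

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2), rep("D", 1))

# duplicated() leaves the FIRST occurrence of each value unflagged,
# so the first "A", "B" and "C" slip through !duplicated()
duplicated(a)

# a backward pass (fromLast = TRUE) flags the first occurrences too;
# the union marks every row whose value appears more than once
dup_any <- duplicated(a) | duplicated(a, fromLast = TRUE)
a[!dup_any]
# [1] "D"
```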

CodePudding user response:

One option in base R:

df[!(df$a %in% df$a[duplicated(df$a)]),]

  a b
9 D 5

CodePudding user response:

Cleanest way with dplyr:

library(dplyr)

df %>% group_by(a) %>%
   filter(n() == 1)

Output:

# A tibble: 1 x 2
# Groups:   a [1]
  a         b
  <chr> <dbl>
1 D         5

CodePudding user response:

Using data.table

library(data.table)
setDT(df)
df[, tmp := .N, by = a][tmp == 1, -"tmp"]
   a b
1: D 5

CodePudding user response:

With base R,

x <- table(df[,1])

df[rep(x<2,x),]

gives,

#   a b
# 9 D 5
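A related base R variant (my own sketch, not from the answers above) uses `ave()` to attach each row's group size directly, which avoids the `table()` lookup:

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2), rep("D", 1))
b <- c(1, 1, 2, 4, 1, 1, 2, 2, 5)
df <- data.frame(a, b)

# ave() returns, for every row, the size of that row's group in df$a;
# keeping rows where the group size is 1 drops all duplicated values
df[ave(seq_along(df$a), df$a, FUN = length) == 1, ]
#   a b
# 9 D 5
```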