Delete all but first instance in data frame when the rows aren't duplicates in R [duplicate]

Time:10-06

I have a data frame that looks something like the first df below. There are duplicate values in col1, but the rows as a whole are not duplicates because col2 differs. I want to keep only the first row for each value of col1 so that it looks like the second df below.

col1 col2
x 1
x 2
x 3
y 1
y 2
y 3

col1 col2
x 1
y 1

I tried this but it didn't work:

df %>% group_by(col1) %>% filter(duplicated(col1) | n()!=1)

CodePudding user response:

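distinct() is enough here: name col1 as the column to deduplicate on, and set .keep_all = TRUE so the other columns are kept.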

library(dplyr)
distinct(df, col1, .keep_all = TRUE)
  col1 col2
1    x    1
2    y    1

Or, to use duplicated(), negate it with ! so that only the first row for each col1 value is kept.

df %>%
    filter(!duplicated(col1))
  col1 col2
1    x    1
2    y    1
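
To see why negating duplicated() works, here is a minimal base R sketch that rebuilds the question's data and inspects the logical vector. The names dup and first_rows are only for illustration and are not part of the original answer.

```r
# Same data as in the question
df <- data.frame(col1 = c("x", "x", "x", "y", "y", "y"),
                 col2 = c(1L, 2L, 3L, 1L, 2L, 3L))

# duplicated() marks each value that has already appeared earlier in the vector
dup <- duplicated(df$col1)
print(dup)  # FALSE TRUE TRUE FALSE TRUE TRUE

# Negating the vector keeps only the first occurrence of each col1 value
first_rows <- df[!dup, ]
print(first_rows)
```

By contrast, the attempt in the question, filter(duplicated(col1) | n() != 1), keeps a row when it is a repeat or when its group has more than one row. Every group here has three rows, so every row passes the filter.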

data

df <- structure(list(col1 = c("x", "x", "x", "y", "y", "y"), col2 = c(1L, 
2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Why not:

df[!duplicated(df$col1), ]