I have a data frame that looks like the first df below. There are duplicates in col1 but not in col2. I want to keep only the first row for each value of col1 and remove the rest, so that the result looks like the second df below.
col1 | col2 |
---|---|
x | 1 |
x | 2 |
x | 3 |
y | 1 |
y | 2 |
y | 3 |

col1 | col2 |
---|---|
x | 1 |
y | 1 |
I tried this but it didn't work:
df %>% group_by(col1) %>% filter(duplicated(col1) | n()!=1)
CodePudding user response:
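We just need distinct with .keep_all = TRUE, which keeps the first row for each value of col1 along with the other columns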
library(dplyr)
distinct(df, col1, .keep_all = TRUE)
col1 col2
1 x 1
2 y 1
Or, if we want to use duplicated, negate it (!) so that only the first row for each value of col1 is returned
df %>%
filter(!duplicated(col1))
col1 col2
1 x 1
2 y 1
data
df <- structure(list(col1 = c("x", "x", "x", "y", "y", "y"), col2 = c(1L,
2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-6L))
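As a side note on why the attempt in the question seems to do nothing, here is a minimal sketch using the same example data. The names attempt and fixed are only illustrative. Inside each group n() is 3, so n() != 1 is TRUE for every row, and combining it with | keeps all rows regardless of what duplicated returns.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(col1 = c("x", "x", "x", "y", "y", "y"),
                 col2 = c(1L, 2L, 3L, 1L, 2L, 3L))

# Original attempt: both groups have 3 rows, so n() != 1 is TRUE everywhere
# and the OR condition keeps all six rows
attempt <- df %>%
  group_by(col1) %>%
  filter(duplicated(col1) | n() != 1)
nrow(attempt)

# Negated duplicated(), no grouping needed: keeps the first row per col1 value
fixed <- df %>%
  filter(!duplicated(col1))
nrow(fixed)
```

The grouping step is not needed for this task, since duplicated already works across the whole column.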
CodePudding user response:
Why not just use base R:
df[ !duplicated( df$col1) , ]
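A self-contained sketch of the same base R approach, assuming the example data from the question. The names keep and first_rows are only illustrative. The trailing comma inside the brackets matters: the logical vector selects rows, and the empty column slot keeps all columns.

```r
# Same example data as in the question
df <- data.frame(col1 = c("x", "x", "x", "y", "y", "y"),
                 col2 = c(1L, 2L, 3L, 1L, 2L, 3L))

# TRUE for the first occurrence of each col1 value, FALSE for later repeats
keep <- !duplicated(df$col1)

# Row subsetting keeps the original row names (here 1 and 4)
first_rows <- df[keep, ]
first_rows
```

Unlike the dplyr versions, this keeps the original row names; setting rownames(first_rows) <- NULL resets them if that matters.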