Only leave duplicated rows in a dataframe, with R

Time:01-28

I have a dataframe that looks like this:

col1 col2 col3
tn1 a b
tn1 a c
tn2 d b
tn3 a b

I want to keep only the rows that are duplicated on col1 & col2, keeping BOTH rows of each duplicated pair:

col1 col2 col3
tn1 a b
tn1 a c

I've been trying to do this by using unique() or distinct() or anti_join() but can't figure it out.

CodePudding user response:

Base R:

key <- df[c("col1", "col2")]  # the columns that define a duplicate
df[duplicated(key) | duplicated(key, fromLast = TRUE), ]
  col1 col2 col3
1  tn1    a    b
2  tn1    a    c
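
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the data frame from the question and flags duplicates on col1 and col2 together. The names key_cols, is_dup and dups are only illustrative. Scanning once from the top and once from the bottom (fromLast = TRUE) is what keeps both rows of a pair, since duplicated() alone never marks the first occurrence.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  col1 = c("tn1", "tn1", "tn2", "tn3"),
  col2 = c("a", "a", "d", "a"),
  col3 = c("b", "c", "b", "b")
)

# Columns that together define a duplicate (illustrative name).
key_cols <- c("col1", "col2")

# duplicated() marks the 2nd, 3rd, ... occurrence of a key.
# Scanning again from the bottom marks the earlier occurrences too,
# so the union of the two flags covers every row of a duplicated key.
is_dup <- duplicated(df[key_cols]) | duplicated(df[key_cols], fromLast = TRUE)

dups <- df[is_dup, ]
print(dups)
```

Checking col1 alone would also keep rows where col1 repeats but col2 differs, which is why both columns go into the key here.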

CodePudding user response:

Found this, and it worked:

library(dplyr)
df %>% group_by(col1) %>% filter(duplicated(col2) | duplicated(col2, fromLast = TRUE))
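
A possible alternative with dplyr, sketched here under the assumption that "duplicated" means the col1/col2 combination occurs more than once: group by both columns and keep the groups with more than one row. The data frame is rebuilt from the question, and the name res is only illustrative.

```r
library(dplyr)

# Rebuild the example data frame from the question.
df <- data.frame(
  col1 = c("tn1", "tn1", "tn2", "tn3"),
  col2 = c("a", "a", "d", "a"),
  col3 = c("b", "c", "b", "b")
)

res <- df %>%
  group_by(col1, col2) %>%  # one group per col1/col2 combination
  filter(n() > 1) %>%       # keep groups that contain more than one row
  ungroup()                 # drop the grouping so later steps are unaffected

print(res)
```

This avoids calling duplicated() twice, and the ungroup() at the end returns a plain tibble.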
