Is there a way to only keep duplicate rows in a dataframe in R?

Time:10-26

I have a dataframe with 91 variables. I am trying to extract only the rows where every single value in the row is duplicated in another row. Using the unique function or the distinct function, I can see that 233 rows are duplicates. I want to create a dataframe with these 233 records. Most of the answers I have seen for similar issues focus on finding duplicate values via some sort of ID variable, but my data does not have any such variable. I want to compare each row as a whole, not just one of the variables. How do I go about creating a dataframe of just those duplicate rows?

CodePudding user response:

You can subset on duplicated(), which returns TRUE for every row that repeats an earlier row:

data[duplicated(data),]
  ID var1 var2
2  1    1    1

or in dplyr:

library(dplyr)
data %>%
  filter(duplicated(.))

Toy data:

data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
                   var1 = c(1, 1, 2, 5, 10, NA, 5, 23, NA, NA, 1),
                   var2 = c(1, 1, NA, NA, 1, NA, 0, 1, 3, 23, 4))
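
Note that duplicated() only flags the second and later copies of a row, so the first occurrence is left out of the result. If the goal is to keep every copy of a repeated row, including the first, one possible approach is to combine a forward and a backward scan using the fromLast argument. This is a minimal sketch on the toy data; the names later_copies, is_dup and all_copies are made up for illustration.

```r
# Toy data from the answer above
data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
                   var1 = c(1, 1, 2, 5, 10, NA, 5, 23, NA, NA, 1),
                   var2 = c(1, 1, NA, NA, 1, NA, 0, 1, 3, 23, 4))

# Second and later copies only (the first occurrence is not flagged)
later_copies <- data[duplicated(data), ]

# Every row that has an identical twin anywhere in the data:
# flag repeats scanning top-down, then bottom-up, and combine the two
is_dup <- duplicated(data) | duplicated(data, fromLast = TRUE)
all_copies <- data[is_dup, ]

nrow(later_copies)  # 1 (row 2)
nrow(all_copies)    # 2 (rows 1 and 2)
```

Which of the two you want depends on whether the count of duplicates you are trying to match includes the first occurrence of each repeated row.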