How to find duplicated values in two columns between two dataframes and remove non-duplicates in R?

Time:02-02

Let's say I have two data frames that look like this:

df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                 code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                 class =  c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                 code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                 class =  c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

I want to find which IDs appear in both df1$ID and df2$ID, and remove every row from df2 whose ID is not present in df1, so the new data frame would look like this:

df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
                 code = c(1,2,3,3,1,2,1,1,3,2,1,1),
                 class =  c(2,4,5,2,3,2,1,2,4,5,2,1)) 
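To see which IDs the two data frames share, and which rows of df2 the filter will drop, base R's set functions `intersect()` and `setdiff()` give a quick check. This is a small illustrative sketch. The names `shared_ids`, `dropped_ids` and `n_dropped` are hypothetical, and `stringsAsFactors = FALSE` is added (it is the default from R 4.0) so that ID is a character column on any R version.

```r
# Data from the question; stringsAsFactors = FALSE is an added assumption so
# the ID column is character even on R versions older than 4.0.
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)

# IDs that occur in both data frames
shared_ids <- intersect(df1$ID, df2$ID)
# IDs that occur in df2 but never in df1; their rows are the ones to drop
dropped_ids <- setdiff(df2$ID, df1$ID)
# Number of df2 rows whose ID is missing from df1
n_dropped <- sum(!(df2$ID %in% df1$ID))

print(shared_ids)   # [1] "A" "B" "F" "G"
print(dropped_ids)  # [1] "C"
print(n_dropped)    # [1] 3
```

So only the three "C" rows of df2 need to go, which matches the 12-row df3 above.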

Answer:

With %in%, keep only the rows of df2 whose ID also appears in df1:

df2[df2$ID %in% df1$ID, ]

   ID code class
1   G    1     2
2   F    2     4
4   F    3     5
5   B    3     2
6   A    1     3
7   F    2     2
9   A    1     1
10  B    1     2
11  A    3     4
12  B    2     5
14  A    1     2
15  G    1     1
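Note that the subset keeps df2's original row names (1, 2, 4, ...), so it prints differently from the df3 built in the question. A minimal sketch, assuming you want consecutive row names: reset them with `rownames(res) <- NULL`, after which the result should be identical to df3 (`res` is a hypothetical name; `stringsAsFactors = FALSE` is added so the comparison also holds on R versions before 4.0).

```r
# Data from the question (stringsAsFactors = FALSE added as an assumption)
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
                  code = c(1,2,3,3,1,2,1,1,3,2,1,1),
                  class = c(2,4,5,2,3,2,1,2,4,5,2,1),
                  stringsAsFactors = FALSE)

# Keep the rows of df2 whose ID occurs anywhere in df1
res <- df2[df2$ID %in% df1$ID, ]
# The subset inherits df2's row names (1, 2, 4, ...); renumber them 1..n
rownames(res) <- NULL

print(identical(res, df3))  # [1] TRUE
```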

Answer:

You can also use the intersect() function: first collect the IDs that appear in both data frames, then keep the rows of df2 whose ID is among them.

common_ids <- intersect(df1$ID, df2$ID)
df3 <- df2[df2$ID %in% common_ids, ]

   ID code class
1   G    1     2
2   F    2     4
4   F    3     5
5   B    3     2
6   A    1     3
7   F    2     2
9   A    1     1
10  B    1     2
11  A    3     4
12  B    2     5
14  A    1     2
15  G    1     1
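The intersect() step is optional: any ID of df2 that is in df1$ID is, by definition, also in intersect(df1$ID, df2$ID), so `df2$ID %in% common_ids` and `df2$ID %in% df1$ID` select the same rows. It mainly makes the shared IDs explicit. A small sketch comparing the two answers (the names `via_in` and `via_intersect` are hypothetical):

```r
# Data from the question (stringsAsFactors = FALSE added as an assumption)
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)

# First answer: filter directly against df1's IDs
via_in <- df2[df2$ID %in% df1$ID, ]

# Second answer: compute the common IDs first, then filter
common_ids <- intersect(df1$ID, df2$ID)
via_intersect <- df2[df2$ID %in% common_ids, ]

# Both logical masks are equal element by element, so the subsets match
print(identical(via_in, via_intersect))  # [1] TRUE
```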