Home > Blockchain >  Merging two Dataframes row wise with Duplicates in R
Merging two Dataframes row wise with Duplicates in R

Time:10-02

I have a dataframe with duplicate rows. I want to merge with another dataframe such that each duplicate row is accounted for.

For Example,

df1:

   A    B    C
1  100  200  300
2  400  500  600
3  100  200  300

df2:

   A    B    C
1  100  200  300
2  600  700  800

After Merge: It should return

   A    B    C
1  100  200  300

But when I did:

merge(x=df1, y=df2)

I get

   A    B    C
1  100  200  300
2  100  200  300

Its returning two rows despite df2 having that row only once

CodePudding user response:

A contemporary tidyverse option.

library(dplyr)

df1 %>%
  bind_rows(df2) %>%
  filter(duplicated(.)) %>%
  distinct(across(everything()))
  
#     A   B   C
# 1 100 200 300

Data

df1 <- structure(list(A = c(100L, 400L, 100L), B = c(200L, 500L, 200L
), C = c(300L, 600L, 300L)), class = "data.frame", row.names = c(NA, 
                                                                 -3L))

df2 <- structure(list(A = c(100L, 600L), B = c(200L, 700L), C = c(300L, 
                                                                  800L)), class = "data.frame", row.names = c(NA, -2L))

CodePudding user response:

An option is match after pasteing the columns together

 df1[match(do.call(paste, df2), do.call(paste, df1), nomatch = 0),]
    A   B   C
1 100 200 300

data

df1 <- structure(list(A = c(100L, 400L, 100L), B = c(200L, 500L, 200L
), C = c(300L, 600L, 300L)), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(A = c(100L, 600L), B = c(200L, 700L), C = c(300L, 
800L)), class = "data.frame", row.names = c(NA, -2L))

CodePudding user response:

Maybe we can use intersect

> intersect(df1,df2)
    A   B   C
1 100 200 300
  • Related