I have a dataframe with duplicate rows. I want to merge with another dataframe such that each duplicate row is accounted for.
For Example,
df1:
A B C
1 100 200 300
2 400 500 600
3 100 200 300
df2:
A B C
1 100 200 300
2 600 700 800
After Merge: It should return
A B C
1 100 200 300
But when I did:
merge(x=df1, y=df2)
I get
A B C
1 100 200 300
2 100 200 300
Its returning two rows despite df2 having that row only once
CodePudding user response:
A contemporary tidyverse option.
library(dplyr)
df1 %>%
bind_rows(df2) %>%
filter(duplicated(.)) %>%
distinct(across(everything()))
# A B C
# 1 100 200 300
Data
df1 <- structure(list(A = c(100L, 400L, 100L), B = c(200L, 500L, 200L
), C = c(300L, 600L, 300L)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(A = c(100L, 600L), B = c(200L, 700L), C = c(300L,
800L)), class = "data.frame", row.names = c(NA, -2L))
CodePudding user response:
An option is match
after paste
ing the columns together
df1[match(do.call(paste, df2), do.call(paste, df1), nomatch = 0),]
A B C
1 100 200 300
data
df1 <- structure(list(A = c(100L, 400L, 100L), B = c(200L, 500L, 200L
), C = c(300L, 600L, 300L)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(A = c(100L, 600L), B = c(200L, 700L), C = c(300L,
800L)), class = "data.frame", row.names = c(NA, -2L))
CodePudding user response:
Maybe we can use intersect
> intersect(df1,df2)
A B C
1 100 200 300