I want to merge datasets in R, and afterwards I want to know which rows were matched by the merge.
In Stata, a _merge column is generated automatically after the merge, and it takes three values: master only (1), using only (2), and matched (3). You can see the output in the screenshot here.
I think R has a similar feature, but I have had trouble finding it.
Answer:
I'd add a flag column to each data frame before merging, so that the source of every row can be identified afterwards:
df1 <- data.frame(x=c("a","b","c"), y=c(1,2,3))
df2 <- data.frame(x=c("a","b","d"), z=c(1,2,NA))
# solution:
df1$in1 <- TRUE
df2$in2 <- TRUE
# all=TRUE keeps unmatched rows from both sides (full outer join)
df3 <- merge(df1, df2, all=TRUE)
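The flags work because `all=TRUE` keeps unmatched rows and fills the columns coming from the other data frame with NA, so a missing `in1` or `in2` marks a row that exists on one side only. Here is a minimal self-contained check of that, reusing the example data above (the name `merged` is only for illustration):

```r
df1 <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
df2 <- data.frame(x = c("a", "b", "d"), z = c(1, 2, NA))
df1$in1 <- TRUE
df2$in2 <- TRUE

# Full outer join on the shared column "x"
merged <- merge(df1, df2, by = "x", all = TRUE)

# Keys whose in2 flag is NA exist only in df1; keys whose in1 flag is NA
# exist only in df2
only_in_df1 <- as.character(merged$x[is.na(merged$in2)])  # "c"
only_in_df2 <- as.character(merged$x[is.na(merged$in1)])  # "d"
```

Note that `z` is already NA for "d" in `df2` itself, so checking a data column for NA would not reliably tell you whether a row was matched. That is why a dedicated flag column is the safer choice.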
To add labels like the ones in your example:
# in1 / in2 are NA for rows that had no match on that side
df3$source <- ifelse(is.na(df3$in2), "master only",
              ifelse(is.na(df3$in1), "using only", "matched"))
df3$in1 <- NULL
df3$in2 <- NULL
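If you need this often, the steps above can be wrapped in a small function. This is only a sketch: `merge_with_indicator`, its arguments, and the temporary `.in_master` / `.in_using` column names are made up for illustration and are not part of base R. It assumes those temporary names do not clash with existing columns.

```r
# Hypothetical helper (not a base R function): full outer join plus a
# Stata-style indicator column.
merge_with_indicator <- function(master, using, by, indicator = "_merge") {
  # Temporary flags; assumed not to clash with existing column names
  master$.in_master <- TRUE
  using$.in_using <- TRUE

  out <- merge(master, using, by = by, all = TRUE)

  in_master <- !is.na(out$.in_master)
  in_using  <- !is.na(out$.in_using)

  # 1 = master only, 2 = using only, 3 = matched (same codes as Stata)
  code <- ifelse(in_master & in_using, 3L, ifelse(in_master, 1L, 2L))
  out[[indicator]] <- factor(code, levels = 1:3,
                             labels = c("master only", "using only", "matched"))

  # Drop the temporary flags
  out$.in_master <- NULL
  out$.in_using <- NULL
  out
}

df1 <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
df2 <- data.frame(x = c("a", "b", "d"), z = c(1, 2, NA))
res <- merge_with_indicator(df1, df2, by = "x")
table(res[["_merge"]])
```

Because `_merge` is not a syntactic R name, it has to be accessed as `res[["_merge"]]` or with backticks. Pass a different `indicator` value if you prefer a plain name such as `source`.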