This is difficult to explain, but; I'd like to add to a dataframe information from another dataframe. This is information contained in a column from one dataframe, but matched by specific information shared by both rows. For example, trying to match sex to an individual by a ID number code:
# main dataframe
df.1 <- data.frame(ID = c(1,6,4,3),
age = c(20,34,21,18))
# reference information
df.2 <- data.frame(ID = c(1,6,4,3),
sex = c("M","F","unknown","M"))
# combine by the ID criteria, the sex information:
df.3 <- data.frame(ID = c(1,6,4,3),
age = c(20,34,21,18),
sex = c("M","F","unknown","M"))
But, dataframe 2, the reference dataframe, doesn't have the IDs in order, so I can't just copy in a column from it. And, the reference dataframe also has many more IDs, and repeats of rows containing the same IDs (the reference dataframe has 20 times the number of rows as the main dataframe) - meaning I need to pull out the only the right information to match the ID - and add it to the original dataframe by the criteria of the different IDs. So really, it's more like trying to join a reference dataframe like this:
df.2 <- dataframe(ID = c(1,1,1,12,12,6,6,6,3,3,3,17,17,17,4,4,4),
sex = c("M","M","M","F","F","F","F","F",
"unknown","unknown","unknown",
"M","M","M","M","M","M"))
And still get the same results for df.3!
I tried various ways of sub-setting, but couldn't match the differing order of the IDs. I also tried various types of joining:
df.3 <- left_join(df.1, df.2, by ='ID')
But that just increased the size of the df.3, as is quite messy. I had a go at writing a if...else
function - but my level of skills wasn't good enough. I tried ifelse
too, but couldn't get that to work either.
Run in a dead end really!
Any guidance would be appreciated, Thanks in advance
CodePudding user response:
You can simply use distinct
on the df.2
:
left_join(df.1, distinct(df.2))
CodePudding user response:
We can simply get the unique
of the data and do a merge
merge(df.1, unique(df.2))