Home > front end >  Add data from one dataframe to another by ID criteria
Add data from one dataframe to another by ID criteria

Time:10-02

This is difficult to explain, but; I'd like to add to a dataframe information from another dataframe. This is information contained in a column from one dataframe, but matched by specific information shared by both rows. For example, trying to match sex to an individual by a ID number code:

# main dataframe
df.1 <- data.frame(ID = c(1,6,4,3),
               age = c(20,34,21,18))

# reference information 
df.2 <- data.frame(ID = c(1,6,4,3),
               sex = c("M","F","unknown","M"))

# combine by the ID criteria, the sex information: 
df.3 <- data.frame(ID = c(1,6,4,3),
               age = c(20,34,21,18),
               sex = c("M","F","unknown","M"))

But, dataframe 2, the reference dataframe, doesn't have the IDs in order, so I can't just copy in a column from it. And, the reference dataframe also has many more IDs, and repeats of rows containing the same IDs (the reference dataframe has 20 times the number of rows as the main dataframe) - meaning I need to pull out the only the right information to match the ID - and add it to the original dataframe by the criteria of the different IDs. So really, it's more like trying to join a reference dataframe like this:

df.2 <- dataframe(ID = c(1,1,1,12,12,6,6,6,3,3,3,17,17,17,4,4,4),
              sex = c("M","M","M","F","F","F","F","F",
                      "unknown","unknown","unknown",
                      "M","M","M","M","M","M"))

And still get the same results for df.3!

I tried various ways of sub-setting, but couldn't match the differing order of the IDs. I also tried various types of joining:

df.3 <- left_join(df.1, df.2, by ='ID')

But that just increased the size of the df.3, as is quite messy. I had a go at writing a if...else function - but my level of skills wasn't good enough. I tried ifelse too, but couldn't get that to work either. Run in a dead end really!

Any guidance would be appreciated, Thanks in advance

CodePudding user response:

You can simply use distinct on the df.2:

left_join(df.1, distinct(df.2))

CodePudding user response:

We can simply get the unique of the data and do a merge

merge(df.1, unique(df.2))
  • Related