Home > Back-end >  Join two dataframes in R which only share some similar rows R
Join two dataframes in R which only share some similar rows R

Time:09-26

I have the following two similar dataframes:

d1<-data.frame(TNR=c(1,2,3,4,5,6,7),EP=c(0,3,1,2,NA,NA,NA))
d2<-data.frame(TNR=c("E",1,3,5,100), EP=c(NA,NA,NA,NA,NA))

In d2 the vector "EP" is empty and i want to fill it with the data from d1$EP, where the vectors TNR overlap with each other. Where they don't overlap I want to keep d2 and not d1.

The result should look something like that:

d3<-data.frame(TNR=c("E",1,3,5,100),EP=c(NA,0,1,NA,NA))

I tried the following approach with ifelse and %in%:

d3<-data.frame(TNR=ifelse(test=d1$TNR %in% d2$TNR, yes=d2$EP, no=NA),
               EP=ifelse(test=d1$TNR %in% d2$TNR, yes=d1$EP, no="")) 
#remove rows where TNR is NA

d4=d3[!is.na(d1$baumnummer),]

However, the result is that EP is inserted in the wrong order and it seems to delete more rows than I anticipated because the resulting dataframe is shorter than d2

CodePudding user response:

Is that OK?

d2$EP <- d1[match(x = d2$TNR, table = d1$TNR), ]$EP

CodePudding user response:

Is this what you mean?

> d1 <- data.frame(TNR=as.character(c(1,2,3,4,5,6,7)),EP=c(0,3,1,2,NA,NA,NA))
> d2 <- data.frame(TNR=c('E',1,3,5,100))
> d3 <- merge(d1, d2, by = 'TNR', all = TRUE)
> d3
  TNR EP
1   1  0
2 100 NA
3   2  3
4   3  1
5   4  2
6   5 NA
7   6 NA
8   7 NA
9   E NA

To sort you can use order(). There are alternatives to base merge perhaps with slightly different behavior: data.table::merge, dplyr:join_*

CodePudding user response:

We can do it with dplyr. First convert the TNR column to character, then right_join the two dataframes. Finally, we can use coalesce to merge the EP column, eliminating NAs.

library(dplyr)

d1 %>%
    mutate(TNR = as.character(TNR)) %>%
    right_join(d2, by = "TNR") %>%
    mutate(EP = coalesce(EP.x, EP.y), .keep = 'unused')

  TNR EP
1   1  0
2   3  1
3   5 NA
4   E NA
5 100 NA
  • Related