Merging data tables in R with multiple conditions

Time:05-28

I have two rather large datasets. One has 2.8 million observations and 43 variables, the other 66,000 observations and 170 variables.

I want to merge them on two conditions, same company and same year, and keep only the matched observations. The final number of observations is not important, but all variables should be kept.

I think the problem might be that data1 has multiple observations per company and year and data2 does not. However, every observation in data1 should get the values from data2 for that company and year.

data3 <- inner_join(data1, data2, by = c("company", "year"))

This code fails with a "vector memory exhausted" error. I've raised the memory R can use to 14 GB and it changes nothing.

CodePudding user response:

I would use data.table for this. Something like:

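library(data.table)
setDT(data1)  # convert to data.table by reference, without copying
setDT(data2)
# inner join on company and year; nomatch = NULL drops unmatched rows
data3 <- data1[data2, on = .(company, year), nomatch = NULL]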
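
To see what this join does on a many-to-one match, here is a small sketch on toy tables. The columns `obs` and `value` and all the values are made up for illustration; only `company` and `year` come from the question. It also checks that data2 really has at most one row per company and year, because duplicated keys there would multiply the matching rows and could be one reason the result outgrows memory. That is an assumption about the cause, not something the question confirms.

```r
library(data.table)

# Toy stand-ins for the real tables; obs and value are hypothetical columns
data1 <- data.table(
  company = c("A", "A", "A", "B", "C"),
  year    = c(2019L, 2019L, 2020L, 2019L, 2019L),
  obs     = 1:5
)
data2 <- data.table(
  company = c("A", "A", "B"),
  year    = c(2019L, 2020L, 2019L),
  value   = c(10, 20, 30)
)

# data2 is expected to have at most one row per company-year;
# any key listed here would duplicate the matching data1 rows in the join
dup_keys <- data2[, .N, by = .(company, year)][N > 1]
stopifnot(nrow(dup_keys) == 0)

# Inner join: each data1 row picks up the data2 columns for its company-year,
# and rows without a match (company C here) are dropped
data3 <- data1[data2, on = .(company, year), nomatch = NULL]
print(data3)
```

If `dup_keys` is not empty on the real data, deduplicating or aggregating data2 before joining should keep the result at no more rows than data1. On data.table versions before 1.12.0, `nomatch = 0` is the equivalent spelling.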