I have two rather large datasets. One has 2.8 million observations and 43 variables, the other 66,000 observations and 170 variables.
I want to merge them on two conditions, the same company (condition 1) and the same year (condition 2), and keep only the matched observations. The final number of observations doesn't matter, but all variables should be kept.
I think the problem might be that data1 has multiple observations per year while data2 does not. Even so, every observation in data1 should get the value from data2 for that year.
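If data2 really holds only one row per company and year, a duplicate check on its key columns should come back empty. Any duplicates it does find would make the join many-to-many, which multiplies rows. Below is a minimal base R sketch on a hypothetical toy data2 (only the column names company and year come from the question; the values and the column y are made up):

```r
# Hypothetical toy version of data2; company A / 2019 appears twice on purpose
data2 <- data.frame(
  company = c("A", "A", "A", "B"),
  year    = c(2019, 2019, 2020, 2019),
  y       = c(10, 11, 20, 30)
)

keys <- data2[, c("company", "year")]

# TRUE for every row whose company-year key occurs more than once
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

sum(is_dup)             # number of rows involved in duplicated keys
unique(keys[is_dup, ])  # the offending company-year pairs
```

On the real data2 you would expect `sum(is_dup)` to be 0. If it isn't, those keys need to be resolved before the join.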
library(dplyr)
data3 <- inner_join(data1, data2, by = c("company", "year"))
This code fails with a "vector memory exhausted" error. I've raised the memory R is allowed to use to 14 GB, and that changes nothing.
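Because raising the memory limit didn't help, it may be worth estimating how many rows the join would produce before running it. For each company-year key, an inner join returns (rows with that key in data1) × (rows with that key in data2), so summing that product over the keys both tables share gives the size of the result. A base R sketch with hypothetical toy data, where data2 deliberately contains one duplicated key (the columns x and y and all values are made up):

```r
# Hypothetical toy data; data2 has company A / 2019 twice on purpose
data1 <- data.frame(
  company = c("A", "A", "A", "B"),
  year    = c(2019, 2019, 2020, 2019),
  x       = 1:4
)
data2 <- data.frame(
  company = c("A", "A", "A", "B"),
  year    = c(2019, 2019, 2020, 2019),
  y       = c(10, 11, 20, 30)
)

# One combined key per row; the separator is just an unlikely character
key1 <- paste(data1$company, data1$year, sep = "\u001f")
key2 <- paste(data2$company, data2$year, sep = "\u001f")

n1 <- table(key1)
n2 <- table(key2)
shared <- intersect(names(n1), names(n2))

# Rows an inner join would return: sum over shared keys of n1 * n2
expected_rows <- sum(as.numeric(n1[shared]) * as.numeric(n2[shared]))
expected_rows  # 6, more than the 4 rows of data1
```

If `expected_rows` on the real data is far above `nrow(data1)`, data2 is not unique per company and year, and the join is many-to-many rather than the intended many-to-one lookup.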
Answer:
I would use data.table for this. Something like:
library(data.table)
setDT(data1)
setDT(data2)
# inner join: nomatch = NULL keeps only company-year pairs present in both tables
data3 <- data1[data2, on = .(company, year), nomatch = NULL]
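To see what this join returns when data1 has several rows per company and year and data2 has one, here is a small run on hypothetical toy tables (the columns x and y and all values are made up; only company and year come from the question):

```r
library(data.table)

# Hypothetical toy data: data1 has several rows per company-year,
# data2 has exactly one row per company-year
data1 <- data.table(
  company = c("A", "A", "A", "B", "C"),
  year    = c(2019, 2019, 2020, 2019, 2019),
  x       = 1:5
)
data2 <- data.table(
  company = c("A", "A", "B"),
  year    = c(2019, 2020, 2019),
  y       = c(10, 20, 30)
)

# For each data2 row, fetch all matching data1 rows; unmatched rows are dropped
data3 <- data1[data2, on = .(company, year), nomatch = NULL]
data3
```

Both data1 rows for A / 2019 receive y = 10, and the C row is dropped because C has no row in data2. If the two tables shared a non-key column name, data.table would keep both copies and prefix the one coming from data2 (the `i` table) with `i.`.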