I have two data frames I would like to merge
a<- data.frame(x=c(1,4,6,8,1,6,7,2),ID=c("132","14.","732","2..","132","14.","732","2.."),year=c(1,1,1,1,2,2,2,2))
b<- data.frame(y=c(2,7,5,5,1,1,2,3),ID=c("132","144","732","290","132","144","732","290"),year=c(1,1,1,1,2,2,2,2))
The ID Variable by which I would like to merge the two data frames is not completely known in data set a. I also want to merge by year. They are known up to a fully identifying regular expression. Note, there is a one-to-one match. In this example you would not find an ID "1.." in the data set so there are no ambiguous matches.
I would like to get something like this:
output<-data.frame(y=c(2,7,5,5,1,1,2,3),x=c(1,4,6,8,1,6,7,2),ID=c("132","144","732","290","132","144","732","290"), year=c(1,1,1,1,2,2,2,2))
I tried to remove the regular expression part with substr and then use starts_with in the merge but it does not work.
I get the following error message
Coercing pattern to a plain character vector
when I do this:
df_complete <- regex_inner_join(b,a, by=c("ID","year"))
Thank you stack overflow...
CodePudding user response:
You might want to use the fuzzyjoin
package, and then you can use the regex_inner_join()
directly:
fuzzyjoin::regex_inner_join(b,a, by="ID") %>% select(x,y,ID=ID.x)
Output:
x y ID
1 1 2 132
2 4 7 144
3 6 5 732
4 8 5 290
CodePudding user response:
Answer from Comment by @jblood94
With a and b as data.tables: a[, regex_inner_join(b[year == .BY], .SD, by = "ID"), year] – jblood94