I have two data frames, a price dataset and a survey dataset, that I need to join on the closest date. The dates don't exactly match, and I need the price dataset to match on the closest date in the survey dataset. A previous question that I asked here on this was very helpful in giving me a an answer as to how to match on the closest date. However, I've since realized that I also need to join the data sets on region, as well as the closest date. I know how to join on closest date, I know how to join on region, but I unfortunately do not know how to join on the closest date and the exact same region.
Here's some example code and what I've been trying
#generating sample dataset
set.seed(1234)
price <- data.frame(region = sample(LETTERS[1:3],15,replace = TRUE), price = rnorm(15),price_date=sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 15))
survey <- data.frame(region = sample(LETTERS[1:3],15,replace = TRUE), survey_date=sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 15))
#this works to join on the closest date, however, the regions don't match, which is a problem
library(data.table)
setDT(price)
setDT(survey)
join <- price[survey,on=.(price_date=survey_date),roll="nearest"]
Basically what I want to do is match the date/region observation from price
to the closest date, same region observation in survey
, and I'm not really sure as to how to do this. Would appreciate any help.
CodePudding user response:
You could add region to on
condition:
price[survey,.(region,price,x.price_date,survey_date),on=.(region,price_date=survey_date),roll="nearest"][]
region price x.price_date survey_date
<char> <num> <Date> <Date>
1: A 0.88010416 1999-10-26 1999-09-05
2: B 0.31026217 1999-12-25 1999-12-31
3: A -1.68732684 1999-04-27 1999-06-02
4: C 0.00500695 1999-08-09 1999-05-11
5: C 0.00500695 1999-08-09 1999-03-24
6: B -0.03763026 1999-08-12 1999-09-02
7: C -0.64701901 1999-12-24 2000-01-01
8: B -0.03763026 1999-08-12 1999-08-06
9: C 0.00500695 1999-08-09 1999-08-03
10: A 0.88010416 1999-10-26 1999-11-22
11: C 1.37001035 1999-09-15 1999-10-03
12: B 0.01831663 1999-07-01 1999-06-18
13: A -0.62743621 1999-03-20 1999-03-12
14: B 0.72397606 1999-02-18 1999-03-02
15: C -0.64701901 1999-12-24 1999-12-18
Note use of x.
to display LHS date.