I have two dataframes insitu
and model
:
dput(head(insitu,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929,
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522,
18027, 18198), class = "Date"), D_EOS = structure(c(17067, 17353,
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148,
18446), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
dput(head(model,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"AUR", "AUR", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM"
), EVI_SOS = structure(c(16934, 17137, 17378, 17605, 17862, 18003,
18192, 18395, 16744, 17134, 17278, 17518, 17725, 18004, 18200
), class = "Date"), EVI_EOS = structure(c(17074, 17361, 17591,
17798, 17994, 18096, 18376, 18594, 17106, 17252, 17431, 17705,
17862, 18173, 18549), class = "Date")), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
What I want to do is the following:
I want to find the best matching dates between two dataframes on the selected columns and corresponding rows. In other words, in the dataframe insitu
in the column D_SOS which intercept with the rows of AUR of the column ID which dates best match with the column EVI_SOS which intercept the rows AUR of the column ID from the dataframe model
. The same will have to be made on the LAM rows.
The desired output example would be:
dput(head(output,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929,
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522,
18027, 18198), class = "Date"), EVI_SOS = structure(c(16934,
17137, 17378, 17862, 18395, 18192, 17134, 16744, 17278, 17518,
18004, 18200), class = "Date"), D_EOS = structure(c(17067, 17353,
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148,
18446), class = "Date"), EVI_EOS = structure(c(17074, 17361,
17798, 18096, 18594, 18376, 17252, 17106, 17431, 17705, 18173,
18549), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
It will look like this:
ID D_SOS EVI_SOS D_EOS EVI_EOS
1 AUR 2016-05-08 2016-05-13 2016-09-23 2016-09-30
2 AUR 2016-12-14 2016-12-02 2017-07-06 2017-07-14
3 AUR 2017-09-13 2017-07-31 2018-06-30 2018-09-24
4 AUR 2018-11-15 2018-11-27 2019-07-05 2019-07-19
5 AUR 2020-05-17 2020-05-13 2020-09-11 2020-11-28
6 AUR 2019-11-02 2019-10-23 2020-04-08 2020-04-24
7 LAM 2016-08-31 2016-11-29 2016-11-18 2017-03-27
8 LAM 2015-11-05 2015-11-05 2016-07-20 2016-11-01
9 LAM 2017-05-14 2017-04-22 2017-09-05 2017-09-22
10 LAM 2017-12-22 2017-12-18 2018-07-10 2018-06-23
11 LAM 2019-05-11 2019-04-18 2019-09-09 2019-10-04
12 LAM 2019-10-29 2019-10-31 2020-07-03 2020-10-14
Basically, over the 8 AUR dates of model
dataframe 6 will match since insitu
only has 6 dates in AUR. For LAM model
dataframe has 7 dates but insitu
has 6 dates in LAM which will be the number to match.
The output would have first the column of insitu
, for example D_SOS and then the one of model
, for example EVI_SOS which match with the correspondent dates.
Any help will be much appreciatted.
CodePudding user response:
library(data.table)
setDT(insitu)
setDT(model)
insitu[, key := D_SOS]
model[, key := EVI_SOS]
setkey(insitu, ID, key)
setkey(model, ID, key)
model[insitu, roll = "nearest"][, .(ID, D_SOS, EVI_SOS, D_EOS, EVI_EOS)]
# ID D_SOS EVI_SOS D_EOS EVI_EOS
# 1: AUR 2016-05-08 2016-05-13 2016-09-23 2016-09-30
# 2: AUR 2016-12-14 2016-12-02 2017-07-06 2017-07-14
# 3: AUR 2017-09-13 2017-07-31 2018-06-30 2018-03-01
# 4: AUR 2018-11-15 2018-11-27 2019-07-05 2019-04-08
# 5: AUR 2019-11-02 2019-10-23 2020-04-08 2020-04-24
# 6: AUR 2020-05-07 2020-05-13 2020-09-11 2020-11-28
# 7: LAM 2015-11-05 2015-11-05 2016-07-20 2016-11-01
# 8: LAM 2016-08-31 2016-11-29 2016-11-18 2017-03-27
# 9: LAM 2017-05-14 2017-04-22 2017-09-05 2017-09-22
# 10: LAM 2017-12-22 2017-12-18 2018-07-10 2018-06-23
# 11: LAM 2019-05-11 2019-04-18 2019-09-09 2019-10-04
# 12: LAM 2019-10-29 2019-10-31 2020-07-03 2020-10-14