Home > front end >  Join by column and similar date
Join by column and similar date

Time:11-09

I would like to union this data frame but they have particular request. The inner join must be done by column two and by one (date), but the date are not the same. In the first data.frame the date must need before the date in data.frame b. I have charge also the result that I would like to have, it is possible to do in R?

a <- data.frame(one = c( as.Date( "2020-08-24"), as.Date( "2020-08-27" ), as.Date( "2020-08-31" ), as.Date( "2020-09-01" )),
                 two = c("a","b","b","a"))

b <- data.frame(two = c( as.Date( "2020-08-25"), as.Date( "2020-08-30" ), as.Date( "2020-09-05" ), as.Date( "2020-09-11" )),
                 three = c("a","b","a","b"))

result <- data.frame(one = c(as.Date( "2020-08-24"), as.Date( "2020-08-27" ), as.Date( "2020-08-31" ), as.Date( "2020-09-01" )),
two = c("a","b","b","a"), 
three = c(as.Date("2020-08-25"), as.Date( "2020-08-30"), as.Date("2020-09-11"), as.Date("2020-09-05")))

CodePudding user response:

In dplyr:

library(dplyr)
left_join(a, b, by = c("two" = "three")) %>% 
  filter(two.y > one) %>% 
  group_by(one) %>% 
  slice_min(two.y)

If you use the dplyr development version, there is a much more straightforward answer, using join_by in left_join:

devtools::install_github("tidyverse/dplyr")
library(dplyr)
left_join(a, b, join_by(closest(one <= two), two == three))

output

# A tibble: 4 × 3
# Groups:   one [4]
  one        two   two.y     
  <date>     <chr> <date>    
1 2020-08-24 a     2020-08-25
2 2020-08-27 b     2020-08-30
3 2020-08-31 b     2020-09-11
4 2020-09-01 a     2020-09-05
  •  Tags:  
  • r
  • Related