How do you do rolling joins with database tables in R?-CodePudding

The dev version of dplyr allows for rolling joins.

# devtools::install_github("/tidyverse/dplyr")
library(dplyr)

a <- tibble(
  person = rep("a", 10),
  time = 1:10
) 

b <- tibble(
  person = rep("a", 3),
  time = c(1, 8, 5),
  pref = c("good", "terrible", "not so good")
)

left_join(a, b, by = join_by(person, closest(time >= time)))
#> # A tibble: 10 × 4
#>    person time.x time.y pref       
#>    <chr>   <int>  <dbl> <chr>      
#>  1 a           1      1 good       
#>  2 a           2      1 good       
#>  3 a           3      1 good       
#>  4 a           4      1 good       
#>  5 a           5      5 not so good
#>  6 a           6      5 not so good
#>  7 a           7      5 not so good
#>  8 a           8      8 terrible   
#>  9 a           9      8 terrible   
#> 10 a          10      8 terrible

However, the same approach does not (at least not yet) work with dbplyr.

library(dbplyr)
library(dplyr)

a <- memdb_frame(
  person = rep("a", 10),
  time = 1:10
) 

b <- memdb_frame(
  person = rep("a", 3),
  time = c(1, 8, 5),
  pref = c("good", "terrible", "not so good")
)

left_join(a, b, by = join_by(person, closest(time >= time)))
#> Error in `dplyr::common_by()`:
#> ! `by` must be a (named) character vector, list, or NULL for natural
#>   joins (not recommended in production code), not a <dplyr_join_by> object.`

How would you do this type of rolling join with database tables in R?

CodePudding user response：

One option is to make use of sql_on argument in the left_join

library(dbplyr)
library(dplyr)
left_join(a, b, sql_on =  "LHS.person = RHS.person and LHS.time >= RHS.time") %>% 
   select(-person.y) %>% 
   group_by(person.x, time.x) %>%
   slice_max(time.y) %>%
   ungroup

-output

# Source:   SQL [10 x 4]
# Database: sqlite 3.39.2 [:memory:]
   person.x time.x time.y pref       
   <chr>     <int>  <dbl> <chr>      
 1 a             1      1 good       
 2 a             2      1 good       
 3 a             3      1 good       
 4 a             4      1 good       
 5 a             5      5 not so good
 6 a             6      5 not so good
 7 a             7      5 not so good
 8 a             8      8 terrible   
 9 a             9      8 terrible   
10 a            10      8 terrible