I have a local SQLite database with two tables. I want to make a query from one table filtering rows by ids from another table in the same database using dbplyr.
As an example, here is what I would do without a database involved:
# install.packages("tidyverse")
library(tidyverse)
data <-
tibble(id = seq(1, 8),
data = LETTERS[seq(1, 8)])
condition <-
tibble(id = seq(1, 8),
group = c(rep("low", 4), rep("high", 4)))
data %>%
filter(id %in% pull(condition %>%
filter(group == "high") %>%
select(id)))
which gives me
# A tibble: 4 × 2
id data
<int> <chr>
1 5 E
2 6 F
3 7 G
4 8 H
Using the same code with tables in the database instead of tibbles in the memory
# install.packages("RSQLite")
library(RSQLite)
test_db <-
dbConnect(RSQLite::SQLite(), "test_db.sqlite")
dbWriteTable(test_db, "data_tbl", data)
dbWriteTable(test_db, "condition_tbl", condition)
dbListTables(test_db)
data_db <-
tbl(test_db, "data_tbl")
condition_db <-
tbl(test_db, "condition_tbl")
data_db %>%
filter(id %in%
pull(condition_db %>%
filter(group == "high") %>%
select(id)))
produces the following error:
Error in UseMethod("escape") :
not applicable method for 'escape' applied on object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
When I load the ids in the memory first the query works without a problem.
detour <-
pull(condition_db %>%
filter(group == "high") %>%
select(id) %>%
collect())
data_db %>%
filter(id %in% detour)
# A tibble: 4 × 2
id data
<int> <chr>
1 5 E
2 6 F
3 7 G
4 8 H
Since pull()
seems to be the problem here - how can I make such queries without the "detour" provided above?
CodePudding user response:
I use a semi-join for this problem when working with dbplyr. A semi-join between two tables returns every record from the first table where there is at least one match with a record in the second table. (An anti-join is similar, returning where there are no matches in the second table.)
This would look like:
prepared_condition = condition %>%
filter(group == "high")
output = data %>%
semi_join(prepared_condition, by = "id")