Home > Blockchain >  SQL query in R (glue library)
SQL query in R (glue library)

Time:04-06

Toy table: (code to create it in R is given below)

# Source:   table<table> [?? x 1]
# Database: sqlite 3.36.0 [:memory:]
  text                                       
  <chr>                                      
1 blablabla word1 blablabla word2 blablabla
2 blablabla word2 blablabla word1 blablabla
3 blablabla word1 blablabla 

QUESTION: Can you help me to write an SQL query using glue_sql() that keeps the rows where text contains both "word1" and "word2", regardless of the order (i.e. rows 1 and 2)?

Something along these lines:

keys <- c("word1", "word2")
q <- glue::glue_sql("SELECT * FROM table WHERE ...", vals = keys, .con = con)

Code to create the table in R:

library(RPostgreSQL)
library(tidyverse)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

table <- data.frame(
  text = c("blablabla word1 blablabla word2 blablabla",
           "blablabla word2 blablabla word1 blablabla",
           "blablabla word1 blablabla"))

dbWriteTable(con, "table", table)

tbl(con, "table")

CodePudding user response:

The simplest approach may be to target %-wildcards, and deal with the ordering brute-force:

dbGetQuery(sqlite, "select * from \"table\" where (text like '%word1%' and text like '%word2%')")
#                                        text
# 1 blablabla word1 blablabla word2 blablabla
# 2 blablabla word2 blablabla word1 blablabla

To form this programmatically,

subqry <- paste(
  "(",
  paste(
    paste0("text like '%", keys, "%'"),
    collapse = " and "),
  ")"
)
subqry
# [1] "( text like '%word1%' and text like '%word2%' )"
dbGetQuery(sqlite, paste("select * from \"table\" where", subqry))
#                                        text
# 1 blablabla word1 blablabla word2 blablabla
# 2 blablabla word2 blablabla word1 blablabla

A good practice for SQL is to use bound parameters, ala parameterized queries. A minor change:

subqry <- paste(
  "(",
  paste(
    rep("text like ?", length(keys)),
    collapse = " and "),
  ")"
)
subqry
# [1] "( text like ? and text like ? )"
dbGetQuery(sqlite, paste("select * from \"table\" where", subqry),
           params = as.list(paste0("%", keys, "%")))
#                                        text
# 1 blablabla word1 blablabla word2 blablabla
# 2 blablabla word2 blablabla word1 blablabla

You may wonder what this buys you ... aside from inadvertent SQL injection and potential mishandling of quotes in strings, it also allows DBMSes to optimize the query and reuse it. For example, select * from table where text like '%word1%' may be optimized/cached, but the very-similar query select * from table where text like '%word2%' is different, so the cached optimization is a miss. However, select * from table where text like ? would be a cache hit on the second query. Granted, with this query there is very little gain with this, but it's still a good habit to get into in general, and will pay dividends in the long run. (And it is a "best practice" listed in https://db.rstudio.com/best-practices/run-queries-safely/.)

  • Related