Store new permanent table in schema using compute


I want to use dbplyr syntax to do some JOIN / FILTER operations on some tables and store the results back to the database without collecting them first.

From what I read, compute(..., temporary = FALSE, ...) should do exactly that. However, I struggle with how to provide the fully qualified name (that is, database.schema.table_name) of the table where I want to store the results.

I know about DBI::Id and dbplyr::in_schema, but I do not know how to use them properly. An attempt with sql did at least what I wanted (it created the table) but resulted in a (spurious?) error.

What do I need to do?

Some NoReprex (not runnable as-is):

library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(odbc::odbc(), "myserver")

## do __not__ collect the data
my_frame <- con %>%
   tbl(Id(catalog = "mydb", schema = "dbo", table = "mytable")) %>%
   inner_join(con %>% tbl(Id(catalog = "mydb", schema = "dbo", 
                             table = "yetanothertable")),
              "id")

compute(my_frame,
        # Id(catalog = "mydb", schema = "dbo", table = "mynewtable"), # (1)
        # in_schema("dbo", "mynewtable"),                             # (2)
        sql("mydb.dbo.mynewtable"),                                   # (3)
        temporary = FALSE)

I get a different error depending on the variant I am using:

# (1)
## Error in h(simpleError(msg, call)) : 
##   error in evaluating the argument 'conn' in selecting a method for function 
## 'dbQuoteIdentifier': argument "con" is missing, with no default

# (2)
## Error in escape(x$schema, con = con) : 
##   argument "con" is missing, with no default

# (3)
## Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.  
##            [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. 
## <SQL> 'SELECT *
## FROM (mydb.dbo.mynewtable) "q02"
## WHERE (0 = 1)'

P.S.: I really want to be able to save the table under its fully qualified name, that is, including the database name (even though it is the same database in this simplified example). So dbConnect(..., database = <somedb>) won't solve my problem in the long run.

P.P.S.: I am looking for a compute solution. I know I could construct the SQL myself, but I am really interested to see whether I can use the dbplyr abstraction layer for that.

CodePudding user response:

I have previously preferred the solution that involves writing some SQL (as per this answer). But since you exclude this approach in your question, I tested and found a way to do it without writing SQL.

We are going to use db_compute instead of compute.

  • The documentation for compute states that "compute() stores results in a remote temporary table". I took this to mean we cannot write permanent tables using compute.
  • The documentation for db_compute says very little, but it appears alongside db_copy_to, which has a purpose similar to what we are looking for. So it was worth trying (and it works).

General setup

library(DBI)
library(dplyr)
library(dbplyr)

# connect to database
connection_string = "..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)

# remote table
remote_table = tbl(db_connection, from = in_schema("schema","table"))
top_rows = remote_table %>%
  head()

Testing compute

top_rows %>% show_query()
# <SQL>
# SELECT TOP (6) *
# FROM [database_name].[schema_name].[table_name]

top_rows = top_rows %>%
  compute()
# Created a temporary table named: #dbplyr_002

top_rows %>% show_query()
# <SQL>
# SELECT *
# FROM #dbplyr_002

So we can see that compute writes a temporary table. Hence, if we did some complex processing (instead of just taking the top few rows), compute would be an effective way to store the processed table so that we avoid repeating the complex processing every time we query it.
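
For instance, a heavier pipeline could be cached like this (a minimal sketch; group_col is a made-up column name):

summary_rows = remote_table %>%
  group_by(group_col) %>%          # hypothetical grouping column
  summarise(num_rows = n()) %>%
  compute()                        # materialise the result as a temp table

Subsequent queries against summary_rows then read from the temporary table instead of re-running the aggregation.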

But because it is temporary, the table should disappear when we disconnect from the database: DBI::dbDisconnect(db_connection).
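
As a rough check (using the temporary table name #dbplyr_002 from the output above; exact behaviour may depend on your driver), selecting from the old temporary table after reconnecting should fail:

dbDisconnect(db_connection)
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)

tryCatch(
  dbGetQuery(db_connection, "SELECT TOP 1 * FROM #dbplyr_002"),
  error = function(e) message("temporary table is gone: ", conditionMessage(e))
)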

Testing db_compute

out = db_compute(
  con = db_connection,
  table = in_schema("schema","new_table"),
  sql = sql_render(top_rows),
  temporary = FALSE
)

out
# <IDENT> database_name.schema_name.new_table

# access the new table (even after reconnecting)
new_remote_table = tbl(db_connection, from = in_schema("schema","new_table"))

So we can now access the new (permanent) table from within R. I also checked, via an SQL query, that the table exists in the database.
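
One way to perform that check from R (a sketch; 'schema' and 'new_table' are the placeholder names used above) is to query INFORMATION_SCHEMA:

dbGetQuery(
  db_connection,
  "SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
   FROM INFORMATION_SCHEMA.TABLES
   WHERE TABLE_SCHEMA = 'schema' AND TABLE_NAME = 'new_table'"
)
# one row returned means the permanent table exists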

Note that, as db_compute has minimal documentation, it is not clear whether it is intended to be used in this way. I have tested the above and it worked, but without additional documentation, use it at your own risk.

CodePudding user response:

Just for the record: thanks to Simon's answer, I finally got the correct way of providing the fully qualified table name (i.e. containing the database, schema and table name). With this piece of information, one can rely on compute and everything works like a charm. You just have to provide the database name as part of the schema and make sure to escape it via sql:

compute(my_frame,
        # in_schema("mydb.dbo", "mynewtable") would __not__ work
        in_schema(sql("mydb.dbo"), "mynewtable"),
        temporary = FALSE)
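
The new permanent table can then be queried like any other remote table, e.g. (a sketch using the names from above):

tbl(con, in_schema(sql("mydb.dbo"), "mynewtable")) %>%
  head()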