I want to rename a Redshift table from within a Python Databricks notebook.
Currently I have a query that pulls in data and creates a table:
redshiftUrl = 'jdbc:redshift://myredshifturl'
redshiftOptions = {
    'url': redshiftUrl,
    'aws_iam_role': 'myredshift_iam_role',
    'tempdir': 's3bucket',
}
def run_query(query_string):
    # Returns a DataFrameReader; call .load() on it to get a DataFrame.
    return spark.read.format('com.databricks.spark.redshift').options(**redshiftOptions).option('query', query_string)
query = """
SELECT TOP 100 * FROM public.mytable
"""
df = run_query(query).load()
def write_table(src_table, table_name=None, write_mode='overwrite', diststyle=None, distkey=None, sortkeyspec=None, preactions=None, postactions=None):
    srcDf = src_table
    loadOptions = redshiftOptions
    loadOptions['tempformat'] = 'CSV GZIP'
    loadOptions['extracopyoptions'] = 'truncatecolumns'
    if diststyle:
        loadOptions['diststyle'] = diststyle
    if distkey:
        loadOptions['diststyle'] = 'key'
        loadOptions['distkey'] = distkey
    if sortkeyspec:
        loadOptions['sortkeyspec'] = sortkeyspec
    if preactions:
        loadOptions['preactions'] = preactions
    if postactions:
        loadOptions['postactions'] = postactions
    if table_name is None:
        table_name = src_table
    srcDf.write.format('com.databricks.spark.redshift').options(**loadOptions).option('dbtable', table_name).mode(write_mode).save()

write_table(df, table_name='public.test')
I want to take this table I created and rename it. I referenced this doc but find it hard to follow.
I want to run this SQL command, alter table public.test rename to test_table_to_be_dropped, in my PySpark Databricks notebook. (This command works in my SQL IDE.)
Here's what I have tried:
test_query = """
alter table public.test rename to test_table_to_be_dropped
"""
run_query(test_query).load()
This gives me an error:
java.sql.SQLException: [Amazon](500310) Invalid operation: syntax error at or near "table"
This leads me to believe that my run_query() function is only for reading data and not for editing data. I also saw that I can use the %sql magic command, but it looks like it will only reference my data lake and not my Redshift database.
TL;DR:
Is there a way I can write a function to run SQL write commands, or to point my %sql magic command at my Redshift table?
Answer:
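The Redshift data source for Spark (com.databricks.spark.redshift) has preactions and postactions options. Each takes a semicolon-separated list of SQL commands that the connector runs before (preactions) or after (postactions) the write, so you can run statements such as ALTER TABLE as part of a write: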
def write_table(src_table, table_name=None, diststyle=None, distkey=None, sortkeyspec=None, preactions=None, postactions=None):
    srcDf = src_table
    # Copy the shared options so per-call settings do not leak into redshiftOptions.
    loadOptions = dict(redshiftOptions)
    loadOptions['tempformat'] = 'CSV GZIP'
    loadOptions['extracopyoptions'] = 'truncatecolumns compupdate off'
    if diststyle:
        loadOptions['diststyle'] = diststyle
    if distkey:
        # A distribution key implies diststyle 'key'.
        loadOptions['diststyle'] = 'key'
        loadOptions['distkey'] = distkey
    if sortkeyspec:
        loadOptions['sortkeyspec'] = sortkeyspec
    if preactions:
        loadOptions['preactions'] = preactions
    if postactions:
        loadOptions['postactions'] = postactions
    if table_name is None:
        # src_table is a DataFrame, so it cannot double as the target table name.
        raise ValueError('table_name is required')
    srcDf.write.format('com.databricks.spark.redshift').options(**loadOptions).option('dbtable', table_name).mode('overwrite').save()
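# Statements in postactions run after the write; separate several with semicolons.
sql_query = """
ALTER TABLE public.test RENAME TO test_table_to_be_dropped
"""
write_table(df, table_name='public.test', postactions=sql_query)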
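
If the statement has to run on its own, without a write attached, one option is to bypass Spark and send it through an ordinary database driver. The sketch below is not part of the original answer: run_statement is a hypothetical helper that accepts any DB-API 2.0 (PEP 249) connection, and it assumes you can open such a connection to Redshift from the notebook, for example with a driver such as redshift_connector or psycopg2 installed on the cluster. SQLite stands in for Redshift here only so the example runs anywhere.

```python
import sqlite3  # stand-in driver for the demo; not Redshift


def run_statement(conn, sql):
    """Execute one non-SELECT statement on a DB-API 2.0 connection and commit it."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        conn.commit()
    except Exception:
        # Undo any partial work before re-raising the driver's error.
        conn.rollback()
        raise
    finally:
        cur.close()


# Demo against an in-memory SQLite database (assumption: a Redshift
# connection object would be passed in the same way).
conn = sqlite3.connect(":memory:")
run_statement(conn, "CREATE TABLE test (id INTEGER)")
run_statement(conn, "ALTER TABLE test RENAME TO test_table_to_be_dropped")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```

With a real Redshift connection you would pass the schema-qualified statement from the question (alter table public.test rename to test_table_to_be_dropped) instead of the SQLite demo statements.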