Home > Net >  spark.write.synapsesql options with Azure Synapse Spark Pool
spark.write.synapsesql options with Azure Synapse Spark Pool

Time:04-12

Within Azure Synapse, I am using the synapsesql function with the Scala language within a Spark Pool notebook to push the contents of a data frame into the SQL Pool

// Write data frame to sql table
df2.write.
option(Constants.SERVER,s"${pServerName}.sql.azuresynapse.net").
synapsesql(s"${pDatabaseName}.xtr.${pTableName}",Constants.INTERNAL)

This works successfully, however I would like to add some extra functionality

  1. How do I specify the index type to be HEAP instead of Clustered Column store Index. In Data Bricks it can be done through .option("tableOptions","heap,distribution=ROUND-ROBIN") however this doesn't work in the Spark Pool notebook.

  2. How do I overwrite the table in the SQL Pool if it already exists? In Data Bricks you have SaveAsTable but I can't find anything similar in the Spark pool notebook.

CodePudding user response:

Anyways, as per this official document,

When a table is created, by default the data structure has no indexes and is called a heap.

To use the HEAP type index, while creating the table simply specify HEAP in the WITH clause:

CREATE TABLE myTable
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH ( HEAP );

To overwrite, import org.apache.spark.sql.SaveMode Class in your SQL Notebook and use Overwrite mode.

Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.

Refer DataFrame write SaveMode support to know more.

Also check this given example for your reference.

  • Related