Write a Spark DataFrame to an existing Delta table by providing the TABLE NAME instead of the TABLE PATH

Time:11-29

I am trying to write a Spark DataFrame into an existing Delta table.

I have multiple scenarios in which I save data into different tables, as shown below.

SCENARIO-01:

I have an existing Delta table and I have to write a DataFrame into that table with the mergeSchema option, since the schema may change with each load.

I am currently doing this with the command below, providing the Delta table path:

finalDF01.write.format("delta").option("mergeSchema", "true").mode("append") \
  .partitionBy("part01","part02").save(finalDF01DestFolderPath)

I just want to know whether this can be done by providing the existing Delta TABLE NAME instead of the Delta PATH.

This has been resolved by updating the write command as below:

finalDF01.write.format("delta").option("mergeSchema", "true").mode("append") \
  .partitionBy("part01","part02").saveAsTable(finalDF01DestTableName)
1. Is this the correct way?

SCENARIO 02:

I have to update the existing record if it already exists, and insert a new record if it does not. For this I am currently doing the following:

import io.delta.tables.DeltaTable

spark.sql("SET spark.databricks.delta.schema.autoMerge.enabled = true")

DeltaTable.forPath(DestFolderPath)
  .as("t")
  .merge(
    finalDataFrame.as("s"),
    "t.id = s.id AND t.name = s.name")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

I tried the following script:

destMasterTable.as("t")
      .merge(
         vehMasterDf.as("s"),
          "t.id = s.id")
       .whenNotMatched().insertAll()
       .execute()

but I am getting the error below (even with alias instead of as):

error: value as is not a member of String
    destMasterTable.as("t")
1. Here too I am using the Delta table path as the destination. Is there any way to provide the Delta TABLE NAME instead of the TABLE PATH?

It would be good to provide the TABLE NAME instead of the TABLE PATH, so that changing the table path later will not affect the code. I have not seen anywhere in the Databricks documentation a table name being used together with mergeSchema and autoMerge. Is it possible to do so?

CodePudding user response:

To use existing data as a table instead of a path, you either needed to use saveAsTable from the beginning, or you can register the existing data in the Hive metastore with the SQL command CREATE TABLE ... USING, like this (the syntax may differ slightly depending on whether you're running on Databricks or OSS Spark, and on the Spark version):

CREATE TABLE IF NOT EXISTS my_table
USING delta
LOCATION 'path_to_existing_data'

After that, you can use saveAsTable.
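As a rough sketch of both steps together (the table name my_table and the location path are placeholders, and finalDF01 stands for the DataFrame from the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// One-time registration of the existing Delta data in the metastore
// (placeholder path; use your real location)
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_table
  USING delta
  LOCATION '/mnt/delta/my_table'
""")

// From then on, writes can target the table by NAME,
// still with schema merging enabled
finalDF01.write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .partitionBy("part01", "part02")
  .saveAsTable("my_table")
```

Note that because the table was created with an explicit LOCATION, it is an external table: the data stays at that path, and the metastore entry simply points to it.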

For the second question: it looks like destMasterTable is just a String. To refer to an existing table by name, you need to use the forName function of the DeltaTable object (doc):

DeltaTable.forName(destMasterTable)
  .as("t")
  ...
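Filling in the rest along the lines of the merge from scenario 02 (vehMasterDf is the DataFrame from the question, and destMasterTable is assumed to hold the table name):

```scala
import io.delta.tables.DeltaTable

// destMasterTable is now a table NAME registered in the metastore,
// not a filesystem path
val target = DeltaTable.forName(destMasterTable)

target.as("t")
  .merge(
    vehMasterDf.as("s"),
    "t.id = s.id")
  .whenMatched().updateAll()      // upsert: update existing rows
  .whenNotMatched().insertAll()   // insert new rows
  .execute()
```

The "value as is not a member of String" error came from calling .as directly on the String variable; DeltaTable.forName turns the name into a DeltaTable, which does have as/alias and merge.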