How to specify Delta table properties when writing a streaming Spark DataFrame


Let's assume I have a streaming dataframe, and I'm writing it to Databricks Delta Lake:

someStreamingDf.writeStream
  .format("delta")
  .outputMode("append")
  .start("targetPath")

and then creating a delta table out of it:

spark.sql("CREATE TABLE <TBL_NAME> USING DELTA LOCATION '<targetPath>'
TBLPROPERTIES ('delta.autoOptimize.optimizeWrite'=true)")

which fails with AnalysisException: The specified properties do not match the existing properties at <targetPath>.

I know I can create a table beforehand:

CREATE TABLE <TBL_NAME> (
  -- columns
)
USING DELTA LOCATION '<targetPath>'
TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = true,
  ...
)

and then just write to it, but writing this SQL with all the columns and their types feels like extra, unnecessary work. So is there a way to specify these TBLPROPERTIES while writing to a Delta table for the first time, rather than beforehand?

CodePudding user response:

If you look into the documentation, you can see that you can set the following property:

spark.conf.set(
  "spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite", "true")

and then all newly created tables will have delta.autoOptimize.optimizeWrite set to true.
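
For example, a minimal Scala sketch of that approach; <TBL_NAME>, <targetPath>, and <checkpointPath> are placeholders, and the checkpointLocation option is assumed here for illustration:

// Set the session-level default before the table is first created
spark.conf.set(
  "spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite", "true")

// The streaming write creates the Delta table at <targetPath>, picking up the default
someStreamingDf.writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "<checkpointPath>")
  .start("<targetPath>")

// Registering the table afterwards no longer needs to repeat the property
spark.sql("CREATE TABLE <TBL_NAME> USING DELTA LOCATION '<targetPath>'")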

Another approach: create the table without that option, and then run ALTER TABLE ... SET TBLPROPERTIES afterwards (not tested, though).
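
A minimal sketch of that second approach (untested here, as noted; <TBL_NAME> is a placeholder):

ALTER TABLE <TBL_NAME>
SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = true);

Keep in mind that a property set this way only affects writes made after it is applied, not the initial write that created the table.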
