Let's assume I have a streaming dataframe, and I'm writing it to Databricks Delta Lake:
someStreamingDf.writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "<checkpointPath>")
  .start("<targetPath>")
and then creating a delta table out of it:
spark.sql("CREATE TABLE <TBL_NAME> USING DELTA LOCATION '<targetPath>'
TBLPROPERTIES ('delta.autoOptimize.optimizeWrite'=true)")
which fails with AnalysisException: The specified properties do not match the existing properties at <targetPath>.
I know I can create a table beforehand:
CREATE TABLE <TBL_NAME> (
  -- columns and their types
)
USING DELTA LOCATION '<targetPath>'
TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  ...
)
and then just write to it, but writing this SQL out with all the columns and their types seems like extra, unnecessary work. So is there a way to specify these TBLPROPERTIES while writing to a Delta table for the first time, rather than beforehand?
CodePudding user response:
If you look into the documentation, you can see that you can set the following property:
spark.conf.set(
  "spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite",
  "true")
and then all newly created tables will have delta.autoOptimize.optimizeWrite set to true. (The default only affects tables created after the setting is applied, in the session where it is set.)
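Putting it together, a minimal PySpark sketch (the table name, <targetPath>, and <checkpointPath> are placeholders; it assumes the conf is set before the stream's first write, since that write is what creates the Delta table and fixes its properties):

# set the session default before any data is written to the target path
spark.conf.set(
  "spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite",
  "true")

# start the stream; the first write creates the Delta table with the default applied
someStreamingDf.writeStream \
  .format("delta") \
  .outputMode("append") \
  .option("checkpointLocation", "<checkpointPath>") \
  .start("<targetPath>")

# register the table without TBLPROPERTIES, so nothing clashes with existing ones
spark.sql("CREATE TABLE <TBL_NAME> USING DELTA LOCATION '<targetPath>'")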
Another approach: create the table without the option first, and then run ALTER TABLE ... SET TBLPROPERTIES, as sketched below (not tested, though).
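A rough sketch of that second approach, with the same placeholders and the same untested caveat:

# register the table with no TBLPROPERTIES, so it matches the existing metadata
spark.sql("CREATE TABLE <TBL_NAME> USING DELTA LOCATION '<targetPath>'")

# then set the property on the now-registered table
spark.sql("""ALTER TABLE <TBL_NAME>
  SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')""")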