I receive:
A schema mismatch detected when writing to the Delta table
I tried to follow the suggestion:
To overwrite your schema or change partitioning, please set: '.option("overwriteSchema", "true")'.
Based on this solution (A schema mismatch detected when writing to the Delta table - Azure Databricks):
- I added that option, but either it doesn't work or I applied it incorrectly.
- I cleaned dbfs:/delta-tables/_delta_log/
- I even tried to remove the whole folder where I saved the data (dbfs:/FileStore/shared_upload/[user]/data_Delta).
None of this has fixed the issue. What am I doing wrong? And why does it behave this way when I reuse an old notebook with a new cluster? I terminated the old cluster, so I assumed the environment would be 'clean'.
I proceed in the following way:
1. I load the data from Gen2; it is stored in Parquet format:
spark.read.option("overwriteSchema", "true") \
    .parquet(f"wasbs://{CONTAINER_NAME}@{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/data")
As you can see, I set overwriteSchema to true.
2. Then I save it in Delta format:
sd_weather.write.format('delta').mode("overwrite") \
.save("dbfs:/FileStore/shared_upload/[user]/data_Delta")
3. Then I try to create a Delta table:
sd_weather.write.format('delta') \
.mode("overwrite").saveAsTable("data_Delta")
And here I receive the error:
AnalysisException: A schema mismatch detected when writing to the Delta table
CodePudding user response:
You need to use .option("overwriteSchema", "true") in the write operation, not in the read one:
sd_weather.write.format('delta').mode("overwrite") \
.option("overwriteSchema", "true") \
.save("dbfs:/FileStore/shared_upload/[user]/data_Delta")
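To confirm the corrected write took effect, you can read the data back from the same path and inspect the schema (a quick sanity check, assuming the `spark` session of a Databricks notebook; not part of the original answer):

```python
# Read the freshly written Delta data back and print its schema;
# after overwriteSchema the new column layout should be visible.
df = spark.read.format("delta") \
    .load("/FileStore/shared_upload/[user]/data_Delta")
df.printSchema()
```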
You are also writing your data twice: once as a "plain" directory, and a second time as a managed table. If you want to create an unmanaged (external) table in a custom location, just add the path option to the third variant (also, dbfs:/ is the default scheme, so you may omit it):
sd_weather.write.format('delta') \
.option("overwriteSchema", "true") \
.option("path", "/FileStore/shared_upload/[user]/data_Delta") \
.mode("overwrite").saveAsTable("data_Delta")
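If you want to verify that the table was registered at the custom location (i.e. as an external table), Delta's DESCRIBE DETAIL command shows its storage details (a quick check, assuming the table above was created successfully):

```python
# Show the table's format and the path where its files actually live;
# for an external table, "location" should point at the custom path.
spark.sql("DESCRIBE DETAIL data_Delta") \
    .select("format", "location") \
    .show(truncate=False)
```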
Also, it depends on how different the schemas are: if the new data only adds columns (or makes similarly additive changes), you can use mergeSchema instead of overwriteSchema.
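As a sketch of that alternative (the DataFrame name `sd_weather_v2` is hypothetical, standing in for new data that adds columns to the existing schema):

```python
# Hypothetical follow-up write: the new DataFrame contains extra
# columns, so mergeSchema evolves the table schema additively
# instead of replacing it wholesale as overwriteSchema would.
sd_weather_v2.write.format("delta") \
    .mode("append") \
    .option("mergeSchema", "true") \
    .saveAsTable("data_Delta")
```

Note that mergeSchema only handles compatible, additive changes; incompatible changes (e.g. a column's type changing) still require overwriteSchema.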