A schema mismatch detected when writing to the Delta table

Time:11-04

I receive:

A schema mismatch detected when writing to the Delta table

I tried to follow the suggestion:

To overwrite your schema or change partitioning, please set: '.option("overwriteSchema", "true")'.

Based on this solution: A schema mismatch detected when writing to the Delta table - Azure Databricks

  • I added that option, but either it doesn't work or I applied it incorrectly.
  • I cleaned dbfs:/delta-tables/_delta_log/.
  • I even tried removing the whole folder where I saved the data (dbfs:/FileStore/shared_upload/[user]/data_Delta).

None of this has fixed the issue. What am I doing wrong? And why does it behave this way when I reuse an old notebook with a new cluster? I terminated the old cluster, so I assume the new one should start 'clean'.

I proceed in the following way:

1. I load the data from ADLS Gen2; it is stored in Parquet format:

spark.read.option("overwriteSchema", "true")\
  .parquet(f"wasbs://{CONTAINER_NAME}@{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/data")

As you can see, I set overwriteSchema to true.

2. Then I save it in Delta format:

sd_weather.write.format('delta').mode("overwrite") \
  .save("dbfs:/FileStore/shared_upload/[user]/data_Delta")

3. Then I try to create a Delta table:

sd_weather.write.format('delta') \
  .mode("overwrite").saveAsTable("data_Delta")

And here I receive the error:

AnalysisException: A schema mismatch detected when writing to the Delta table

CodePudding user response:

You need to use .option("overwriteSchema", "true") in the write operation, not in the read operation:

sd_weather.write.format('delta').mode("overwrite") \
  .option("overwriteSchema", "true") \
  .save("dbfs:/FileStore/shared_upload/[user]/data_Delta")

You're also writing your data twice: once as a "normal" directory, and again as a managed table. If you want to create an unmanaged table in a custom location, just add the path option to the third variant (also, dbfs:/ is the default scheme, so you may omit it):

sd_weather.write.format('delta') \
  .option("overwriteSchema", "true") \
  .option("path", "/FileStore/shared_upload/[user]/data_Delta") \
  .mode("overwrite").saveAsTable("data_Delta")

It also depends on how different the schemas are: if the new data just adds columns or similar, you can use mergeSchema instead of overwriteSchema.
