"The associated location already exists" when saving a Spark DataFrame with mode('ove

Time:11-16

With mode('overwrite') set during a saveAsTable() operation:


df1.write.format('parquet').mode('overwrite').saveAsTable(
    'spark_no_bucket_table1')

Why does saving the table still fail with the following error?

pyspark.sql.utils.AnalysisException: Can not create the managed 
      table('`spark_no_bucket_table1`'). 
The associated location('file:experiments/spark-warehouse/spark_no_bucket_table1') 
   already exists.

CodePudding user response:

From Spark's 2.4.0 migration guide:

Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set true to spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation restores the previous behavior. This option will be removed in Spark 3.0.

So if you use a Spark version >= 2.4.0 and < 3.0.0, you can work around it by setting:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

For Spark versions >= 3.0.0, you will have to manually clean up the data directory specified in the error message before writing the table again.
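
The manual cleanup could look roughly like the sketch below. It assumes the warehouse lives on the local filesystem (as the file: location in the error message suggests); for HDFS or object storage you would use that system's own tooling instead. The helper name remove_leftover_table_dir is hypothetical and not part of Spark.

```python
import shutil
from pathlib import Path


def remove_leftover_table_dir(warehouse_dir, table_name):
    """Delete <warehouse_dir>/<table_name> if it exists.

    Hypothetical helper (not a Spark API). Returns True if a directory
    was removed, False if there was nothing to remove.
    """
    warehouse = Path(warehouse_dir).resolve()
    table_dir = (Path(warehouse_dir) / table_name).resolve()

    # Safety guard: only delete a directory that sits directly inside
    # the warehouse directory, never the warehouse itself or a path outside it.
    if table_dir.parent != warehouse:
        raise ValueError(f"{table_dir} is not directly inside {warehouse}")

    if not table_dir.is_dir():
        return False

    # Remove the leftover data files so the managed table can be created again.
    shutil.rmtree(table_dir)
    return True
```

With the paths from the question, you would call remove_leftover_table_dir('experiments/spark-warehouse', 'spark_no_bucket_table1') and then rerun the saveAsTable() write. Note that this permanently deletes whatever data is in that directory, so check its contents first.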
