I'm a data engineer working with Spark 2.3, and I'm running into a problem:
the insertInto call below is not overwriting the data but appending it, even though I set the Spark conf partitionOverwriteMode to 'dynamic'.
spark = spark_utils.getSparkInstance()
spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')
df\
.write\
.mode('overwrite')\
.format('orc')\
.option("compression","snappy")\
.insertInto("{0}.{1}".format(hive_database, src_table))
Each time I run the job, rows are appended to the partition instead of overwriting it. Has anyone run into this problem? Thank you.
CodePudding user response:
I tried to reproduce the issue, and judging from the Spark 2.3 source, you must set overwrite to True in insertInto itself; the mode set on the writer is recomputed from that argument:
def insertInto(self, tableName, overwrite=False):
    """Inserts the content of the :class:`DataFrame` to the specified table.

    It requires that the schema of the :class:`DataFrame` is the same as the
    schema of the table.

    Optionally overwriting any existing data.
    """
    self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
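The key detail is the last line: the mode is recomputed from the overwrite argument alone, so whatever you set earlier with .mode() is discarded. A minimal stand-alone sketch of that precedence (not Spark code, just an illustration of the logic in the snippet above; ToyWriter is a made-up name):

```python
class ToyWriter:
    """Mimics the precedence in Spark 2.3's DataFrameWriter.insertInto:
    the overwrite argument wins over any previously set mode."""

    def __init__(self):
        self._mode = 'error'

    def mode(self, saveMode):
        # Records the requested save mode, like DataFrameWriter.mode().
        self._mode = saveMode
        return self

    def insertInto(self, tableName, overwrite=False):
        # Same shape as the Spark source: the mode is recomputed here,
        # so .mode('overwrite') set earlier has no effect.
        self._mode = 'overwrite' if overwrite else 'append'
        return (tableName, self._mode)

# .mode('overwrite') is silently replaced by 'append':
print(ToyWriter().mode('overwrite').insertInto('db.tbl'))        # ('db.tbl', 'append')
print(ToyWriter().mode('overwrite').insertInto('db.tbl', True))  # ('db.tbl', 'overwrite')
```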
Applied to your code, that becomes:
df\
.write\
.mode('overwrite')\
.format('orc')\
.option("compression","snappy")\
.insertInto("{0}.{1}".format(hive_database, src_table), overwrite=True)
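One caveat worth keeping in mind (based on my understanding of Spark 2.3's dynamic partition overwrite feature, so verify against your version): with overwrite=True, the spark.sql.sources.partitionOverwriteMode setting you already have decides how much gets replaced, so keep it set before the write:

```python
# 'dynamic': only the partitions present in df are replaced.
# 'static' (the default): all partitions matching the insert are truncated first,
# which can wipe far more data than intended.
spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')

df.write.insertInto("{0}.{1}".format(hive_database, src_table), overwrite=True)
```

Also note that insertInto writes into an existing table using that table's stored format, so the .format('orc') and compression option on the writer may have no effect here.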