How can I reset checkpoints in pyspark?

Time:01-29

I am reading a Delta table as a stream in PySpark:

df = spark.readStream.format("delta").load("mySourceTable")  

And I write it out using:

df.writeStream.format("delta").outputMode("append").option("checkpointLocation", "/_checkpoints/myOutputTable").start("myOutputTable")

My question is: how can I remove all the checkpoints so that PySpark reads mySourceTable from the beginning, instead of resuming from where it last left off?

Thank you.

I don't know how to remove the checkpoints stored under "/_checkpoints/myOutputTable".

CodePudding user response:

After stopping the Spark application, you can go directly to the checkpointLocation directory on your file system (or wherever the checkpoint is stored, e.g. S3) and move or delete it.

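When you then restart the Spark application, it finds no checkpoint and processes mySourceTable from the beginning. Because the query writes in append mode, rows that were already written to myOutputTable will be written again unless you also clear that table.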
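
As a rough illustration of the delete step, here is a minimal sketch that assumes the checkpoint lives on a local (or locally mounted) file system. The helper name reset_checkpoint is made up for this example and is not part of PySpark; for S3 or another object store you would use that store's own tooling instead.

```python
import shutil
from pathlib import Path


def reset_checkpoint(checkpoint_dir: str) -> bool:
    """Delete a streaming checkpoint directory (hypothetical helper).

    Assumes the path is on a local or locally mounted file system.
    Returns True if a directory was removed, False if none existed.
    """
    path = Path(checkpoint_dir)
    if not path.is_dir():
        # Nothing to delete: the next run starts from scratch anyway.
        return False
    # Removes offsets, commits, metadata and everything else under the path.
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    # Stop the streaming query first (e.g. query.stop()), then:
    removed = reset_checkpoint("/_checkpoints/myOutputTable")
    print("checkpoint removed" if removed else "no checkpoint found")
```

Moving the directory aside instead of deleting it (for example with shutil.move) keeps a backup in case you want to resume from the old position later.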
