I am reading a table in PySpark:
df = spark.readStream.format("delta").load("mySourceTable")
And I write it using:
df.writeStream.format("delta").outputMode("append").option("checkpointLocation", "/_checkpoints/myOutputTable").start("myOutputTable")
My question is how can I remove all the checkpoints so that pyspark reads mySourceTable
from the beginning, instead of from where it was last read?
Thank you.
I don't know how to remove the checkpoints in "/_checkpoints/myOutputTable".
CodePudding user response:
After stopping the Spark application, you can go directly to the checkpointLocation
directory on your file system (or wherever the checkpoint is stored, e.g. S3) and move or delete it.
When you then restart the Spark application it will process mySourceTable
from the beginning.
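As a rough sketch of that step, assuming the checkpoint lives on a local (or locally mounted) file system, something like the following could move or delete the directory once the stream has been stopped. The helper name reset_checkpoint and the ".bak" backup suffix are made up for illustration; for S3 or another object store you would use that store's own tooling instead of shutil.

```python
import shutil
from pathlib import Path


def reset_checkpoint(checkpoint_dir: str, backup: bool = True) -> None:
    """Move aside (or delete) a streaming checkpoint directory.

    Hypothetical helper: assumes the path is reachable through the local
    file system and that the streaming query has already been stopped.
    """
    path = Path(checkpoint_dir)
    if not path.exists():
        return  # nothing to reset

    if backup:
        # Keep the old checkpoint next to the original, replacing any older backup.
        backup_path = path.with_name(path.name + ".bak")
        if backup_path.exists():
            shutil.rmtree(backup_path)
        shutil.move(str(path), str(backup_path))
    else:
        shutil.rmtree(path)


# Usage (after stopping the query, e.g. with query.stop()):
# reset_checkpoint("/_checkpoints/myOutputTable")
```

Keep in mind that with outputMode("append"), reading mySourceTable from the beginning appends its rows to myOutputTable again, so you may also want to clear the output table if duplicates are a concern.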