My understanding is that for a Spark streaming merge it's helpful to specify a checkpoint location, so that data isn't processed twice when the job restarts (even if the operation is idempotent, and even though this isn't mentioned in the example notebook). Is that correct?
CodePudding user response:
Yes. If you don't specify a checkpoint location, the query has no saved progress to resume from, so all the data will be reprocessed each time the job restarts.
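As a rough sketch of what that looks like, assuming the merge is a Delta Lake upsert run through `foreachBatch` (the question doesn't say which sink is used), the checkpoint is set with the `checkpointLocation` option on the stream writer. The function names, the paths, and the `id` join key below are made up for illustration:

```python
# Sketch only: function names, paths, and the "id" join key are hypothetical.

def make_upsert(spark, target_path, key="id"):
    """Build a foreachBatch handler that merges each micro-batch into the target."""
    def upsert_batch(batch_df, batch_id):
        # Imported lazily; needs the delta-spark package on the cluster.
        from delta.tables import DeltaTable

        target = DeltaTable.forPath(spark, target_path)
        (
            target.alias("t")
            .merge(batch_df.alias("s"), f"t.{key} = s.{key}")
            .whenMatchedUpdateAll()      # update rows that already exist
            .whenNotMatchedInsertAll()   # insert rows that are new
            .execute()
        )
    return upsert_batch


def start_merge_stream(spark, source_path, target_path, checkpoint_path):
    """Start the streaming merge with an explicit checkpoint location."""
    return (
        spark.readStream.format("delta").load(source_path)
        .writeStream
        .foreachBatch(make_upsert(spark, target_path))
        # Progress is stored here, so a restart resumes instead of starting over.
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

You would call it as something like `start_merge_stream(spark, "/data/source", "/data/target", "/checkpoints/merge_job")`. Keep the checkpoint path stable across restarts and don't share it between queries. The idempotent merge is still worth keeping, since a batch that failed partway through can be retried after a restart.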