Pyspark Structured Streaming continuous vs processingTime triggers

Time:08-18

I've been looking into using triggers for a streaming job, but the differences between the continuous trigger and the processingTime trigger are not clear to me.

From what I've read on various sites:

  1. Continuous is just an attempt to make streaming near-real-time rather than micro-batch based, with much lower latency (as low as about 1 ms).
  2. As of the time of writing this question, it only supports a few sources and sinks, such as Kafka.

Are these two points the only differences between the two triggers?


You are pretty much right. Continuous processing was added to Structured Streaming to meet low-latency needs: a continuous query processes records in near-real-time, unlike the default micro-batch query, whose latency depends on the trigger interval and on how long each batch takes to run.

The Structured Streaming programming guide covers both modes in more depth.
