Does Spark Structured Streaming with Trigger.Once allow a direct connection to Kafka and the use of a MERGE statement? Or must the input data come from a Delta table?
This example https://docs.databricks.com/_static/notebooks/merge-in-scd-type-2.html assumes tables as input. I cannot find an example where Kafka is used together with Trigger.Once. OK, the weekend is coming and I will fire up this and that, but it is an interesting point that I would like to know in advance.
CodePudding user response:
Yes, it's possible to use Trigger.Once (or, better, the newer Trigger.AvailableNow) with Kafka as a direct source, and then use foreachBatch to execute the MERGE against a Delta table.
The only thing you need to take into account is that the data shouldn't expire from the Kafka topic between executions — the topic's retention period must be longer than the interval between your triggered runs, otherwise unconsumed records are lost.
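A minimal sketch of that pattern in Scala (topic name, broker address, table name, checkpoint path, and the JSON payload schema are all assumptions — adjust them to your environment; the target Delta table is assumed to already exist):

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Assumed payload schema: JSON records with `id` and `value` fields.
val payloadSchema = new StructType()
  .add("id", StringType)
  .add("value", StringType)

// Runs once per micro-batch; this is where the MERGE happens.
def upsertBatch(microBatch: DataFrame, batchId: Long): Unit = {
  val updates = microBatch
    .selectExpr("CAST(value AS STRING) AS json")
    .select(from_json($"json", payloadSchema).alias("data"))
    .select("data.*")

  DeltaTable.forName(spark, "target_table") // hypothetical existing Delta table
    .as("t")
    .merge(updates.as("s"), "t.id = s.id")
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
}

spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
  .option("subscribe", "events")                    // hypothetical topic
  .option("startingOffsets", "earliest")
  .load()
  .writeStream
  .foreachBatch(upsertBatch _)
  .option("checkpointLocation", "/path/to/checkpoint")
  .trigger(Trigger.AvailableNow()) // or Trigger.Once() on older runtimes
  .start()
```

The checkpoint location is what makes the periodic runs incremental: each run picks up at the offsets recorded by the previous one, which is why topic retention must outlast the gap between runs.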