Spark Structured Streaming from a JDBC source

Can someone let me know if it's possible to do Spark Structured Streaming from a JDBC source, e.g. a SQL database or any other RDBMS?

I have looked at a few similar questions on SO, e.g.:

Spark streaming jdbc read the stream as and when data comes - Data source jdbc does not support streamed reading

jdbc source and spark structured streaming

However, I would like to know whether this is officially supported in Apache Spark.

If there is any sample code, that would be helpful.

Thanks

CodePudding user response：

No, there is no such built-in support in Spark Structured Streaming. The main reason is that most databases don't provide a unified interface for obtaining changes.

It's possible to get changes from some databases using archive logs, write-ahead logs, etc., but this is database-specific. For many databases the popular choice is Debezium, which can read such logs and push the list of changes into Kafka or something similar, from which Spark can consume them.
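The Kafka side of that pipeline could be sketched roughly as below. This is only a minimal illustration, assuming the `spark-sql-kafka-0-10` package is on the classpath and that change events arrive as Debezium-style JSON envelopes (`op`, `before`, `after`, optionally wrapped in `payload`). The function names, broker address and topic name are hypothetical placeholders, not anything from the question.

```python
import json


def parse_change_event(raw):
    """Pull the operation and row image out of a Debezium-style JSON envelope."""
    event = json.loads(raw)
    # With schemas enabled the envelope is wrapped as {"schema": ..., "payload": ...}.
    payload = event.get("payload", event)
    op = payload["op"]  # 'c' = create, 'u' = update, 'd' = delete, 'r' = snapshot read
    # A delete carries the old row in "before"; other operations carry the new row in "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {"op": op, "row": row}


def build_change_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame of raw change events read from Kafka."""
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        # Kafka delivers key and value as binary; cast them to strings before parsing.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )
```

With an existing `SparkSession`, something like `build_change_stream(spark, "localhost:9092", "dbserver.inventory.customers")` would give a stream whose `value` column holds the JSON that `parse_change_event` understands. In a real job the parsing would more likely be done with `from_json` and an explicit schema rather than a Python function.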

CodePudding user response：

I am currently on a project architecting exactly this: SharePlex CDC from Oracle writes to Kafka, and Spark Structured Streaming with the Kafka integration then applies the changes with MERGE to Delta-format tables on HDFS.

I.e., that is the way to do it if you are not using Debezium. You can use change logs on base tables or materialized views to feed the CDC.
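The MERGE step can be sketched along these lines. It is a rough outline rather than the actual project code, assuming the `delta-spark` package is installed and that each parsed change row carries a key column `id`, an operation column `op` and an ordering column `ts_ms`. Those column names, the helper names and the paths in the usage note are all hypothetical. The plain-Python function only models the "keep the newest change per key" rule that the Spark callback applies to each micro-batch before merging.

```python
def latest_change_per_key(changes, key="id", order="ts_ms"):
    """Plain-Python model of the dedup step: keep the newest change for each key."""
    latest = {}
    for change in changes:
        k = change[key]
        if k not in latest or change[order] >= latest[k][order]:
            latest[k] = change
    return list(latest.values())


def make_upsert(spark, target_path, key="id"):
    """Build a foreachBatch callback that MERGEs one micro-batch into a Delta table."""
    # Imported lazily so the helper above stays usable without Spark installed.
    from delta.tables import DeltaTable
    from pyspark.sql import Window, functions as F

    def upsert(batch_df, batch_id):
        # MERGE needs at most one source row per key, so keep only the newest change.
        newest_first = Window.partitionBy(key).orderBy(F.col("ts_ms").desc())
        newest = (
            batch_df.withColumn("_rn", F.row_number().over(newest_first))
            .filter("_rn = 1")
            .drop("_rn")
        )
        (
            DeltaTable.forPath(spark, target_path).alias("t")
            .merge(newest.alias("s"), f"t.{key} = s.{key}")
            .whenMatchedDelete(condition="s.op = 'd'")
            .whenMatchedUpdateAll(condition="s.op <> 'd'")
            .whenNotMatchedInsertAll(condition="s.op <> 'd'")
            .execute()
        )

    return upsert
```

It would be attached to the stream with something like `changes.writeStream.foreachBatch(make_upsert(spark, "/data/delta/customers")).option("checkpointLocation", "/data/checkpoints/customers").start()`, where `changes` is the parsed Kafka stream.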

So direct JDBC is not possible.