In our use case, we need to fetch data from ScyllaDB and load it into Elasticsearch. If we copy records one by one, it takes far too long.
I found that ScyllaDB has no binlog, right?
So, do you have a better suggestion?
CodePudding user response:
You might want to look at using Change Data Capture (CDC) in Scylla, then using the CDC tables to feed a Kafka topic that populates Elasticsearch.
ScyllaDB's CDC connector for Kafka is built on Debezium. You can read more about it here.
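If you go the Kafka route, the source side is a Kafka Connect job. Here is a minimal sketch of such a connector configuration, assuming the open-source scylla-cdc-source-connector; the host, cluster name, and keyspace.table are placeholders, and you should verify the property names against the connector's README for the version you deploy:

    # Sketch of a Kafka Connect source config for the ScyllaDB CDC connector.
    # Hosts, cluster name, and table are placeholders -- adjust to your setup
    # and verify property names against the connector's documentation.
    name=scylla-cdc-source
    connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
    scylla.name=my-scylla-cluster
    scylla.cluster.ip.addresses=scylla-host:9042
    scylla.table.names=ks.products
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter

On the sink side, a standard Elasticsearch sink connector (for example Confluent's) can consume the resulting topic and index the change events.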
CodePudding user response:
And if you want to read everything in addition to the live changes coming through CDC, you can write a small Scala Spark application that loads everything needing full-text search from Scylla into Elasticsearch (sample apps are available online, or have a look at the series of blog posts around the Scylla Migrator, which explain how to properly leverage DataFrames). A minimal sketch follows.
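This sketch assumes the Spark Cassandra connector (Scylla speaks the same CQL protocol) and elasticsearch-spark (part of elasticsearch-hadoop) are on the classpath; the hosts, keyspace, table, column, and index names are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.elasticsearch.spark.sql._

    object ScyllaToElastic {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("scylla-to-elasticsearch-backfill")
          // Scylla is CQL-compatible, so the Spark Cassandra connector works unchanged.
          .config("spark.cassandra.connection.host", "scylla-host")
          .config("es.nodes", "elasticsearch-host")
          .config("es.port", "9200")
          .getOrCreate()

        // Read the whole table as a DataFrame. The connector splits the scan
        // by token range, so it runs in parallel instead of row by row.
        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "ks", "table" -> "products"))
          .load()

        // Keep only the columns that need full-text search and bulk-index them;
        // elasticsearch-spark batches the writes through the bulk API.
        df.select("id", "title", "description")
          .saveToEs("products")

        spark.stop()
      }
    }

You would run this once as the backfill, while the CDC pipeline keeps Elasticsearch current with live changes.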
FWIW, Scylla supports the LIKE operator, in case a simple substring search will cut it for you (and assuming your partitions are not huge), instead of the Lucene query language and inverted indexes that Elasticsearch uses.
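A quick illustration with hypothetical table and column names (note that LIKE on a regular, non-indexed column needs ALLOW FILTERING, so it scans rather than using an index):

    SELECT id, title
    FROM ks.products
    WHERE title LIKE '%phone%'
    ALLOW FILTERING;  -- full scan; fine for small or already-restricted data sets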
LINKS:
https://docs.scylladb.com/getting-started/dml/#like-operator
https://www.scylladb.com/2018/07/31/spark-scylla/
https://www.scylladb.com/2019/03/12/deep-dive-into-the-scylla-spark-migrator/
https://github.com/scylladb/scylla-code-samples/tree/master/spark3-scylla4-demo
https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html
Not sure how useful this will be: