I have a complex analytical Neo4j Cypher query that I run each time at runtime. According to the following documentation, https://neo4j.com/developer/apache-spark/, it looks like I can execute the query on an Apache Spark cluster:
org.neo4j.spark.Neo4j(sc).cypher("MATCH (n:Person) RETURN n.name").partitions(5).batch(10000).loadRowRdd
Does this mean that instead of such a simple query, I can run a Cypher query of any complexity this way and take advantage of Spark's in-memory parallel processing?
CodePudding user response:
The documentation for the new Neo4j Spark Connector states:
partitions: This defines the parallelization level while pulling data from Neo4j. Note: more parallelization does not necessarily mean better query performance, so tune wisely according to your Neo4j installation.
You can definitely try it out, but better performance isn't a given. Keep in mind that Neo4j still executes the Cypher query itself; Spark parallelizes pulling the result set, not the query evaluation. Read more in the docs: https://neo4j.com/docs/spark/current/reading/
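As a sketch, a parallel read with the current connector might look like the following (the connection URL, credentials, and the exact query here are placeholder assumptions, not values from your setup):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch using the current Neo4j Spark Connector
// (format "org.neo4j.spark.DataSource").
val spark = SparkSession.builder()
  .appName("neo4j-cypher-read")
  .getOrCreate()

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")                 // placeholder
  .option("authentication.basic.username", "neo4j")       // placeholder
  .option("authentication.basic.password", "password")    // placeholder
  // Any Cypher query can be supplied, not just a simple match --
  // but Neo4j still evaluates the Cypher; Spark parallelizes the data pull.
  .option("query", "MATCH (n:Person) RETURN n.name AS name")
  // Number of partitions used to pull the result set in parallel.
  .option("partitions", "5")
  .load()

df.show()
```

Whether 5 partitions helps depends on your Neo4j installation, per the warning quoted above; benchmark with your own data before committing to a setting.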