I have a complex analytical Neo4j Cypher query that I run each time at runtime. According to the following documentation, https://neo4j.com/developer/apache-spark/, it looks like I can execute the query on an Apache Spark cluster:
org.neo4j.spark.Neo4j(sc).cypher("MATCH (n:Person) RETURN n.name").partitions(5).batch(10000).loadRowRdd
Does this mean that instead of such a simple query, I can run a Cypher query of any complexity this way and take advantage of Spark's in-memory parallel processing?
CodePudding user response:
The documentation for the new Neo4j Spark Connector states:
partitions: This defines the parallelization level while pulling data from Neo4j. Note: more parallelization does not necessarily mean better query performance, so tune wisely according to your Neo4j installation.
You can definitely try it out, but better performance isn't a given. Keep in mind that Neo4j still executes the Cypher query itself; Spark parallelizes pulling the result set, not the query evaluation. Read more in the docs: https://neo4j.com/docs/spark/current/reading/
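As a sketch, a parallel read with the current connector might look like the following (the connection URL, credentials, and the exact query here are placeholder assumptions, not values from your setup):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch using the current Neo4j Spark Connector
// (format "org.neo4j.spark.DataSource").
val spark = SparkSession.builder()
  .appName("neo4j-cypher-read")
  .getOrCreate()

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")                 // placeholder
  .option("authentication.basic.username", "neo4j")       // placeholder
  .option("authentication.basic.password", "password")    // placeholder
  // Any Cypher query can be supplied, not just a simple match --
  // but Neo4j still evaluates the Cypher; Spark parallelizes the data pull.
  .option("query", "MATCH (n:Person) RETURN n.name AS name")
  // Number of partitions used to pull the result set in parallel.
  .option("partitions", "5")
  .load()

df.show()
```

Whether 5 partitions helps depends on your Neo4j installation, per the warning quoted above; benchmark with your own data before committing to a setting.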