How to know what happens under the hood of Apache Spark (from the code)?


I want to understand Spark by reading the source code in the Apache Spark GitHub repository.

https://github.com/apache/spark

I have some experience in Scala, but most of my experience has been with PySpark. I understand Spark's architecture and various optimization techniques, but I am curious how they are implemented internally. For example, what happens when I call the repartition() method? Could anyone from the community guide me on how I should go about this? A minimal sketch of the kind of call I would like to trace is shown below.
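For instance (assuming an existing SparkSession named spark):

```scala
// Which classes and plan nodes does this call create internally?
val df = spark.range(0, 1000)
val repartitioned = df.repartition(8)
```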

CodePudding user response:

Use IntelliJ IDEA to open the Apache Spark sources. You can import them as a Maven or sbt project (just pick the build configuration you prefer).

Once that's done, press Cmd+Option+O (Go to Symbol) to find a symbol of interest, e.g. repartition(). Use Cmd+B (Go to Declaration) to drill down until you're at the very bottom of the call chain, then go back up (to take a breath... not a break!). Rinse and repeat.
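Before (or while) source-diving, it can also help to see what repartition() produces at the query-plan level, since the plan output gives you concrete class names to look up in the sources. Here is a minimal, runnable sketch, assuming Spark 3.x on the classpath; the object and app names are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionPeek {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-peek")
      .getOrCreate()

    val df = spark.range(0, 1000)           // a simple Dataset to experiment with
    val repartitioned = df.repartition(8)   // the call whose internals we want to trace

    // "extended" prints the parsed/analyzed/optimized logical plans
    // and the physical plan. Look for a Repartition node in the logical
    // plan and an Exchange with RoundRobinPartitioning(8) in the physical plan.
    repartitioned.explain("extended")

    spark.stop()
  }
}
```

The names that show up in the plan output (for example Repartition in the logical plan, or the shuffle Exchange in the physical plan) are exactly the symbols you can then jump to with Cmd+Option+O, which makes the call-chain drilling much more targeted.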
