I am trying to generate a DataFrame using the toDF function like this.
When I look at the Spark UI after running the df.show action, I don't see any DAG. Why is this happening?
CodePudding user response:
Because the data is held in local memory on the driver and no parallelization is requested. When a Seq is used to create a DataFrame, a Spark optimization lets the result be computed immediately, without launching a job.
Creating the same DataFrame this way:
val df = sc.parallelize(1 until 5).toDF("a")
does produce a job and a DAG, because workers and data distribution are involved.
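
One way to see the difference for yourself is to compare the physical plans of the two DataFrames with explain(). The following is only a sketch under some assumptions: the Seq contents are made up (the data from the question isn't shown), the object name is hypothetical, and it builds its own local SparkSession, whereas in spark-shell `spark` and `sc` already exist. I'd expect the Seq-based plan to be a local table scan and the parallelize-based plan to scan an RDD, but the exact plan text depends on your Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, for illustration only.
object LocalVsDistributed {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for the demo; skip this in spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-vs-distributed")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: a local collection on the driver.
    val localDf = Seq(1, 2, 3, 4).toDF("a")

    // Same values, but distributed as an RDD first (as in the answer above).
    val distributedDf = spark.sparkContext.parallelize(1 until 5).toDF("a")

    // Print the physical plan of each DataFrame to compare them.
    localDf.explain()
    distributedDf.explain()

    // Run both actions, then check the Jobs tab of the Spark UI
    // to see which one actually launched a job.
    localDf.show()
    distributedDf.show()

    spark.stop()
  }
}
```

If you run this in spark-shell, only the lines from `val localDf` to `distributedDf.show()` are needed.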