Does Spark use memoization when applying a transformation?

Time:06-25

For example, assume we have the following dataset:

Student Grade
Bob 10
Sam 30
Tom 30
Vlad 30

When Spark executes the following transformation:

df.withColumn("Grade_minus_average", df("Grade") - lit(average) ) 

will Spark compute "30 - average" three times, or will it reuse the result?

(Let's assume there is only one partition.)

CodePudding user response:

No.

An excellent source: https://www.linkedin.com/pulse/catalyst-tungsten-apache-sparks-speeding-engine-deepak-rajak/

Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime. It is not specific to Catalyst in any way; it is a standard compilation technique, and its benefit is obvious: it is better to compute an expression once than to repeat the computation for every row.
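To make the idea concrete, here is a minimal sketch of constant folding on a toy expression tree in plain Scala. It is not Catalyst's code: the names Expr, Lit, Col, Sub, Div and fold are hypothetical and only model the technique. The constant 100 / 4 stands in for an average written as an expression of literals.

```scala
// Toy expression tree: a simplified, hypothetical stand-in for a query optimizer's expressions.
object ConstantFoldingSketch {
  sealed trait Expr
  final case class Lit(value: Double) extends Expr
  final case class Col(name: String) extends Expr
  final case class Sub(left: Expr, right: Expr) extends Expr
  final case class Div(left: Expr, right: Expr) extends Expr

  // Fold bottom-up: an operator whose operands are both literals
  // is evaluated once, before any row is processed.
  def fold(e: Expr): Expr = e match {
    case Sub(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a - b)
        case (fl, fr)         => Sub(fl, fr)
      }
    case Div(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a / b)
        case (fl, fr)         => Div(fl, fr)
      }
    case other => other // literals and column references are left as they are
  }

  def main(args: Array[String]): Unit = {
    // Grade - (100 / 4): the division has only literal operands, so it folds to one literal.
    // The subtraction references a column, so it cannot be folded and stays in the plan.
    println(fold(Sub(Col("Grade"), Div(Lit(100.0), Lit(4.0)))))
  }
}
```

Only the literal-only subtree collapses. Anything that depends on a column value survives folding and has to be evaluated against each row.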

In your case the expression combines a constant (lit(average)) with a column value, so it cannot be folded and is evaluated for each row. Spark has no mechanism that tracks at run time whether a previous invocation had the same input, and it keeps no cache of previously computed results.
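The difference can be sketched in plain Scala, with no Spark involved. This is only a model of the behaviour described above, using the four grades from the question; perRow and memoized are hypothetical helper names, and the memoized variant is included purely for contrast, since it is what Spark does not do.

```scala
import scala.collection.mutable

object PerRowEvaluationSketch {
  val grades: Seq[Int] = Seq(10, 30, 30, 30) // Bob, Sam, Tom, Vlad
  // The average is computed once up front, like the value passed to lit(average).
  val average: Double = grades.sum.toDouble / grades.size

  // Per-row evaluation: one subtraction per row, even for repeated grades.
  // Returns the results together with the number of subtractions performed.
  def perRow(rows: Seq[Int]): (Seq[Double], Int) = {
    var evaluations = 0
    val out = rows.map { g => evaluations += 1; g - average }
    (out, evaluations)
  }

  // Hand-written memoization, for contrast only: the subtraction runs
  // once per distinct grade, and repeats are served from the cache.
  def memoized(rows: Seq[Int]): (Seq[Double], Int) = {
    var evaluations = 0
    val cache = mutable.Map.empty[Int, Double]
    val out = rows.map { g =>
      cache.getOrElseUpdate(g, { evaluations += 1; g - average })
    }
    (out, evaluations)
  }

  def main(args: Array[String]): Unit = {
    println(perRow(grades))
    println(memoized(grades))
  }
}
```

For an operation as cheap as a subtraction, the cache lookup would likely cost more than simply recomputing the value, which is one plausible reason not to memoize row-level expressions.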
