Q. Why should we use cache when we already have persist, which offers memory-only and other storage options?
I was asked this question in an interview and had no idea how to answer it. Please help me understand.
CodePudding user response:
cache is the same as persist with the default storage level. From the Scala code:
/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()
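So cache can be seen as a widely used convenience function: it saves you from spelling out a storage level when the default is what you want, while persist remains available when you need a different one.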
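To make the relationship concrete, here is a minimal sketch, assuming a local Spark session (2.1 or later, since it relies on Dataset.storageLevel) and a throwaway Dataset built with spark.range. The object name CacheVsPersist and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; the name is not part of Spark.
object CacheVsPersist {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-vs-persist")
      .getOrCreate()

    val ds = spark.range(0, 1000)

    // cache() takes no arguments and delegates to persist(),
    // so the Dataset is marked with the default storage level.
    ds.cache()
    println(ds.storageLevel == StorageLevel.MEMORY_AND_DISK)

    // Drop the cached data before asking for a different level.
    ds.unpersist(blocking = true)

    // persist(level) is the call to use when the default is not wanted.
    ds.persist(StorageLevel.MEMORY_ONLY)
    println(ds.storageLevel == StorageLevel.MEMORY_ONLY)

    ds.unpersist(blocking = true)
    spark.stop()
  }
}
```

In short: reach for cache() when the default level is fine, and for persist(level) when you want MEMORY_ONLY, DISK_ONLY, or one of the other options.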
CodePudding user response:
Also worth noting that the default storage level for RDDs is MEMORY_ONLY, hence the behavior of cache() in their case is, generally speaking, different from its behavior on a Dataset.
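The difference can be checked directly. The following is a small sketch under the same assumptions as any local experiment (a local Spark session, version 2.1 or later for Dataset.storageLevel); the object name RddVsDatasetCache is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; the name is not part of Spark.
object RddVsDatasetCache {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-vs-dataset-cache")
      .getOrCreate()

    // RDD: cache() marks the RDD with the RDD default level.
    val rdd = spark.sparkContext.parallelize(1 to 1000)
    rdd.cache()
    println(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)

    // Dataset: cache() marks it with the Dataset default level instead.
    val ds = spark.range(0, 1000)
    ds.cache()
    println(ds.storageLevel == StorageLevel.MEMORY_AND_DISK)

    // Release both before shutting down.
    rdd.unpersist(blocking = true)
    ds.unpersist(blocking = true)
    spark.stop()
  }
}
```

Note that an RDD refuses to change its storage level once one has been assigned, so call unpersist() first if you want to persist the same RDD at a different level.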