Why should we use cache when we have persist in Spark?

Time:10-09

Q. Why should we use cache when persist already offers memory-only and other storage-level options?

I was asked this question in an interview and had no idea how to answer it. Please help me understand.

CodePudding user response:

cache is the same as persist with the default storage level:

From the Scala code:

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()

So cache can be seen as a widely used convenience function: it saves you from spelling out the storage level when the default is what you want.
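To make the equivalence concrete, here is a minimal sketch that compares the storage level reported after cache() with one chosen explicitly through persist(level). It assumes Spark 2.1 or later (for Dataset.storageLevel) and a local master; the object name CacheVsPersist and the sample data are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object, not part of Spark.
object CacheVsPersist {
  // Returns (level after cache(), level after persist(MEMORY_ONLY)).
  def run(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder()
      .appName("cache-vs-persist")
      .master("local[*]") // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // cache() takes no arguments; it delegates to persist() with the default level.
    val cached = Seq(1, 2, 3).toDS().cache()
    val cachedLevel = cached.storageLevel

    // persist(level) is the form to use when the default is not what you want.
    val explicit = Seq(4, 5, 6).toDS().persist(StorageLevel.MEMORY_ONLY)
    val explicitLevel = explicit.storageLevel

    cached.unpersist()
    explicit.unpersist()
    spark.stop()
    (cachedLevel, explicitLevel)
  }

  def main(args: Array[String]): Unit = {
    val (cachedLevel, explicitLevel) = run()
    println(s"cache():              $cachedLevel")
    println(s"persist(MEMORY_ONLY): $explicitLevel")
  }
}
```

In other words, cache() adds no capability of its own; the only reason to reach for persist is to pass a non-default StorageLevel.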

CodePudding user response:

Also worth noting: the default storage level for RDDs is MEMORY_ONLY, so cache() on an RDD generally behaves differently from cache() on a Dataset, which defaults to MEMORY_AND_DISK.
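The difference can be checked directly. The sketch below calls cache() on an RDD and on a Dataset and reads back the storage level each one reports. It assumes Spark 2.1 or later and a local master; the object name RddVsDatasetCache is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object, not part of Spark.
object RddVsDatasetCache {
  // Returns (level of a cached RDD, level of a cached Dataset).
  def run(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataset-cache")
      .master("local[*]") // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // RDD.cache() is persist() with the RDD default, MEMORY_ONLY.
    val rdd = spark.sparkContext.parallelize(1 to 3).cache()
    val rddLevel = rdd.getStorageLevel

    // Dataset.cache() is persist() with the Dataset default, MEMORY_AND_DISK.
    val ds = Seq(1, 2, 3).toDS().cache()
    val dsLevel = ds.storageLevel

    rdd.unpersist()
    ds.unpersist()
    spark.stop()
    (rddLevel, dsLevel)
  }

  def main(args: Array[String]): Unit = {
    val (rddLevel, dsLevel) = run()
    println(s"RDD.cache():     $rddLevel")
    println(s"Dataset.cache(): $dsLevel")
  }
}
```

The practical consequence is that partitions of a cached RDD that do not fit in memory are recomputed when needed, while a cached Dataset can spill them to disk.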
