Why should we use cache since we have persist in Spark?

Q. Why should we use cache when we already have persist, which offers memory-only and other storage options?

I was asked this question in an interview and had no idea how to answer it. Please help me understand.

CodePudding user response：

cache() is the same as persist() called with the default storage level. From the Dataset source:

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()

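So cache can be seen as a widely used convenience function: it takes no arguments and always uses the default level, while persist also lets you pass a storage level explicitly.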
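As a rough illustration, here is a minimal sketch comparing the two calls on a Dataset. It assumes a local Spark session; the object name, app name, and the `spark.range` inputs are made up for the example. The three Datasets use different ranges on purpose, since Spark tracks cached data by query plan and identical plans would share one cache entry.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and sizes are illustrative only.
object CacheVsPersist {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .appName("cache-vs-persist")
      .master("local[*]")
      .getOrCreate()

    // cache(): no arguments, always the default storage level.
    val viaCache = spark.range(1000).cache()

    // persist() with no arguments: the same default level.
    val viaPersist = spark.range(2000).persist()

    // persist(level): the only way to choose a different level.
    val memoryOnly = spark.range(3000).persist(StorageLevel.MEMORY_ONLY)

    // Print the level Spark recorded for each Dataset.
    println(s"cache():              ${viaCache.storageLevel}")
    println(s"persist():            ${viaPersist.storageLevel}")
    println(s"persist(MEMORY_ONLY): ${memoryOnly.storageLevel}")

    spark.stop()
  }
}
```

The first two lines of output should report the same level, which is the point of the answer above: on a Dataset, cache() adds nothing beyond brevity.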

CodePudding user response：

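It is also worth noting that the default storage level for RDDs is MEMORY_ONLY, whereas for Datasets it is MEMORY_AND_DISK. So cache() on an RDD generally behaves differently from cache() on a Dataset: RDD partitions that do not fit in memory are not stored and are recomputed when needed, instead of being spilled to disk.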
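To see the difference for yourself, a small sketch along these lines should work, again assuming a local Spark session (the object name, app name, and sample data are invented for the example). It caches an RDD and a Dataset with cache() and prints the storage level each one reports.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and data are illustrative only.
object RddVsDatasetCache {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataset-cache")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Same call on both APIs, but each API has its own default level.
    val rdd = sc.parallelize(1 to 1000).cache()
    val ds = spark.range(1000).cache()

    println(s"RDD cache():     ${rdd.getStorageLevel}")
    println(s"Dataset cache(): ${ds.storageLevel}")

    // To get the Dataset-style default on an RDD, name the level explicitly.
    val rddWithDisk = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_AND_DISK)
    println(s"RDD persist(MEMORY_AND_DISK): ${rddWithDisk.getStorageLevel}")

    spark.stop()
  }
}
```

If the defaults matter for your job, passing the level to persist explicitly avoids relying on which API you happen to be using.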