Is there any major difference between persist() called with no parameters and cache()?
I know that cache() always uses the default storage level, whereas persist() lets you set the storage level yourself.
CodePudding user response:
There is no difference: cache() is simply an alias for persist() with no arguments. Here is how it looks in the Dataset source code:
/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()
And persist() without parameters, which is what cache() calls, is:
/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def persist(): this.type = {
  sparkSession.sharedState.cacheManager.cacheQuery(this)
  this
}
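If you want to check this yourself, the following is a minimal sketch rather than anything taken from the Spark source. It assumes Spark 2.1 or later on the classpath (Dataset.storageLevel was added in 2.1) and a local master; the object name CacheVsPersist and the dataset sizes are made up for illustration. It compares the storage level reported after cache(), after persist() with no arguments, and after persist() with an explicit level:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; the name is not part of Spark.
object CacheVsPersist {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: running locally
      .appName("cache-vs-persist")
      .getOrCreate()

    // Use datasets with different plans so each one gets its own cache entry.
    val viaCache    = spark.range(10).cache()
    val viaPersist  = spark.range(20).persist()
    val viaExplicit = spark.range(30).persist(StorageLevel.MEMORY_ONLY)

    // Print the storage level each dataset was registered with.
    println(s"cache():          ${viaCache.storageLevel}")
    println(s"persist():        ${viaPersist.storageLevel}")
    println(s"persist(MEMORY_ONLY): ${viaExplicit.storageLevel}")

    // cache() and persist() with no arguments should report the same level.
    assert(viaCache.storageLevel == viaPersist.storageLevel)
    // Only the overload that takes a StorageLevel changes it.
    assert(viaExplicit.storageLevel == StorageLevel.MEMORY_ONLY)

    spark.stop()
  }
}
```

So the choice between cache() and persist() only matters when you need a non-default level, in which case you use the persist(newLevel) overload.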