Is using cache() the same as using persist() with no parameters in PySpark?

Time:12-20

Is there any significant difference between calling persist() with no parameters and calling cache()?

I know that cache() always uses the default storage level, while persist() lets you pass a different storage level.

Answer:

There is no difference: cache() is simply an alias for persist() with no arguments. This is how it looks in the Spark source (the Scala Dataset class):

Source code

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()

And persist() without parameters, which is what cache() calls, is:

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def persist(): this.type = {
  sparkSession.sharedState.cacheManager.cacheQuery(this)
  this
}