Does Pyspark Pandas support Pandas pct_change function?

Time:07-23

I saw that the pct_change function is only partially implemented in pandas-on-Spark (pyspark.pandas), with some parameters missing.

  • Using a pandas-on-Spark (pyspark.pandas) Series:
import pyspark.pandas as ps

data = ps.Series([90, 91, 85], index=[2, 4, 1])
print(type(data))
print(data.pct_change())
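
For reference, the values that pct_change() reports for this series can be checked by hand. The following is a minimal plain-Python sketch of the default behaviour (change relative to the immediately preceding element, in row order); the helper name pct_change_manual is hypothetical and is not part of pandas or PySpark.

```python
import math


def pct_change_manual(values):
    """Fractional change between consecutive elements (hypothetical helper).

    The first element has no predecessor, so its result is NaN.
    """
    if not values:
        return []
    out = [math.nan]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out


result = pct_change_manual([90, 91, 85])
print(result)
```

The second entry is (91 - 90) / 90 and the third is (85 - 91) / 91, so a small rise is followed by a larger drop.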

UPDATE:

  • The error occurs because DataFrame.toPandas (without parentheses) is not the same as DataFrame.toPandas().

  • Here, data.toPandas refers to the method itself, so it evaluates to an object of type method rather than a DataFrame. Calling pct_change() on that object raises an error, because a method object has no pct_change attribute.

  • DataFrame.toPandas() returns a pandas DataFrame, on which you can call pct_change(). Modify the code as follows:
data_pd = data.toPandas()
print(type(data_pd))

op = data_pd.pct_change()
print(op)
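
The difference between referencing a method and calling it can be reproduced without Spark. This is a small sketch using a hypothetical stand-in class (FakeSparkFrame is not a real PySpark class, and its toPandas simply returns a list in place of a pandas DataFrame):

```python
class FakeSparkFrame:
    """Hypothetical stand-in for a Spark DataFrame, for illustration only."""

    def toPandas(self):
        # Stand-in return value; the real method returns a pandas DataFrame.
        return [90, 91, 85]


data = FakeSparkFrame()

unbound = data.toPandas    # no parentheses: a reference to the method
called = data.toPandas()   # parentheses: the method's return value

print(type(unbound).__name__)           # method
print(type(called).__name__)            # list
print(hasattr(unbound, "pct_change"))   # False
```

Because the method object has no pct_change attribute, attempting unbound.pct_change() fails with an AttributeError, which matches the behaviour described in the bullets above.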

CodePudding user response:

After a chat with @SaideepArik, we found that pandas_api() solves the problem.

    # Convert the Spark DataFrame to a pandas-on-Spark DataFrame
    data_pd = data.pandas_api()

    data_pd.pct_change()