I saw that the pct_change function is only partially implemented in PySpark pandas; some of its parameters are missing.
- Using a PySpark pandas Series:
import pyspark.pandas as ps  # assumed import; the original snippet omitted it
data = ps.Series([90, 91, 85], index=[2, 4, 1])
print(type(data))
print(data.pct_change())
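For reference, pct_change() reports each element's fractional change from the element before it, that is, (current - previous) / previous. The following is only a minimal pure-Python sketch of that arithmetic, assuming the default of one period. The helper name pct_change_values is hypothetical and is not part of pandas or PySpark, and it uses None where pandas would show NaN.

```python
def pct_change_values(values, periods=1):
    """Fractional change between each value and the one `periods` steps earlier."""
    result = []
    for i, current in enumerate(values):
        if i < periods:
            # No earlier value to compare against (pandas would show NaN here).
            result.append(None)
        else:
            previous = values[i - periods]
            result.append((current - previous) / previous)
    return result


# Same values as the Series above, in the same order.
changes = pct_change_values([90, 91, 85])
print(changes)
```

The second entry is 1/90 (a rise from 90 to 91) and the third is -6/91 (a drop from 91 to 85).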
UPDATE:
The error occurs because DataFrame.toPandas is different from DataFrame.toPandas(). In this case, data.toPandas without parentheses only refers to the method, so it returns an object of type method. When you try to call pct_change() on this object, it raises an error, because a method object has no pct_change attribute.
- Calling DataFrame.toPandas() returns a pandas DataFrame object, on which you can use pct_change(). Modify the code as follows to get the required result:
data_pd = data.toPandas()
print(type(data_pd))
op = data_pd.pct_change()
print(op)
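The difference between referencing a method and calling it can be reproduced without Spark. This is a small sketch only: FakeSparkFrame is a hypothetical stand-in class, not a PySpark type, and its toPandas simply returns a list so the example stays self-contained.

```python
class FakeSparkFrame:
    """Hypothetical stand-in for a Spark DataFrame (not a PySpark class)."""

    def toPandas(self):
        # A real Spark DataFrame would return a pandas DataFrame here.
        return ["pretend", "pandas", "frame"]


data = FakeSparkFrame()

not_called = data.toPandas    # no parentheses: a bound method object
called = data.toPandas()      # with parentheses: the method's return value

print(type(not_called).__name__)
print(type(called).__name__)

try:
    # Mirrors the mistake in the question: the method object itself
    # has no pct_change attribute.
    not_called.pct_change()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Printing type(...) first, as in the snippet above, is a quick way to spot this kind of mistake before calling anything on the result.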
CodePudding user response:
After a chat with @SaideepArik, we found that pandas_api() solves the problem.
# Convert the Spark DataFrame to a pandas-on-Spark DataFrame
data_pd = data.pandas_api()
data_pd.pct_change()