I have a DataFrame with two columns: name (str) and probability (float).
I am running this command:
df[['name','probability']].groupby('name').prod()
on a Databricks (runtime 7.3) notebook and df is a pyspark.pandas dataframe.
The error I get is:
PandasNotImplementedError: The method `pd.groupby.GroupBy.prod()` is not implemented yet.
I wonder if there is a workaround.
CodePudding user response:
In this case, I think the problem is simply that you do not have the latest version of pandas installed. As far as I can see, the pandas 1.5.2 documentation lists this method, and the same group-by succeeded when I ran it on sample data. Try running this command in your shell; it should upgrade pandas, after which you should be able to call the method:
pip install --upgrade pandas
CodePudding user response:
Check the type of the grouped object with type(df[['name','probability']].groupby('name')).
If the type is pandas.core.groupby.generic.DataFrameGroupBy, you are working with plain pandas;
otherwise, you may need to update your version.
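If upgrading is not an option, one possible workaround is to rebuild the product from operations that are already implemented, using the identity prod(x) = exp(sum(log(x))). This only holds for non-negative values, which probabilities are. The sketch below is written against plain pandas so it can be run anywhere; the helper name groupby_prod_via_logs and the sample data are made up for illustration. I am assuming, but have not verified on runtime 7.3, that np.log, groupby(...).sum() and np.exp behave the same way on a pandas-on-Spark DataFrame.

```python
import numpy as np
import pandas as pd


def groupby_prod_via_logs(df, key, value):
    # Hypothetical helper, not part of pandas or pyspark.pandas.
    # Relies on prod(x) == exp(sum(log(x))), valid only for non-negative x.
    with np.errstate(divide="ignore"):  # log(0) gives -inf, which exp() maps back to 0
        logs = df[[key]].assign(_log=np.log(df[value]))
    # Sum the logs per group, then exponentiate to recover the product.
    return np.exp(logs.groupby(key)["_log"].sum()).rename(value)


# Made-up sample data with the same column names as in the question.
sample = pd.DataFrame(
    {"name": ["a", "a", "b", "b", "b"], "probability": [0.5, 0.5, 0.2, 0.5, 1.0]}
)
result = groupby_prod_via_logs(sample, "name", "probability")
print(result)
```

Because the result goes through floating-point logs, it can differ from a direct product in the last few digits, so compare with a tolerance rather than with ==.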