Python Pandas aggregation error: "No matching signature found" when trying to calculate mode

I have a list of pandas dataframes, and I would like to perform a mode operation on all of them.

All dataframes have the same layout:

index | date       | sentiment
----- | ---------- | ---------
0     | 2022-01-01 | 1
1     | 2022-02-03 | -1
2     | 2021-10-01 | 0
...

with date being a datetime.date object, and sentiment being an integer (-1, 0, or 1). I would like to get a dataframe grouped by date, with the sentiment being the mode of the original values for that date (or a list of modes, if there is more than one).

I use this to aggregate:

df = df.groupby('date').agg(pd.Series.mode)

It works fine with almost all of my dataframes, but one of them raises an error:

  File "..\lib\site-packages\pandas\core\apply.py", line 420, in agg_list_like
    raise ValueError("no results")
ValueError: no results

and also, while trying to handle the error:

  File "..\site-packages\pandas\core\algorithms.py", line 1090, in mode
    npresult = htable.mode(values, dropna=dropna, mask=mask)
  File "pandas\_libs\hashtable_func_helper.pxi", line 2291, in pandas._libs.hashtable.__pyx_fused_cpdef
TypeError: No matching signature found

I suppose this is where the error really occurs. Neither column contains any N/A values.

All my tables are in the below dtypes:

date         object
sentiment     int64
dtype: object

I tried dropping all NA values, which did practically nothing, tried parsing each column to a different datatype, hoping that it is really a type error, but had no success.

CodePudding user response：

The ValueError you're getting ("no results") only occurs if the length of the results is zero. That happens when the series you're taking the mode of contains only NaNs, since .mode has dropna=True by default.

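import numpy as np
import pandas as pd

# A series that holds only missing values has an empty mode,
# because mode() drops NaN by default (dropna=True).
pd.Series([np.nan, None]).mode()

Series([], dtype: float64) # produces value error in agg function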
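To make the effect of dropna concrete, here is a small sketch that contrasts the default with dropna=False on the same all-missing series. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative series with nothing but missing values.
all_missing = pd.Series([np.nan, None])

# Default behaviour: NaN is ignored, so no value is left to be the mode.
default_mode = all_missing.mode()

# With dropna=False, NaN is counted like any other value.
kept_mode = all_missing.mode(dropna=False)

print(len(default_mode), len(kept_mode))
```

Passing dropna=False is only a way to see the difference; whether NaN is a meaningful "mode" for the sentiment data is a separate question.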

If you run counts = df.groupby('date').agg(lambda x: x.count()) you should find dates where the count is zero in the dataframe causing the error (note the parentheses: count has to be called). You can use these counts as a filter to remove those dates before you run your agg.
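The filtering step described above could look roughly like this. It is a sketch on invented sample data, assuming the problem really is a date whose sentiment values are all missing; the names counts, empty_dates and filtered are hypothetical.

```python
from datetime import date

import numpy as np
import pandas as pd

# Invented sample data: 2022-02-03 has only missing sentiments.
df = pd.DataFrame({
    "date": [date(2022, 1, 1), date(2022, 1, 1),
             date(2022, 2, 3), date(2022, 2, 3)],
    "sentiment": [1.0, 1.0, np.nan, np.nan],
})

# count() counts non-null values per group, so an all-NaN group yields 0.
counts = df.groupby("date")["sentiment"].count()
empty_dates = counts[counts == 0].index

# Keep only rows whose date has at least one non-null sentiment.
filtered = df[~df["date"].isin(empty_dates)]

print(list(empty_dates))
print(filtered)
```

After this, the mode aggregation can be run on filtered instead of df.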

Also, I don't think you should be passing the class method pd.Series.mode directly like that. I think

df = df.groupby('date').agg(lambda x:x.mode())

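is more correct. Note that calling .mode() directly on the groupby object, as in df.groupby('date').mode(), will not work: to my knowledge pandas groupby objects do not provide a mode method, so the call has to go through agg or apply.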
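If filtering the dates out beforehand is not desirable, another option is an aggregation function that tolerates an empty mode. The following is a minimal sketch, not a definitive fix: mode_or_nan is a hypothetical helper, and the sample data is invented. It returns the single mode, a list when several values tie, and NaN when the group has no non-null values.

```python
from datetime import date

import numpy as np
import pandas as pd


def mode_or_nan(values: pd.Series):
    """Return the mode, a list of tied modes, or NaN if there is none."""
    modes = values.mode()  # dropna=True by default
    if modes.empty:
        return np.nan
    if len(modes) == 1:
        return modes.iloc[0]
    return modes.tolist()


# Invented sample data: one clear mode, one tie, one all-missing date.
df = pd.DataFrame({
    "date": [date(2022, 1, 1)] * 3 + [date(2022, 2, 3)] * 2 + [date(2021, 10, 1)],
    "sentiment": [1, 1, 0, -1, 1, np.nan],
})

result = df.groupby("date")["sentiment"].agg(mode_or_nan)
print(result)
```

Because the results mix scalars and lists, the resulting series has object dtype; returning a list in every case would be more uniform if downstream code needs that.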