I have a list of pandas dataframes, and I would like to perform a mode operation on all of them.
All dataframes have the same layout:
index | date       | sentiment
----- | ---------- | ---------
0     | 2022-01-01 | 1
1     | 2022-02-03 | -1
2     | 2021-10-01 | 0
...
with date being a dt.date object, and sentiment being an integer (-1, 0, or 1). I would like to get a dataframe grouped by date, with sentiment being the mode of the original values (or a list of modes, if there is more than one).
I use this to aggregate:
df = df.groupby('date').agg(pd.Series.mode)
It works fine with almost all of my dataframes; only one of them raises an error:
File "..\lib\site-packages\pandas\core\apply.py", line 420, in agg_list_like
raise ValueError("no results")
ValueError: no results
and also, while trying to handle the error:
File "..\site-packages\pandas\core\algorithms.py", line 1090, in mode
npresult = htable.mode(values, dropna=dropna, mask=mask)
File "pandas\_libs\hashtable_func_helper.pxi", line 2291, in pandas._libs.hashtable.__pyx_fused_cpdef
TypeError: No matching signature found
which I suppose is where my error really occurs. I have no N/A values in either of the columns.
All my tables are in the below dtypes:
date object
sentiment int64
dtype: object
I tried dropping all NA values, which did practically nothing. I also tried casting each column to a different datatype, in case it really is a type error, but had no success.
CodePudding user response:
The error you're getting only occurs if the length of the results is zero. That happens when the series you're taking the mode of contains only NaNs, since .mode has dropna=True by default.
import numpy as np
import pandas as pd
pd.Series([np.nan, None]).mode()
# Series([], dtype: float64)  <- an empty result like this produces the ValueError in agg
If you run counts = df.groupby('date').agg(lambda x: x.count())
you should find dates where the count is zero in the dataframe causing the error. You can use .count()
as a filter to remove these dates before you run your agg.
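The filtering step described above could look roughly like the sketch below. The sample data is made up for illustration (one date has only missing sentiment values), and the names counts, empty_dates and filtered are hypothetical. It assumes the problem really is an all-NaN group, which you should verify on your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every sentiment for 2022-02-03 is missing.
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-01", "2022-02-03", "2022-02-03"],
    "sentiment": [1.0, 1.0, np.nan, np.nan],
})

# count() returns the number of non-NA values per date.
counts = df.groupby("date")["sentiment"].count()

# Dates whose groups contain no usable values at all.
empty_dates = counts[counts == 0].index

# Drop those dates, then aggregate; each group now has at least one mode.
filtered = df[~df["date"].isin(empty_dates)]
result = filtered.groupby("date")["sentiment"].agg(lambda x: x.mode().tolist())
print(result)
```

Printing empty_dates before filtering is a quick way to check whether any date in the failing dataframe is affected.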
Also, I don't think you should be passing the unbound class method like that. I think
df = df.groupby('date').agg(lambda x:x.mode())
is more correct. Note that
df.groupby('date').mode()
will not work, because pandas GroupBy objects do not provide a mode method.
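If you would rather keep every date and tolerate empty groups, one option is a small aggregation function that handles the empty case itself. This is only a sketch: mode_or_list is a hypothetical helper, not a pandas function, and the sample data is invented. It returns a scalar for a single mode, a list for ties, and NaN when a group has no non-missing values.

```python
import numpy as np
import pandas as pd

def mode_or_list(values: pd.Series):
    """Hypothetical helper: scalar for one mode, list for ties, NaN if no data."""
    modes = values.mode()  # dropna=True by default, so NaNs are ignored
    if modes.empty:
        return np.nan
    if len(modes) == 1:
        return modes.iloc[0]
    return modes.tolist()

# Made-up data: one clear mode, one tie, and one date with only NaNs.
df = pd.DataFrame({
    "date": ["2022-01-01"] * 3 + ["2022-02-03"] * 2 + ["2021-10-01"] * 2,
    "sentiment": [1, 1, -1, 0, -1, np.nan, np.nan],
})

result = df.groupby("date")["sentiment"].agg(mode_or_list)
print(result)
```

Because the results mix scalars and lists, the resulting series has object dtype. Returning a list in every case may be easier to work with downstream.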