Overview
I am getting a ValueError when trying to apply a simple function over a dataframe with axis=1 (details below). It looks like it is trying to unpack the output into the columns of the dataframe instead of the rows. The problem seems to be specific to apply(), and only occurs when axis=1 is used. Why is this error occurring?
Example
Here is a simple example to reproduce the error (obviously in my use case the function I actually want to apply does not exist as a pandas built in):
import pandas as pd
import numpy as np
# data and dummy function
df = pd.DataFrame(2 * np.arange(10).reshape(2, 5) - 1, columns=list('abcde'))

def my_min(s):
    """
    Expects a Series as input, returns its minimum value.
    """
    return s.min()

# applying the function over a rolling window across the columns (axis=1) throws the error
df.rolling(window=3, min_periods=3, axis=1).apply(my_min)
The relevant part of the traceback is:
Expected output
It works when using the built in min, which is why I guess the problem is related to the apply function itself:
df.rolling(window=3, min_periods=2, axis=1).min()
Gives the expected output:
What I have tried
- Checking the docs here; there don't seem to be any useful hints, just that the applied function should expect a Series (when raw=False, which is the default behaviour) and return a scalar.
- I also note that when I first transpose the dataframe and run on axis=0, it works fine. So an easy workaround is df.T.rolling(window=3, min_periods=2, axis=0).apply(my_min).T. But it does not answer my question as to why the behaviour is different when rolling across axis=1.
- I have noted a related question here, but as far as I can tell it does not answer mine.
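For reference, the contract described in the first bullet (a Series when raw=False, otherwise a NumPy array) can be checked with a small probe. This is only a sketch: make_probe and the seen dictionary are hypothetical helpers introduced for illustration, and it rolls down the rows (the default axis) so that it does not depend on the axis=1 behaviour in question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(2 * np.arange(10).reshape(2, 5) - 1, columns=list("abcde"))

# Hypothetical helper: records the type of object that apply() hands over.
seen = {}

def make_probe(label):
    def probe(window):
        seen[label] = type(window).__name__
        return window.min()  # works for both a Series and an ndarray
    return probe

# Roll down the rows (default axis) with each setting of raw.
df.rolling(window=2).apply(make_probe("raw=False"), raw=False)
df.rolling(window=2).apply(make_probe("raw=True"), raw=True)

print(seen)
```

In both cases the function returns a scalar per window; only the type of the input differs.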
Thanks!
CodePudding user response:
This doesn't produce an error with the latest pandas (1.4.4):
pd.__version__
1.4.4
df.rolling(window=3, min_periods=3, axis=1).apply(my_min)
     a   b    c     d     e
0  NaN NaN -1.0   1.0   3.0
1  NaN NaN  9.0  11.0  13.0
Versions older than 1.4.1 are impacted (issue #45912). A workaround is to use raw=True:
df.rolling(window=3, min_periods=3, axis=1).apply(my_min, raw=True)
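If you want something that does not depend on the axis argument at all (as far as I know, later pandas releases deprecate axis in rolling), the transpose trick from the question can be combined with raw=True. A minimal sketch, assuming the same df and a my_min that only calls .min(), so it accepts a NumPy array as well as a Series:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(2 * np.arange(10).reshape(2, 5) - 1, columns=list("abcde"))

def my_min(values):
    # With raw=True the function receives a NumPy array, not a Series.
    return values.min()

# Transpose, roll down the (former) columns, then transpose back.
result = df.T.rolling(window=3, min_periods=3).apply(my_min, raw=True).T

# Same computation with the built-in rolling min, for comparison.
builtin = df.T.rolling(window=3, min_periods=3).min().T

print(result)
print(result.equals(builtin))
```

Both frames should match the output shown above: NaN in the first two columns, then the minimum of each three-column window.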