How can a single column .apply() be made faster in Python Pandas?

I learned how to run a profiler on code that needs many iterations, in the hope of making the run times more sustainable. It turns out this line takes up 55-58% of the run time:

data['CDA_Factor_Avg'] = data.apply(lambda row : data['CDA_Factor'].loc[ starting_date : row.name ].mean(), axis=1)

This results in a pandas DataFrame 'data' with columns 'CDA_Factor' and 'CDA_Factor_Avg' like:

CDA_Factor	CDA_Factor_Avg
1	1
4	2.5
9	4.66

The mean is only ever taken from starting_date up to the current row. The index is a datetime index. Does anyone see a better alternative?

Thank you!

CodePudding user response:

You can use an expanding mean:

>>> df["CDA_Factor"].expanding().mean()
0    1.000000
1    2.500000
2    4.666667
Name: CDA_Factor, dtype: float64
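
Applied to the frame from the question, this might look like the sketch below. The dates and values are made up for illustration, and it assumes starting_date is the first index label, so the window always begins at the top of the frame. The original row-wise version is kept only to check that both give the same numbers.

```python
import pandas as pd

# Hypothetical sample data with a datetime index, mirroring the question's layout.
data = pd.DataFrame(
    {"CDA_Factor": [1.0, 4.0, 9.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)
starting_date = data.index[0]  # assumption: the window starts at the first row

# Original row-wise approach: re-slices and re-averages the column for every row.
slow = data.apply(
    lambda row: data["CDA_Factor"].loc[starting_date:row.name].mean(), axis=1
)

# Expanding mean: a single vectorized pass over the column.
data["CDA_Factor_Avg"] = data["CDA_Factor"].expanding().mean()

# Both approaches should agree on this sample.
assert (slow - data["CDA_Factor_Avg"]).abs().max() < 1e-12
print(data)
```

If starting_date falls after the first row, one option is to slice first, for example data.loc[starting_date:, "CDA_Factor"].expanding().mean(), and assign the result back; rows before starting_date would then be left as NaN.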

CodePudding user response:

You might divide the cumulative sum by the number of elements seen so far, in the following way:

import pandas as pd
df = pd.DataFrame({"CDA_Factor":[1,4,9]})
# Running sum divided by the running count (1, 2, 3) gives the running mean
df["CDA_Factor_Avg"] = df["CDA_Factor"].cumsum() / range(1,4)
print(df)

gives output

   CDA_Factor  CDA_Factor_Avg
0           1        1.000000
1           4        2.500000
2           9        4.666667
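
The range(1,4) divisor is tied to this three-row example. For a column of arbitrary length, a sketch along the following lines builds the divisor from the frame's length instead. It assumes the column has no missing values (with NaNs the row count would no longer equal the number of values averaged), and the five sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column of arbitrary length, with no missing values.
df = pd.DataFrame({"CDA_Factor": [1, 4, 9, 16, 25]})

# Running count 1..n, derived from the frame length rather than hard-coded.
counts = np.arange(1, len(df) + 1)

# Running sum divided by running count gives the running mean.
df["CDA_Factor_Avg"] = df["CDA_Factor"].cumsum() / counts

# Sanity check against pandas' built-in expanding mean.
expected = df["CDA_Factor"].expanding().mean()
assert np.allclose(df["CDA_Factor_Avg"], expected)
print(df)
```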