I learned how to run a profiler on some code that needs many iterations, in hopes of making the run times more sustainable. It turns out this line takes up 55-58% of the run time:
data['CDA_Factor_Avg'] = data.apply(lambda row : data['CDA_Factor'].loc[ starting_date : row.name ].mean(), axis=1)
This results in a pandas DataFrame 'data' with columns 'CDA_Factor' and 'CDA_Factor_Avg' like:
CDA_Factor | CDA_Factor_Avg |
---|---|
1 | 1 |
4 | 2.5 |
9 | 4.67 |
The mean is only ever taken from starting_date up to the current row. The index is a datetime index. Does anyone see a better alternative?
Thank you!
CodePudding user response:
You can use an expanding mean:
>>> df["CDA_Factor"].expanding().mean()
0 1.000000
1 2.500000
2 4.666667
Name: CDA_Factor, dtype: float64
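To tie this back to the question's datetime index and starting_date, here is a minimal sketch. The sample values, the dates, and the chosen starting_date are made up for illustration. It slices from starting_date first, takes the expanding mean, and then reindexes so that any rows before starting_date come out as NaN, which is what the original apply produces for an empty slice.

```python
import pandas as pd

# Hypothetical sample data; the dates and starting_date are assumptions.
idx = pd.date_range("2021-01-01", periods=5, freq="D")
data = pd.DataFrame({"CDA_Factor": [2.0, 1.0, 4.0, 9.0, 16.0]}, index=idx)
starting_date = pd.Timestamp("2021-01-02")

# Row-by-row approach from the question (slow: re-slices the column for every row).
slow = data.apply(
    lambda row: data["CDA_Factor"].loc[starting_date:row.name].mean(), axis=1
)

# Expanding mean over the rows on or after starting_date only.
# reindex() restores the full index; earlier rows become NaN.
fast = data.loc[starting_date:, "CDA_Factor"].expanding().mean().reindex(data.index)

data["CDA_Factor_Avg"] = fast
print(data)
```

On this small frame both approaches should agree, but the expanding version makes a single pass over the column instead of recomputing a mean for every row.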
CodePudding user response:
You can divide the cumulative sum by the number of elements seen so far, like this:
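import pandas as pd
df = pd.DataFrame({"CDA_Factor": [1, 4, 9]})
# running sum divided by the running count 1, 2, ..., n (n taken from the frame length)
df["CDA_Factor_Avg"] = df["CDA_Factor"].cumsum() / range(1, len(df) + 1)
print(df)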
This gives the output:
CDA_Factor CDA_Factor_Avg
0 1 1.000000
1 4 2.500000
2 9 4.666667
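One caveat with dividing by the row position: if the column contains NaN, the count should only include the non-missing values. A minimal sketch of a NaN-aware variant follows, using made-up data, and is cross-checked against the expanding mean, which skips NaN as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, for illustration only.
df = pd.DataFrame({"CDA_Factor": [1.0, np.nan, 9.0, 6.0]})
col = df["CDA_Factor"]

# Running sum that treats NaN as 0, divided by the running count of non-NaN values.
running_sum = col.fillna(0).cumsum()
running_count = col.notna().cumsum()
df["CDA_Factor_Avg"] = running_sum / running_count

# Cross-check against the expanding mean.
expected = col.expanding().mean()
print(df)
```

If the data has no missing values, the simpler division by the row position is enough.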