a = df.groupby("RaceID")["wS"].transform(lambda x: x.expanding().mean().shift())
b = df.groupby("RaceID")["wS"].expanding().mean().shift().sort_index(level=1).droplevel(0)
I get the correct result if I run the first line. The second approach, on the other hand, is faster. It also works fine as long as I don't use shift, which I need in order to move all the values in the expanding average of each group one step ahead.
RaceID transform notransform noshift
7140 1021458 0.215909 0.215909 0.191919
7141 1021459 NaN 0.191919 2.375000
7142 1021459 2.375000 2.375000 1.187500
7143 1021459 1.187500 1.187500 0.791667
7144 1021459 0.791667 0.791667 0.593750
7145 1021459 0.593750 0.593750 0.475000
7146 1021459 0.475000 0.475000 0.395833
7147 1021459 0.395833 0.395833 0.339286
7148 1021459 0.339286 0.339286 0.296875
7149 1021460 NaN 0.296875 10.000000
The column transform is the result of the first line and the column notransform is the result of the second line. The column noshift is the expanding mean without shift.
As you can see in the row with index 7141, transform correctly sets the first value of the group to NaN when shifting. The operation without transform also shifts the elements, but it sets the first value of each group to the last value of the previous group. The same behavior is visible in the row with index 7149, the first row of RaceID 1021460.
Data-example:
RaceID wS
7130 1017734 0.000000
7131 1017734 0.000000
7132 1021458 1.727273
7133 1021458 0.000000
7134 1021458 0.000000
7135 1021458 0.000000
7136 1021458 0.000000
7137 1021458 0.000000
7138 1021458 0.000000
7139 1021458 0.000000
7140 1021458 0.000000
7141 1021459 2.375000
7142 1021459 0.000000
7143 1021459 0.000000
7144 1021459 0.000000
7145 1021459 0.000000
7146 1021459 0.000000
7147 1021459 0.000000
7148 1021459 0.000000
7149 1021460 10.000000
7150 1021460 0.000000
7151 1021460 0.000000
7152 1021460 0.000000
7153 1021460 0.000000
7154 1021460 0.000000
7155 1021460 0.000000
7156 1021460 0.000000
7157 1021460 0.000000
7158 1021460 0.000000
7159 1021460 0.000000
7160 1021460 0.000000
7161 1021460 0.000000
7162 1021460 0.000000
7163 1021460 0.000000
7164 1021460 0.000000
7165 1021460 0.000000
7166 1021460 0.000000
7167 1021461 201.000000
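For reference, a minimal reproduction sketch built from two of the groups above (RaceID 1021458 and 1021459, rows 7132 to 7148). How the frame is constructed here is an assumption made for illustration; the two computations are the ones from the question.

```python
import pandas as pd

# Rows 7132-7148 of the sample: RaceID 1021458 (9 rows) and 1021459 (8 rows)
df = pd.DataFrame(
    {
        "RaceID": [1021458] * 9 + [1021459] * 8,
        "wS": [1.727273] + [0.0] * 8 + [2.375] + [0.0] * 7,
    },
    index=range(7132, 7149),
)

# Shift inside transform: applied to each group separately
a = df.groupby("RaceID")["wS"].transform(lambda x: x.expanding().mean().shift())

# Shift after the grouped expanding mean: applied to the whole concatenated result
b = (
    df.groupby("RaceID")["wS"].expanding().mean()
    .shift()
    .sort_index(level=1)
    .droplevel(0)
)

# Compare both results around the boundary between the two groups
print(pd.DataFrame({"transform": a, "notransform": b}).loc[7140:7142])
```

In row 7141 the transform column should be NaN while notransform carries over roughly 0.191919 from the previous group, matching the table in the question.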
Answer:
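In the first example, the shift happens inside the lambda, before each group's result is returned, so it is applied to one group at a time. In the second, it happens after the grouped expanding mean has already been returned as a single Series, so the data is no longer grouped when the shift happens.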
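To make this visible, here is a small sketch on a toy frame (the column names and values are illustrative assumptions). It prints the intermediate result of the grouped expanding mean and then what a plain shift does to it.

```python
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 1, 2, 2, 2], "values": [1, 2, 3, 10, 20, 30]})

# The grouped expanding mean comes back as one ordinary Series,
# indexed by (group key, original row label)
means = df.groupby("group")["values"].expanding().mean()
print(means)

# shift() on that Series knows nothing about the groups, so the last
# value of group 1 slides into the first row of group 2
leaked = means.shift()
print(leaked)
```

The first row of group 2 ends up holding 2.0, which is the final expanding mean of group 1, rather than NaN.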
You'll probably want to group again after the mean, so that the shift is applied per group.
import pandas as pd
df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2], 'values': [1, 2, 3, 10, 20, 30]})
# Regroup on index level 0, which identifies the group, so that shift() is applied per group
df.groupby("group", as_index=False)["values"].expanding().mean().groupby(level=0).shift().sort_index(level=1).droplevel(0)
Output
0 NaN
1 1.0
2 1.5
3 NaN
4 10.0
5 15.0
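As a sanity check, the regrouped version can be compared against the transform version from the question. This is a sketch on the same toy data; it leaves out as_index=False and swaps the order of droplevel and sort_index, which should be equivalent as long as the original index is unique (an assumption here).

```python
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 1, 2, 2, 2], "values": [1, 2, 3, 10, 20, 30]})

# Reference: shift inside transform, one group at a time
expected = df.groupby("group")["values"].transform(
    lambda x: x.expanding().mean().shift()
)

# Regrouped variant: expanding mean per group, then shift per group
regrouped = (
    df.groupby("group")["values"].expanding().mean()
    .groupby(level=0).shift()   # level 0 holds the group key
    .droplevel(0)               # back to the original row labels
    .sort_index()               # restore the original row order
)

# True if both approaches give the same values on the same index
print(expected.equals(regrouped))
```

If the two agree on your real data as well, the regrouped version can stand in for the slower transform call.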