How can I efficiently calculate a rolling mean excluding zeros for a given time window?
My attempt:
window = 5
df['rollMeanColumn'] = df.loc[:, 'Column'].rolling(window).apply(lambda x: x[x!=0].mean())
This code seems to work, but it is extremely slow on longer datasets.
Can I speed it up?
Answer:
Here is one way to do it. If you post data that reproduces the problem, I can post the resulting output as well.
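window = 5
# Drop the zero rows first, then take a plain rolling mean of what is left;
# assigning back aligns on the index, so the dropped rows get NaN
df['rollMeanColumn'] = df.loc[df['Column'] != 0, 'Column'].rolling(window).mean()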
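Note that filtering first changes what the window covers: `rolling(window)` then spans the last `window` non-zero values, which can reach back over more than `window` rows, while the lambda in the question averages the non-zero values among the last `window` rows. If the row-based window is what you need, one vectorized option is to divide a rolling sum by a rolling count of the non-zero entries, since zeros add nothing to the sum. This is a minimal sketch, not code from either post: the helper name `rolling_mean_excluding_zeros` is hypothetical, it assumes a numeric column without NaN values, and the sample data is column `a` from the second answer's table.

```python
import numpy as np
import pandas as pd


def rolling_mean_excluding_zeros(s: pd.Series, window: int) -> pd.Series:
    """Hypothetical helper: mean of the non-zero values in each trailing
    window of `window` rows. Assumes `s` is numeric and contains no NaN."""
    # Zeros contribute nothing to the sum, so they can stay in.
    sums = s.rolling(window).sum()
    # Count how many non-zero values fall inside each window.
    counts = s.ne(0).astype(int).rolling(window).sum()
    # An all-zero window has a count of 0; mask it so the result is NaN,
    # like the mean of an empty selection in the question's lambda.
    return sums / counts.where(counts > 0)


# Column `a` from the example table in the second answer.
df = pd.DataFrame({'Column': [2, 3, 4, 0, 2, 1, 2, 1, 3, 0]})
df['rollMeanColumn'] = rolling_mean_excluding_zeros(df['Column'], window=3)
print(df)
```

Both rolling sums are built-in aggregations, so no Python function is called per window, which is typically what makes the `apply` version slow. On this data it matches the question's lambda, for example giving 3.5 at row 3, where the filter-first approach gives NaN.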
Answer:
import pandas as pd
import numpy as np
rng = np.random.default_rng(92)
df = pd.DataFrame({'a': rng.integers(0, 5, 10),
                   'b': rng.integers(0, 5, 10),
                   'c': rng.integers(0, 5, 10)})
df
###
   a  b  c
0  2  3  1
1  3  4  0
2  4  1  1
3  0  0  1
4  2  3  3
5  1  0  2
6  2  2  2
7  1  3  2
8  3  0  3
9  0  0  2
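# Three ways to drop the zeros before taking the rolling mean;
# rows that were dropped come back as NaN when assigned to df
df['rollMeanColumn_a'] = df.loc[df['a'] != 0, 'a'].rolling(window=3).mean()
df['rollMeanColumn_b'] = df['b'].replace(0, np.nan).dropna().rolling(window=3).mean()
df['rollMeanColumn_c'] = df.query('c != 0')['c'].rolling(3).mean()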
df
###
   a  b  c  rollMeanColumn_a  rollMeanColumn_b  rollMeanColumn_c
0  2  3  1               NaN               NaN               NaN
1  3  4  0               NaN               NaN               NaN
2  4  1  1          3.000000          2.666667               NaN
3  0  0  1               NaN               NaN          1.000000
4  2  3  3          3.000000          2.666667          1.666667
5  1  0  2          2.333333               NaN          2.000000
6  2  2  2          1.666667          2.000000          2.333333
7  1  3  2          1.333333          2.666667          2.000000
8  3  0  3          2.000000               NaN          2.333333
9  0  0  2               NaN               NaN          2.333333
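To see how this differs from the question's code, the sketch below recomputes column `a` from the table above in two ways: with the filter-then-roll approach used in this answer, and with the row-window lambda from the question. The data is copied from the table; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Column `a` from the table above.
a = pd.Series([2, 3, 4, 0, 2, 1, 2, 1, 3, 0])

# This answer's approach: drop zeros, then roll over the last 3 non-zero values.
# Reindexing mimics assignment to df: rows that held a zero become NaN.
filtered = a[a != 0].rolling(3).mean().reindex(a.index)

# The question's approach: a window of 3 rows, zeros excluded inside each window.
row_window = a.rolling(3).apply(lambda x: x[x != 0].mean(), raw=False)

comparison = pd.DataFrame({'a': a, 'filtered': filtered, 'row_window': row_window})
print(comparison)
```

The two agree on most rows but differ at rows 3, 5 and 9. At row 5, for example, the filtered version averages 4, 2 and 1 (the last three non-zero values, reaching back to row 2), while the row-window version averages only the non-zero values in rows 3 to 5, namely 2 and 1, giving 1.5.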