May I know how to ignore NaN
when performing rolling
on a df
.
For example, given a df
, perform rolling on column a
, but ignore the Nan
. This requirement should produced something
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 12050.00
6 NaN NaN
7 17514.0 19350.00
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 21018.67
11 NaN NaN
12 26164.0 27796.67
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
However, I dont know which part of this line should be tweaked to get the above expected output
df['a'].rolling(2 * w 1, center=True, min_periods=1).mean()
Currently, the following code
import numpy as np
import pandas as pd
arr=[[6772],[7182],[8570],[11078],[11646],[13426],[np.nan],[17514],[18408],
[22128],[22520],[np.nan],[26164],[26590],[30636],[3119],[32166],[34774]]
df=pd.DataFrame(arr,columns=['a'])
w = 2
df['avg'] = df['a'].rolling(2 * w 1, center=True, min_periods=1).mean()
produced the following,
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 13416.00 <<<
6 NaN 15248.50 <<<
7 17514.0 17869.00 <<<
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 22305.00 <<<
11 NaN 24350.50 <<<
12 26164.0 26477.50 <<<
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
<<<
indicate where the values are different than the expected output
Update:
adding fillna
df['avg'] = df['a'].fillna(value=0).rolling(2 * w 1, center=True, min_periods=1).mean()
Does not produced the expected output
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 8944.00
5 13426.0 10732.80
6 NaN 12198.80
7 17514.0 14295.20
8 18408.0 16114.00
9 22128.0 16114.00
10 22520.0 17844.00
11 NaN 19480.40
12 26164.0 21182.00
13 26590.0 17301.80
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
12050=sum(11078 11646 13426 )/3
CodePudding user response:
IIUC, you want to restart the rolling when nan
is met. One way would be to use pandas.DataFrame.groupby
:
m = df.isna().any(1)
df["avg"] = (df["a"].groupby(m.cumsum())
.rolling(2 * w 1, center=True, min_periods=1).mean()
.reset_index(level=0, drop=True))
df["avg"] = df["avg"][~m]
Output:
a avg
0 6772.0 7508.000000
1 7182.0 8400.500000
2 8570.0 9049.600000
3 11078.0 10380.400000
4 11646.0 11180.000000
5 13426.0 12050.000000
6 NaN NaN
7 17514.0 19350.000000
8 18408.0 20142.500000
9 22128.0 20142.500000
10 22520.0 21018.666667
11 NaN NaN
12 26164.0 27796.666667
13 26590.0 21627.250000
14 30636.0 23735.000000
15 3119.0 25457.000000
16 32166.0 25173.750000
17 34774.0 23353.000000