Home > Back-end >  How to ignore NaN when applying rolling with Pandas
How to ignore NaN when applying rolling with Pandas

Time:09-22

May I know how to ignore NaN when performing rolling on a df.

For example, given a df, perform rolling on column a, but ignore the Nan. This requirement should produced something

          a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0  11180.00
5   13426.0  12050.00
6       NaN  NaN
7   17514.0  19350.00
8   18408.0  20142.50
9   22128.0  20142.50
10  22520.0  21018.67
11      NaN  NaN 
12  26164.0  27796.67
13  26590.0  21627.25
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

However, I dont know which part of this line should be tweaked to get the above expected output

df['a'].rolling(2 * w   1, center=True, min_periods=1).mean()

Currently, the following code

import numpy as np
import pandas as pd
arr=[[6772],[7182],[8570],[11078],[11646],[13426],[np.nan],[17514],[18408],
[22128],[22520],[np.nan],[26164],[26590],[30636],[3119],[32166],[34774]]
df=pd.DataFrame(arr,columns=['a'])
w = 2
df['avg'] = df['a'].rolling(2 * w   1, center=True, min_periods=1).mean()

produced the following,

 a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0  11180.00
5   13426.0  13416.00   <<<
6       NaN  15248.50   <<<
7   17514.0  17869.00   <<<
8   18408.0  20142.50
9   22128.0  20142.50
10  22520.0  22305.00   <<<
11      NaN  24350.50   <<<
12  26164.0  26477.50   <<<
13  26590.0  21627.25
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

<<< indicate where the values are different than the expected output

Update:

adding fillna

df['avg'] = df['a'].fillna(value=0).rolling(2 * w   1, center=True, min_periods=1).mean()

Does not produced the expected output

          a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0   8944.00
5   13426.0  10732.80
6       NaN  12198.80
7   17514.0  14295.20
8   18408.0  16114.00
9   22128.0  16114.00
10  22520.0  17844.00
11      NaN  19480.40
12  26164.0  21182.00
13  26590.0  17301.80
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

12050=sum(11078 11646 13426 )/3

CodePudding user response:

IIUC, you want to restart the rolling when nan is met. One way would be to use pandas.DataFrame.groupby:

m = df.isna().any(1)

df["avg"] = (df["a"].groupby(m.cumsum())
                    .rolling(2 * w   1, center=True, min_periods=1).mean()
                    .reset_index(level=0, drop=True))
df["avg"] = df["avg"][~m]

Output:

          a           avg
0    6772.0   7508.000000
1    7182.0   8400.500000
2    8570.0   9049.600000
3   11078.0  10380.400000
4   11646.0  11180.000000
5   13426.0  12050.000000
6       NaN           NaN
7   17514.0  19350.000000
8   18408.0  20142.500000
9   22128.0  20142.500000
10  22520.0  21018.666667
11      NaN           NaN
12  26164.0  27796.666667
13  26590.0  21627.250000
14  30636.0  23735.000000
15   3119.0  25457.000000
16  32166.0  25173.750000
17  34774.0  23353.000000
  • Related