Pandas series: conditional rolling standard deviation

I have a Pandas series of random numbers from -1 to 1:

from pandas import Series
from random import random

x = Series([random() * 2 - 1. for i in range(1000)])
x

Output:
  0    -0.499376
  1    -0.386884
  2     0.180656
  3     0.014022
  4     0.409052
  ...
  995  -0.395711
  996  -0.844389
  997  -0.508483
  998  -0.156028
  999   0.002387
  Length: 1000, dtype: float64

I can get the rolling standard deviation of the full Series easily:

x.rolling(30).std()

Output:
  0     NaN
  1     NaN
  2     NaN
  3     NaN
  4     NaN
  ...
  995   0.575365
  996   0.580220
  997   0.580924
  998   0.577202
  999   0.576759
  Length: 1000, dtype: float64

But what I would like is the standard deviation of only the positive numbers within each rolling window. In this example the window length is 30; if, say, only 15 of the 30 values in a window are positive, I want the standard deviation of just those 15 numbers.

One could remove all negative numbers from the Series and calculate the rolling standard deviation:

x[x > 0].rolling(30).std()

Output:
  2     NaN
  3     NaN
  4     NaN
  5     NaN
  6     NaN
  ...  
  988   0.286056
  990   0.292455
  991   0.283842
  994   0.291798
  999   0.291824
  Length: 504, dtype: float64

But this isn't the same thing: here every window always contains 30 positive numbers, whereas what I want is a window spanning 30 consecutive values of the original Series, in which the count of positive numbers varies from window to window.
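
To illustrate the difference, here is a small sketch with made-up values and a window of 3 instead of 30 (the names toy, filtered_first and positives_per_window are only for illustration):

```python
import pandas as pd

# Hypothetical toy data: positives sit at positions 0, 2, 3, 4 and 7.
toy = pd.Series([0.5, -0.2, 0.1, 0.9, 0.4, -0.7, -0.3, 0.3])

# Filter first, then roll: every full window holds exactly 3 positive
# values, even when they are spread over more than 3 original positions.
filtered_first = toy[toy > 0].rolling(3).std()

# Roll over the original positions and count positives per window:
# the count changes from window to window, which is the case I care about.
positives_per_window = (toy > 0).astype(int).rolling(3, min_periods=1).sum()

print(filtered_first.index.tolist())
print(positives_per_window.tolist())
```

The filtered result is indexed only by the positions of the positive values, so its windows no longer line up with windows over the original Series.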

I want to avoid iterating over the Series; I was hoping there might be a more Pythonic way to solve my problem. Can anyone help?

CodePudding user response:

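Mask the non-positive values with NaN, then calculate the rolling std with min_periods=1, and optionally set the first 29 values to NaN (the positions where the window is not yet full).

import numpy as np

w = 30
# Replace non-positive values with NaN, then roll over the original positions
s = x.mask(x <= 0).rolling(w, min_periods=1).std()
s.iloc[:w - 1] = np.nan  # optional: blank out the incomplete leading windows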

Note

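Passing min_periods=1 is important here. By default, min_periods equals the window size, so any window that contains at least one masked (NaN) value would return NaN. With min_periods=1, the std is computed over whatever non-null values the window holds. Windows with fewer than two non-null values still return NaN, because the sample standard deviation needs at least two observations.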
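
A minimal sketch of the effect, using made-up values and a window of 3 instead of 30 (toy, strict and relaxed are hypothetical names used only here):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; positives sit at positions 0, 2, 3, 4 and 7.
toy = pd.Series([0.5, -0.2, 0.1, 0.9, 0.4, -0.7, -0.3, 0.3])
masked = toy.mask(toy <= 0)

# Default min_periods equals the window size, so any window that
# contains a masked (NaN) value yields NaN.
strict = masked.rolling(3).std()

# min_periods=1 computes over whatever non-NaN values are present;
# windows with a single non-NaN value still give NaN (ddof=1).
relaxed = masked.rolling(3, min_periods=1).std()

print(strict.notna().sum())
print(relaxed.notna().sum())
```

On this data only the single all-positive window survives the default setting, while the relaxed version returns a value for every window that holds at least two positives.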

CodePudding user response:

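You may first turn non-positive values into np.nan, then apply np.nanstd to each window. Note that np.nanstd defaults to ddof=0 (population standard deviation), whereas pandas' std() uses ddof=1, so pass ddof=1 if you want results comparable to the other answers. So

import numpy as np

x[x.values <= 0] = np.nan  # note: this overwrites x in place
rolling_list = [np.nanstd(window.to_list()) for window in x.rolling(window=30)]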

will return

[0.0,
 0.0,
 0.38190115685808856,
 0.38190115685808856,
 0.38190115685808856,
 0.3704840425749437,
 0.33234158296550925,
 0.33234158296550925,
 0.3045579286056045,
 0.2962826377559198,
 0.275920580105683,
 0.29723758167880554,
 0.29723758167880554,
 0.29723758167880554,
 0.29723758167880554,
 0.29723758167880554
...]
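
The ddof difference can be checked on a single window. This is only a sketch with made-up values (toy, masked and the three result names are hypothetical), and it masks a copy rather than overwriting the original Series:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, window of 3 instead of 30.
toy = pd.Series([0.5, -0.2, 0.1, 0.9, 0.4, -0.7, -0.3, 0.3])
masked = toy.mask(toy <= 0)            # toy itself is left untouched

# The window ending at position 4 holds 0.1, 0.9, 0.4.
window = masked.iloc[2:5].to_numpy()

population_std = np.nanstd(window)         # ddof=0, NumPy's default
sample_std = np.nanstd(window, ddof=1)     # ddof=1, pandas' default
pandas_std = masked.rolling(3, min_periods=1).std().iloc[4]

print(population_std, sample_std, pandas_std)
```

With ddof=1 the NumPy value agrees with the pandas rolling result for the same window; with the default ddof=0 it comes out smaller.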

CodePudding user response:

IIUC, after rolling, you want to calculate the std of only the positive values in each rolling window:

out = x.rolling(30).apply(lambda w: w[w>0].std())

print(out)

0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
         ...
995    0.324031
996    0.298276
997    0.294917
998    0.304506
999    0.308050
Length: 1000, dtype: float64
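
A possible variation, sketched under the assumption that speed matters: passing raw=True hands each window to the function as a NumPy array instead of a Series, which is typically faster. The helper positive_std and the seeded generator below are illustrative, not part of the original answer. Since ndarray.std defaults to ddof=0, ddof=1 has to be passed explicitly to match pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)             # seeded only for reproducibility
x = pd.Series(rng.uniform(-1, 1, 1000))

def positive_std(w):
    # With raw=True, w is a NumPy array. ndarray.std defaults to ddof=0,
    # so ddof=1 is passed to match pandas. Fewer than two positives -> NaN.
    pos = w[w > 0]
    return pos.std(ddof=1) if pos.size > 1 else np.nan

via_apply = x.rolling(30).apply(positive_std, raw=True)

# Cross-check against the mask-based approach from the first answer.
via_mask = x.mask(x <= 0).rolling(30, min_periods=1).std()
via_mask.iloc[:29] = np.nan

print(np.allclose(via_apply.to_numpy(), via_mask.to_numpy(), equal_nan=True))
```

The cross-check at the end compares this with the masking approach; both leave the first 29 positions as NaN and should agree elsewhere up to floating-point tolerance.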

CodePudding user response:

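Another possible solution (assuming numpy and pandas are imported as np and pd). Note that x >= 0 also keeps exact zeros; use x > 0 if only strictly positive values should count: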

pd.Series(np.where(x >= 0, x, np.nan)).rolling(30, min_periods=1).std()

Output:

0           NaN
1           NaN
2           NaN
3      0.441567
4      0.312562
         ...   
995    0.323768
996    0.312461
997    0.304077
998    0.308342
999    0.301742
Length: 1000, dtype: float64
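
One thing to watch with this approach: wrapping the np.where result in a new Series discards the original index. That makes no difference for the default 0..n-1 index used in the question, but it might for a labelled or datetime index. A small sketch with a made-up labelled Series (toy, rebuilt and kept are hypothetical names), using Series.where as an index-preserving alternative:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a non-default index.
toy = pd.Series([0.5, -0.2, 0.1], index=["a", "b", "c"])

# np.where returns a plain array, so the new Series gets a fresh 0..n-1 index.
rebuilt = pd.Series(np.where(toy >= 0, toy, np.nan))

# Series.where keeps values where the condition holds, NaN elsewhere,
# and preserves the original index.
kept = toy.where(toy > 0)

print(rebuilt.index.tolist())
print(kept.index.tolist())
```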