I have a Pandas series of random numbers from -1 to 1:
from pandas import Series
from random import random
x = Series([random() * 2 - 1. for i in range(1000)])
x
Output:
0 -0.499376
1 -0.386884
2 0.180656
3 0.014022
4 0.409052
...
995 -0.395711
996 -0.844389
997 -0.508483
998 -0.156028
999 0.002387
Length: 1000, dtype: float64
I can get the rolling standard deviation of the full Series easily:
x.rolling(30).std()
Output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
995 0.575365
996 0.580220
997 0.580924
998 0.577202
999 0.576759
Length: 1000, dtype: float64
What I would like instead is the standard deviation of only the positive numbers within each rolling window. In this example the window length is 30; if a given window contains only 15 positive numbers, I want the standard deviation of just those 15.
One could remove all negative numbers from the Series and calculate the rolling standard deviation:
x[x > 0].rolling(30).std()
Output:
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
...
988 0.286056
990 0.292455
991 0.283842
994 0.291798
999 0.291824
Length: 504, dtype: float64
...But this isn't the same thing: after filtering, every window holds exactly 30 positive numbers (drawn from a longer stretch of the original Series), whereas in what I want, the number of positive numbers varies from window to window.
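To make the difference concrete, here is a minimal sketch on a tiny hand-made Series (the values and the window length of 3 are illustrative assumptions, not taken from the question). Filtering first makes the last window reach back over four original rows, while the desired window only looks at the last three rows and keeps whichever of them are positive.

```python
import pandas as pd

# Illustrative data, not the random Series from the question
x = pd.Series([0.5, -0.2, 0.1, 0.9, -0.7, 0.3])

# Filter first, then roll: the window ending at label 5 holds the
# values at labels 2, 3 and 5, i.e. it spans four original rows
filtered_roll = x[x > 0].rolling(3).std()

# What is actually wanted for the last window: take the last three
# original rows and keep only the positive ones (0.9 and 0.3)
last = x.iloc[-3:]
wanted = last[last > 0].std()

print(filtered_roll.loc[5], wanted)
```

The two numbers differ because they are computed from different sets of values.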
I want to avoid iterating over the Series; I was hoping there might be a more Pythonic way to solve my problem. Can anyone help?
CodePudding user response:
Mask the non-positive values with NaN, then calculate the rolling std with min_periods=1, and optionally set the first 29 values to NaN.
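import numpy as np
w = 30
s = x.mask(x <= 0).rolling(w, min_periods=1).std()  # NaNs are skipped inside each window
s.iloc[:w - 1] = np.nan  # optional: blank out the incomplete leading windows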
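Note: Passing min_periods=1 is important here. By default, min_periods equals the window length, so any window containing at least one NaN (that is, any window with fewer than 30 positive values) would return NaN. A window with fewer than two positive values still returns NaN, because the sample standard deviation is undefined for a single observation.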
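The effect of min_periods can be seen on a small example. This is only a sketch: the six values and the window length of 3 are made up for illustration, chosen so that every full window contains at least one non-positive value.

```python
import numpy as np
import pandas as pd

# Illustrative data, not the random Series from the question
x = pd.Series([0.5, -0.2, 0.1, 0.9, -0.7, 0.3])
w = 3

masked = x.mask(x <= 0)  # non-positive values become NaN

# Default min_periods (equal to the window length): every window here
# contains a NaN, so every result is NaN
strict = masked.rolling(w).std()

# min_periods=1: std is computed from whatever non-NaN values remain
relaxed = masked.rolling(w, min_periods=1).std()
relaxed.iloc[:w - 1] = np.nan  # drop the incomplete leading windows

print(strict.tolist())
print(relaxed.round(6).tolist())
```

With the default setting the result is entirely NaN, whereas with min_periods=1 each full window reports the std of its positive values.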
CodePudding user response:
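You can first turn the non-positive values into np.nan, then apply np.nanstd to each window. Note that np.nanstd computes the population standard deviation (ddof=0) by default, whereas the pandas std uses ddof=1, so pass ddof=1 to np.nanstd if the results should match the pandas-based answers. So
import numpy as np
x[x.values <= 0] = np.nan  # note: this modifies x in place
rolling_list = [np.nanstd(window.to_list()) for window in x.rolling(window=30)]
will return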
[0.0,
0.0,
0.38190115685808856,
0.38190115685808856,
0.38190115685808856,
0.3704840425749437,
0.33234158296550925,
0.33234158296550925,
0.3045579286056045,
0.2962826377559198,
0.275920580105683,
0.29723758167880554,
0.29723758167880554,
0.29723758167880554,
0.29723758167880554,
0.29723758167880554
...]
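One thing to watch with this approach is the ddof convention. The sketch below (with made-up values) compares np.nanstd at its default ddof=0 against ddof=1 and against the pandas std, which skips NaN and uses ddof=1 by default.

```python
import numpy as np
import pandas as pd

# Illustrative values with one masked (NaN) entry
vals = pd.Series([0.1, np.nan, 0.5, 0.9])

pop = np.nanstd(vals.to_numpy())             # population std, ddof=0
sample = np.nanstd(vals.to_numpy(), ddof=1)  # sample std, ddof=1
pandas_std = vals.std()                      # pandas default: ddof=1, NaN skipped

print(pop, sample, pandas_std)
```

Only the ddof=1 variant agrees with pandas; the default np.nanstd result is smaller.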
CodePudding user response:
IIUC, after rolling, you want to calculate the std of only the positive values in each rolling window:
out = x.rolling(30).apply(lambda w: w[w>0].std())
print(out)
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
995 0.324031
996 0.298276
997 0.294917
998 0.304506
999 0.308050
Length: 1000, dtype: float64
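As a sanity check, the apply-based version should agree with the masking approach from the earlier answer. The sketch below compares the two on seeded random data; the seed, the length of 200 and the use of NumPy's generator are assumptions made for reproducibility. Since apply calls a Python function once per window, it is likely to be the slower of the two on long Series.

```python
import numpy as np
import pandas as pd

# Seeded random data so the comparison is reproducible (illustrative setup)
rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(-1, 1, 200))
w = 30

# Per-window Python callback: keep the positive values, take their std
via_apply = x.rolling(w).apply(lambda win: win[win > 0].std())

# Vectorised alternative: mask, roll with min_periods=1, blank the lead-in
via_mask = x.mask(x <= 0).rolling(w, min_periods=1).std()
via_mask.iloc[:w - 1] = np.nan

same = np.allclose(via_apply, via_mask, equal_nan=True)
print(same)
```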
CodePudding user response:
Another possible solution (assuming numpy and pandas are imported as np and pd):
pd.Series(np.where(x > 0, x, np.nan)).rolling(30, min_periods=1).std()
Output:
0 NaN
1 NaN
2 NaN
3 0.441567
4 0.312562
...
995 0.323768
996 0.312461
997 0.304077
998 0.308342
999 0.301742
Length: 1000, dtype: float64
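One caveat, shown as a small sketch with a made-up date index: np.where returns a plain array, so wrapping it in pd.Series discards the original index unless it is passed back in. For the question's Series, which has a default integer index, this makes no visible difference.

```python
import numpy as np
import pandas as pd

# Illustrative Series with a non-default (date) index
idx = pd.date_range("2024-01-01", periods=5, freq="D")
x = pd.Series([0.5, -0.2, 0.1, 0.9, -0.7], index=idx)

# Without index=..., the result falls back to a 0..n-1 integer index
plain = pd.Series(np.where(x > 0, x, np.nan))

# Passing the original index keeps the labels aligned with x
kept = pd.Series(np.where(x > 0, x, np.nan), index=x.index)
out = kept.rolling(3, min_periods=1).std()

print(plain.index)
print(out.index)
```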