I wrote a function and used rolling() with apply() to calculate a rolling 3-day percentile distribution, but after the first 3 days it displays 0 for the rest of the column.
I assume the first 2 days, which have NaN values, are not being used by the percentile function, and that this may be defaulting the rest of the column to zero and incorrectly giving 33 for the third day. But I'm not sure about this.
I have been trying to solve this without success. Does anybody know why this happens and how to correct the code below? Any help would be greatly appreciated.

import pandas as pd
import numpy as np
from scipy import stats

data = {
    'a': [1, 15, 27, 399, 17, 568, 200, 9],
    'b': [2, 30, 15, 60, 15, 80, 53, 41],
    'c': [100, 200, 3, 78, 25, 88, 300, 91],
    'd': [4, 300, 400, 500, 23, 43, 9, 71],
}
dfgrass = pd.DataFrame(data)

def percnum(x):
    for t in dfgrass.index:
        aaa = (x <= dfgrass.loc[t, 'b']).value_counts()
        ccc = (x <= dfgrass.loc[t, 'b']).values.sum()
        vvv = len(x)
        nnn = ccc / vvv
        return nnn * 100

dfgrass['e'] = dfgrass['b'].rolling(window=3).apply(percnum)
print(dfgrass)

User response:
Try changing "for t in dfgrass.index" to "for t in x.index" in your definition of percnum(x), like so:

def percnum(x):
    for t in x.index:
        aaa = (x <= dfgrass.loc[t, 'b']).value_counts()
        ccc = (x <= dfgrass.loc[t, 'b']).values.sum()
        vvv = len(x)
        nnn = ccc / vvv
        return nnn * 100

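A minimal sketch of why the question's version produces zeros, assuming the return statement sits inside the loop (which is what the reported output suggests): the loop exits on its first iteration, so every window is compared with the first row's value of 'b', which is 2. The helper names pct_vs_fixed and pct_vs_last are hypothetical, and comparing each window against its most recent value is only a guess at the intended behaviour.

```python
import pandas as pd

b = pd.Series([2, 30, 15, 60, 15, 80, 53, 41], name="b")

def pct_vs_fixed(x, threshold):
    # Percentage of window values <= one fixed threshold.
    return (x <= threshold).mean() * 100

def pct_vs_last(x):
    # Percentage of window values <= the newest value in the window.
    return (x <= x.iloc[-1]).mean() * 100

# Reproduces the question's result: every window is compared with b[0] == 2,
# so only the first complete window (which contains 2) is non-zero.
original = b.rolling(window=3).apply(pct_vs_fixed, args=(b.iloc[0],))

# Assumed intent: rank the newest value within its own 3-day window.
fixed = b.rolling(window=3).apply(pct_vs_last)

print(pd.DataFrame({"b": b, "original": original, "fixed": fixed}))
```

The NaN values in the first two rows come from the window not being full yet; they are not what drives the later rows to zero.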
User response:
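If you're trying to compute the percentile rank of the latest value within each window, you can try something like:

def percnum(x):
    n = len(x)
    temp = x.argsort()                     # positions that would sort the window
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n   # rank of each element, scaled to (0, 1]
    return ranks[-1]                       # rank of the newest value

dfgrass.rolling(3).apply(percnum)
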
which gives the following output:

          a         b         c         d
0       NaN       NaN       NaN       NaN
1       NaN       NaN       NaN       NaN
2  1.000000  0.666667  0.333333  1.000000
3  1.000000  1.000000  0.666667  1.000000
4  0.333333  0.666667  0.666667  0.333333
5  1.000000  1.000000  1.000000  0.666667
6  0.666667  0.666667  1.000000  0.333333
7  0.333333  0.333333  0.666667  1.000000
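
As a possible alternative, the same kind of rolling rank can be sketched with pandas' own Series.rank instead of a manual argsort. This is a swapped-in technique, not the method used above; the name last_pct_rank is hypothetical, and method="max" is an assumption chosen so that tied values count as "less than or equal to" each other.

```python
import pandas as pd

data = {
    "a": [1, 15, 27, 399, 17, 568, 200, 9],
    "b": [2, 30, 15, 60, 15, 80, 53, 41],
    "c": [100, 200, 3, 78, 25, 88, 300, 91],
    "d": [4, 300, 400, 500, 23, 43, 9, 71],
}
df = pd.DataFrame(data)

def last_pct_rank(x):
    # Fraction of window values <= the newest value in the window.
    return x.rank(method="max", pct=True).iloc[-1]

# Applied column by column over a 3-row window; the first two rows are NaN.
result = df.rolling(3).apply(last_pct_rank)
print(result)
```

Multiplying the result by 100 gives percentages on the same scale as the question's function.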