I'm trying to use np.where to count consecutive non-NaN values longer than a certain length as shown below:
e.g. If there are more than 3 consecutive non-NaN values then return True.
Would appreciate any help!
value | consecutive |
---|---|
nan | False |
nan | False |
1 | False |
1 | False |
nan | False |
4 | True |
2 | True |
3 | True |
nan | False |
nan | False |
1 | True |
3 | True |
3 | True |
5 | True |
CodePudding user response:
The idea is to create groups by testing missing values and mapping using Series.map
with Series.value_counts
to have only rows with non NaNs filtered by inverted mask ~m
:
#convert values to numeric
df['value'] = df['value'].astype(float)
m = df['value'].isna()
s = m.cumsum()
N = 3
df['new'] = s.map(s[~m].value_counts()).ge(N) & ~m
print (df)
value consecutive new
0 NaN False False
1 NaN False False
2 1.0 False False
3 1.0 False False
4 NaN False False
5 4.0 True True
6 2.0 True True
7 3.0 True True
8 NaN False False
9 NaN False False
10 1.0 True True
11 3.0 True True
12 3.0 True True
13 5.0 True True