Home > Software design >  finding consecutive values greater than a threshold
finding consecutive values greater than a threshold

Time:04-09

I have a pandas df as follows:

YEARMONTH   UNITS_SOLD
     
 2020-01      5
 2020-02      10
 2020-03      10
 2020-04      5
 2020-05      5
 2020-06      5
 2020-07      10
 2020-08      10

I am looking to find the rows where there is 2 or more instances where NUM_UNITS sold is >=10 and mark the first instance of such an occurrence as True in a marker column. So my result I am trying to achieve is


YEARMONTH   UNITS_SOLD    Marker
     
 2020-01      5           False
 2020-02      10          True
 2020-03      10          False
 2020-04      5           False
 2020-05      5           False
 2020-06      5           False
 2020-07      10          True
 2020-08      10          False 
````
Not sure how to proceed here...any inputs will be appreciated.
Thanks!

CodePudding user response:

We can do cumsum create the sub group key

s = df.UNITS_SOLD.lt(10).cumsum()
out = df[df.UNITS_SOLD.ge(10)].groupby(s)['UNITS_SOLD'].transform('count')
df['Maker'] = df.index.isin(out[out>=2].groupby(s).idxmin())
df
Out[159]: 
  YEARMONTH  UNITS_SOLD  Maker
0   2020-01           5  False
1   2020-02          10   True
2   2020-03          10  False
3   2020-04           5  False
4   2020-05           5  False
5   2020-06           5  False
6   2020-07          10   True
7   2020-08          10  False
  • Related