I have a pandas df as follows:
YEARMONTH UNITS_SOLD
2020-01 5
2020-02 10
2020-03 10
2020-04 5
2020-05 5
2020-06 5
2020-07 10
2020-08 10
I am looking to find the rows where there is 2 or more instances where NUM_UNITS
sold is >=10
and mark the first instance of such an occurrence as True
in a marker
column.
So my result I am trying to achieve is
YEARMONTH UNITS_SOLD Marker
2020-01 5 False
2020-02 10 True
2020-03 10 False
2020-04 5 False
2020-05 5 False
2020-06 5 False
2020-07 10 True
2020-08 10 False
````
Not sure how to proceed here...any inputs will be appreciated.
Thanks!
CodePudding user response:
We can do cumsum
create the sub group key
s = df.UNITS_SOLD.lt(10).cumsum()
out = df[df.UNITS_SOLD.ge(10)].groupby(s)['UNITS_SOLD'].transform('count')
df['Maker'] = df.index.isin(out[out>=2].groupby(s).idxmin())
df
Out[159]:
YEARMONTH UNITS_SOLD Maker
0 2020-01 5 False
1 2020-02 10 True
2 2020-03 10 False
3 2020-04 5 False
4 2020-05 5 False
5 2020-06 5 False
6 2020-07 10 True
7 2020-08 10 False