I have a dataframe as shown below:
id  date        notify
3   04/09/2019  no
3   30/10/2019  yes
3   03/05/2020  no
3   05/09/2020  no
3   31/10/2020  yes
3   03/11/2020  no
5   03/09/2019  no
5   27/10/2019  yes
5   02/05/2020  no
I want to create a running group number that increments every time "notify" is "yes". I then want to apply the same number to the row that follows, where "notify" will always be "no".
It should look like this:
id  date        notify  time_group
3   04/09/2019  no
3   30/10/2019  yes     1
3   03/05/2020  no      1
3   05/09/2020  no
3   31/10/2020  yes     2
3   03/11/2020  no      2
5   03/09/2019  no
5   27/10/2019  yes     3
5   02/05/2020  no      3
At the moment I have tried this without much success:
i = 0
df['time_grp'] = np.nan
for row in df.iterrows():
    if row['notify'] == 'yes':
        row['time_group'] = i
        i = 1
I was wondering if there is a more pandas friendly way of accomplishing this? Perhaps making use of cumcount()? I know I can then probably use shift(-1) for applying the same counter group number to the next row...
CodePudding user response:
Try:
# mark the `yes` rows
s = df['notify'].eq('yes')

# s.cumsum() enumerates the blocks
# maybe `s.groupby(df['id']).cumsum()` if enumerating within each id
df['time_group'] = s.cumsum().where(             # use `where` to keep
    s |                                          # the `yes` rows
    s.groupby(df['id']).shift(fill_value=False)  # and those after
)
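For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same idea. The names `after_yes` and `per_id` and the cast to the nullable `Int64` dtype are my own additions for illustration, not part of the original answer; the cast only keeps the group numbers as integers instead of floats.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 3, 5, 5, 5],
    "date": ["04/09/2019", "30/10/2019", "03/05/2020", "05/09/2020",
             "31/10/2020", "03/11/2020", "03/09/2019", "27/10/2019",
             "02/05/2020"],
    "notify": ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no"],
})

# True on the `yes` rows
s = df["notify"].eq("yes")

# True on the row directly after a `yes`, without crossing into the next id
# (`after_yes` is a hypothetical helper name)
after_yes = s.groupby(df["id"]).shift(fill_value=False)

# Running count of `yes` rows, kept only on the `yes` rows and the row after
df["time_group"] = s.cumsum().where(s | after_yes).astype("Int64")

# Variant (assumption: numbering should restart for each id)
per_id = s.astype(int).groupby(df["id"]).cumsum()
df["time_group_per_id"] = per_id.where(s | after_yes).astype("Int64")

print(df)
```

Rows outside a group come out as `<NA>` rather than blank; if you need empty strings for display, you could convert the column at the end.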