pandas: increment based on a condition in another column

I have a dataframe with only one column, like the following (a minimal example):

import pandas as pd

dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                                   'my day is rainy', 'I am not feeling well', 'rainy blues',
                                   '##flower', 'the blue flower', 'she likes red',
                                   'this flower is nice']})

I would like to add a second column called 'id' that starts at 100 and increments every time a row contains '##', so my desired output would be:

                     text   id
0               ##weather  100
1           how is today?  100
2               we go out  100
3                  ##rain  101
4         my day is rainy  101
5   I am not feeling well  101
6             rainy blues  101
7                ##flower  102
8         the blue flower  102
9           she likes red  102
10    this flower is nice  102

So far I have done the following, which does not return the output I want:

dataframe['id']= 100
dataframe.loc[dataframe['text'].str.contains('## intent:'), 'id']  = 1
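As a rough illustration of why that attempt falls short, here is a sketch on a shortened subset of the sample rows (the df name and the reduced data are only for illustration). Assigning through .loc with a boolean mask writes one constant to the selected rows, and the pattern '## intent:' does not occur anywhere in the sample text, so nothing is selected at all:

```python
import pandas as pd

# A shortened copy of the sample data, for illustration only
df = pd.DataFrame({'text': ['##weather', 'how is today?', '##rain', 'my day is rainy']})
df['id'] = 100

# '## intent:' does not occur in any row, so the mask is all False
mask = df['text'].str.contains('## intent:')
print(mask.any())  # False
df.loc[mask, 'id'] = 1
print(df['id'].tolist())  # [100, 100, 100, 100]

# Even with a pattern that matches, .loc writes one constant to the header rows;
# it does not carry a running count down to the rows that follow them
df.loc[df['text'].str.contains('##'), 'id'] = 1
print(df['id'].tolist())  # [1, 100, 1, 100]
```

What is missing is a counter that goes up at each '##' row and stays put until the next one, which a cumulative sum over the mask provides.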

Answer:

You can try groupby with ngroup, grouping on a running count of the rows that contain '##':

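# running count of '##' rows: 1 for the first section, 2 for the next, ...
m = dataframe['text'].str.contains('##').cumsum()

# ngroup() numbers the sections 0, 1, 2, ...; add 100 so they start at 100
dataframe['id'] = dataframe.groupby(m).ngroup() + 100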

print(dataframe)

                     text   id
0               ##weather  100
1           how is today?  100
2               we go out  100
3                  ##rain  101
4         my day is rainy  101
5   I am not feeling well  101
6             rainy blues  101
7                ##flower  102
8         the blue flower  102
9           she likes red  102
10    this flower is nice  102
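To see what m actually holds, the sketch below prints the running count and checks the answer's result against a groupby-free variant. The names is_header, section, via_ngroup and via_cumsum, the regex=False argument (harmless here, since '#' is not a regex metacharacter) and the subtraction variant are my own additions, not part of the answer. The two agree because the count only ever increases, so ngroup(), which numbers groups in sorted key order, simply renumbers 1, 2, 3 as 0, 1, 2:

```python
import pandas as pd

dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                                   'my day is rainy', 'I am not feeling well', 'rainy blues',
                                   '##flower', 'the blue flower', 'she likes red',
                                   'this flower is nice']})

# True on each header row; the running sum turns that into a section counter
is_header = dataframe['text'].str.contains('##', regex=False)
section = is_header.cumsum()
print(section.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

# The answer's approach: ngroup() renumbers the sections 0, 1, 2, ... in key order
via_ngroup = dataframe.groupby(section).ngroup() + 100

# Alternative without groupby: shift the counter so the first section maps to 100
via_cumsum = section - section.iloc[0] + 100

print((via_ngroup == via_cumsum).all())  # True
```

Subtracting section.iloc[0] rather than a fixed 1 keeps the first block at 100 even if the data does not begin with a '##' row, which mirrors what ngroup() does in that case.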