I have a DataFrame with a single column, like the following (a minimal example):
import pandas as pd
dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                                   'my day is rainy', 'I am not feeling well', 'rainy blues',
                                   '##flower', 'the blue flower', 'she likes red',
                                   'this flower is nice']})
I would like to add a second column called 'id' that starts at 100 and increments by one every time a row contains '##', so my desired output would be:
                     text   id
0               ##weather  100
1           how is today?  100
2               we go out  100
3                  ##rain  101
4         my day is rainy  101
5   I am not feeling well  101
6             rainy blues  101
7                ##flower  102
8         the blue flower  102
9           she likes red  102
10    this flower is nice  102
So far I have tried the following, which does not produce the output I want:
dataframe['id']= 100
dataframe.loc[dataframe['text'].str.contains('## intent:'), 'id'] = 1
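To see why this attempt falls short, the sketch below inspects the mask it builds (the names mask, header_mask and attempt are illustrative, not from the question). No row contains the literal text '## intent:', so nothing is assigned; and even with a pattern that does match, .loc only overwrites the header rows themselves, so the rows after each header never pick up a new id.

```python
import pandas as pd

dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                                   'my day is rainy', 'I am not feeling well', 'rainy blues',
                                   '##flower', 'the blue flower', 'she likes red',
                                   'this flower is nice']})
dataframe['id'] = 100

# The original pattern: no row contains the literal '## intent:'
mask = dataframe['text'].str.contains('## intent:')
print(mask.any())  # False

# With a matching pattern, .loc still only writes to the header rows,
# and it writes the same constant to each of them
header_mask = dataframe['text'].str.contains('##')
attempt = dataframe.copy()
attempt.loc[header_mask, 'id'] = 1
print(attempt['id'].tolist())
# [1, 100, 100, 1, 100, 100, 100, 1, 100, 100, 100]
```

What is missing is a running count that carries each header's number down to the rows below it.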
Answer:
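You can try groupby with ngroup. The cumulative sum of the boolean '##' mask labels each row with the number of header rows seen so far, so a header and the rows that follow it share one label; ngroup() then numbers these groups 0, 1, 2, ... and adding 100 makes the ids start at 100: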
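# Running count of '##' header rows: each header starts a new block
m = dataframe['text'].str.contains('##').cumsum()
# Number the blocks 0, 1, 2, ... and shift them so they start at 100
dataframe['id'] = dataframe.groupby(m).ngroup() + 100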
print(dataframe)
                     text   id
0               ##weather  100
1           how is today?  100
2               we go out  100
3                  ##rain  101
4         my day is rainy  101
5   I am not feeling well  101
6             rainy blues  101
7                ##flower  102
8         the blue flower  102
9           she likes red  102
10    this flower is nice  102
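When the first row is always a header, the cumulative sum on its own already yields consecutive block numbers starting at 1, so a sketch without groupby is possible. This assumes the first row contains '##'; is_header and block are illustrative names, and regex=False simply makes str.contains treat '##' as a literal substring.

```python
import pandas as pd

dataframe = pd.DataFrame({'text': ['##weather', 'how is today?', 'we go out', '##rain',
                                   'my day is rainy', 'I am not feeling well', 'rainy blues',
                                   '##flower', 'the blue flower', 'she likes red',
                                   'this flower is nice']})

# True on each header row; '##' is matched literally, not as a regex
is_header = dataframe['text'].str.contains('##', regex=False)

# Running count of headers seen so far: 1 for the first block, 2 for the second, ...
block = is_header.cumsum()
print(block.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

# Shift so the first block gets id 100 (assumes row 0 is a header)
dataframe['id'] = block + 99
print(dataframe['id'].tolist())
# [100, 100, 100, 101, 101, 101, 101, 102, 102, 102, 102]
```

The trade-off: if some rows come before the first header, their cumulative sum is 0 and this version gives them 99, whereas the groupby/ngroup version renumbers all groups from 0 and so still starts at 100.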