I have a dataframe:
import pandas as pd
d = {'page_number': [0,0,0,0,0,0,1,1,1,1], 'text': ['aa','ii','cc','dd','ee','ff','gg','hh','ii','jj']}
df = pd.DataFrame(data=d)
df
   page_number text
0            0   aa
1            0   ii
2            0   cc
3            0   dd
4            0   ee
5            0   ff
6            1   gg
7            1   hh
8            1   ii
9            1   jj
I want to find the page_number on which 'gg' appears. The same page can contain many different strings, but I only want the row number where 'ii' appears on the same page as 'gg' (I'm not interested in any other occurrences of 'ii').
idx=np.where(df['text'].str.contains(r'gg', na=True))[0][0]
This won't necessarily help here, because it returns the row position of 'gg' but not its page_number.
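A minimal sketch, assuming the example data above with the strings quoted: the position returned by np.where can be passed to .iloc to read that row's page_number, and a second boolean mask then finds 'ii' on that page only. The names page and ii_pos are illustrative. Note that np.where returns positions, which match the index labels here only because df uses the default RangeIndex.

```python
import numpy as np
import pandas as pd

d = {'page_number': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
     'text': ['aa', 'ii', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'jj']}
df = pd.DataFrame(data=d)

# Position of the first row whose text contains 'gg'
idx = np.where(df['text'].str.contains(r'gg', na=True))[0][0]

# Use that position to read the page_number of the same row
page = df['page_number'].iloc[idx]

# Positions of 'ii' rows restricted to that page
ii_pos = np.where((df['text'] == 'ii') & (df['page_number'] == page))[0]
print(page, ii_pos)  # 1 [8]
```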
Many thanks
Answer:
First, keep only the rows whose text is 'ii' or 'gg':
df = df[df['text'].isin(['ii', 'gg'])]
Then group by page_number: any page with a count of 2 contains both strings, so 'ii' is on the same page as 'gg' (this assumes each string appears at most once per page):
df2 = df.groupby('page_number').count()
df2[df2['text'] == 2]
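The count above reports pages rather than rows, and a page holding two 'ii' rows but no 'gg' would also count as 2. A sketch of a variant that returns the 'ii' row labels directly, assuming the same example data; it swaps in groupby(...).transform('any') to flag every row whose page contains 'gg', which is not part of the answer above:

```python
import pandas as pd

d = {'page_number': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
     'text': ['aa', 'ii', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'jj']}
df = pd.DataFrame(data=d)

# True for every row whose page contains 'gg' somewhere
page_has_gg = df['text'].eq('gg').groupby(df['page_number']).transform('any')

# Row labels of 'ii' rows that sit on such a page
ii_rows = df.index[df['text'].eq('ii') & page_has_gg]
print(list(ii_rows))  # [8]
```

Because the flag is computed per page, this no longer depends on each string appearing only once per page.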
Answer:
You can use boolean indexing to look up one column's value based on another column's value. This returns the page_number of every row whose text is 'gg':
df[df['text']=='gg']['page_number']
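Building on this, a sketch (assuming the same example data, with hypothetical names gg_pages and ii_idx) that passes those page numbers to Series.isin so that only the 'ii' rows on a page where 'gg' appears are kept:

```python
import pandas as pd

d = {'page_number': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
     'text': ['aa', 'ii', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'jj']}
df = pd.DataFrame(data=d)

# Page(s) on which 'gg' appears
gg_pages = df.loc[df['text'] == 'gg', 'page_number']

# Row labels of 'ii' restricted to those pages
ii_idx = df.index[(df['text'] == 'ii') & df['page_number'].isin(gg_pages)]
print(ii_idx.tolist())  # [8]
```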