How to combine rows of a pandas DataFrame into lists, starting a new list after each full stop

I have a DataFrame with two columns. I want to merge the rows into lists: the rows up to and including a full stop should form one list, and the rows after that full stop should form the next list. The grouping should apply to both columns, but the condition is based only on the first column (Tokens). For example:

Tokens	label
Comparison	O
of	O
budesonide	I
Turbuhaler	I
with	O
budesonide	I
aqua	I
.	O
Rhinocort	O
Study	O
Group	O
.	O

should yield the following:

Tokens	label
["Comparison","of","budesonide","Turbuhaler","with","budesonide","aqua","."]	["O","O","I","I","O","I","I","O"]
["Rhinocort","Study","Group","."]	["O","O","O","O"]

How do I approach the problem?

Answer:

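Try building a group key that increases by one after each full stop, then aggregate each group into lists: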

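# Group key: the number of "." rows seen before the current row,
# so each "." stays in the same group as the sentence it ends.
tmp = (df["Tokens"] == ".").astype(int).shift(fill_value=0).cumsum()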

x = df.groupby(tmp).agg(list).reset_index(drop=True)
print(x)

Prints:

                                                                Tokens                     label
0  [Comparison, of, budesonide, Turbuhaler, with, budesonide, aqua, .]  [O, O, I, I, O, I, I, O]
1                                         [Rhinocort, Study, Group, .]              [O, O, O, O]
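
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It assumes the sample data from the question; the names is_stop and sentence_id are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Tokens": ["Comparison", "of", "budesonide", "Turbuhaler", "with",
                   "budesonide", "aqua", ".", "Rhinocort", "Study", "Group", "."],
        "label": ["O", "O", "I", "I", "O", "I", "I", "O", "O", "O", "O", "O"],
    }
)

# 1 on every row that ends a sentence, 0 elsewhere.
is_stop = df["Tokens"].eq(".").astype(int)

# Shift down by one row so the "." itself stays in the sentence it closes,
# then take the running total: the count of full stops seen before each row.
sentence_id = is_stop.shift(fill_value=0).cumsum().rename("sentence_id")

# Collect every column into one list per sentence.
result = df.groupby(sentence_id).agg(list).reset_index(drop=True)

print(sentence_id.tolist())
print(result)
```

Printing sentence_id makes the grouping visible: the first eight rows share one key and the last four share the next. Any tokens that follow the final full stop would get their own key and so end up in a separate last list.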