How to combine rows of a pandas dataframe as lists based on a condition that rows following a fullst

Time:03-07

I have a dataframe with two columns. I want to merge the rows into lists, where the rows up to and including a full stop form one list and the rows after that full stop form the next list. This should apply to both columns, but the condition is based only on the first column. For example:

Tokens label
Comparison O
of O
budesonide I
Turbuhaler I
with O
budesonide I
aqua I
. O
Rhinocort O
Study O
Group O
. O

should yield the following:

Tokens label
["Comparison","of","budesonide","Turbuhaler","with","budesonide","aqua","."] ["O","O","I","I","O","I","I","O"]
["Rhinocort","Study","Group","."] ["O","O","O","O"]

How do I approach the problem?

CodePudding user response:

Try:

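# Flag each "." row and shift by one so the full stop stays with the sentence it ends,
# then take a running sum to get a group number per row.
# fillna(0) covers the first row, which shift() leaves as NaN.
tmp = (df["Tokens"] == ".").astype(int).shift().cumsum().fillna(0)

# Collect every column into a list per group.
x = df.groupby(tmp).agg(list).reset_index(drop=True)
print(x)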

Prints:

                                                                Tokens                     label
0  [Comparison, of, budesonide, Turbuhaler, with, budesonide, aqua, .]  [O, O, I, I, O, I, I, O]
1                                         [Rhinocort, Study, Group, .]              [O, O, O, O]
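
To see why this works, it may help to look at the grouping key on its own. Below is a minimal, self-contained sketch that rebuilds the example frame from the question and prints the key before grouping. The names `is_stop`, `sentence_id` and `result` are made up for illustration, and `shift(fill_value=0)` is swapped in for `shift()` plus `fillna(0)` so the key stays an integer; the grouping idea is the same.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Tokens": ["Comparison", "of", "budesonide", "Turbuhaler", "with",
               "budesonide", "aqua", ".", "Rhinocort", "Study", "Group", "."],
    "label": ["O", "O", "I", "I", "O", "I", "I", "O", "O", "O", "O", "O"],
})

# 1 on every full-stop row, 0 elsewhere.
is_stop = df["Tokens"].eq(".").astype(int)

# Shift down by one row so a full stop still belongs to the sentence it ends,
# then count the full stops seen so far. Renamed so the key does not share
# a name with the "Tokens" column.
sentence_id = is_stop.shift(fill_value=0).cumsum().rename("sentence_id")
print(sentence_id.tolist())  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# One row per sentence, each column collected into a list.
result = df.groupby(sentence_id).agg(list).reset_index(drop=True)
print(result)
```

Without the shift, each full stop would start a new group instead of closing the current one, so the "." would end up at the front of the next list.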