I have a DataFrame with two columns. I want to merge the rows into lists: the rows up to and including a full stop should form one list, and the rows after that full stop should form the next list. The grouping should apply to both columns, but the condition is based on the first column only. For example:
Tokens | label |
---|---|
Comparison | O |
of | O |
budesonide | I |
Turbuhaler | I |
with | O |
budesonide | I |
aqua | I |
. | O |
Rhinocort | O |
Study | O |
Group | O |
. | O |
should yield the following:
Tokens | label |
---|---|
["Comparison","of","budesonide","Turbuhaler","with","budesonide","aqua","."] | ["O","O","I","I","O","I","I","O"] |
["Rhinocort","Study","Group","."] | ["O","O","O","O"] |
How do I approach the problem?
CodePudding user response:
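Try giving each row a group id that increases after every full stop, then group by that id and aggregate each column into a list: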
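# Flag full stops, shift the flags down one row so each "." stays with its own sentence,
# then take the cumulative sum to get a group id (the first row is NaN after shift, so fill it with 0)
tmp = (df["Tokens"] == ".").astype(int).shift().cumsum().fillna(0)
# Collect the rows of each group into lists, column by column
x = df.groupby(tmp).agg(list).reset_index(drop=True)
print(x)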
Prints:
Tokens label
0 [Comparison, of, budesonide, Turbuhaler, with, budesonide, aqua, .] [O, O, I, I, O, I, I, O]
1 [Rhinocort, Study, Group, .] [O, O, O, O]
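For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example DataFrame from the question and uses `shift(fill_value=False)` in place of the `astype(int)` / `fillna(0)` steps, which should give the same grouping. The names `is_stop`, `sentence_id` and `result` are just illustrative, not part of the original answer.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Tokens": ["Comparison", "of", "budesonide", "Turbuhaler", "with",
               "budesonide", "aqua", ".", "Rhinocort", "Study", "Group", "."],
    "label": ["O", "O", "I", "I", "O", "I", "I", "O", "O", "O", "O", "O"],
})

# True on rows whose token is a full stop
is_stop = df["Tokens"].eq(".")

# Shift the flags down one row so the "." closes its own sentence,
# then cumulative-sum the flags into a per-sentence group id.
# The rename only avoids a name clash with the "Tokens" column.
sentence_id = is_stop.shift(fill_value=False).cumsum().rename("sentence_id")

# Collect every column into one list per sentence
result = df.groupby(sentence_id).agg(list).reset_index(drop=True)
print(result)
```

Because the group id is built only from `Tokens`, the `label` column is split at exactly the same rows. If the last rows of the data do not end with a full stop, they should still come out as one final (unterminated) group.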