The output dataframe should look like Op_dataframe

CodePudding user response：

Use a custom groupby aggregation with groupby.agg and cumsum to generate the common grouper

m = df['options'].eq('Stem')

out = (df.groupby(m.cumsum().astype(str).radd('Stem'))
         .agg(All_4_Options_Appended=('text', ';'.join))
         .rename_axis('Stems').reset_index()
       )

Output:

   Stems                             All_4_Options_Appended
0  Stem1  It's the beginning of the quarter, and you're ...
1  Stem2  It's the beginning of the quarter, and you're ...

CodePudding user response：

The main trick here is that after you are able to create g which creates a grouping column/series for required rows, you combine all the text values in each group as a list. Then you can combine them with a vectorized method .str.join(' ').

This method should be faster than .agg or .apply methods

Try the following. (Step by step - explanation mentioned in comments) -

s = 'Stem'                                            # Start group for string
g = df['options'].eq(s).cumsum()                      # Create groups based cumsum
o = df.groupby(g)['text'].apply(list).str.join(' ')   # Groupby and combine text to list of texts
o = o.reset_index()                                   # Reset index to get group column
o['options'] = s   o['options'].astype(str)           # Prefix column with Stem
o.columns = ['Stems','All_4_options_Appended']        # Change column names
print(o)

   Stems                             All_4_options_Appended
0  Stem1  It's the beginning of the quarter, and you're ...
1  Stem2  It's the beginning of the quarter, and you're ...

Benchmarks

Solution by @Akshay Sehgal

%%timeit

s = 'Stem'
g = df['options'].eq(s).cumsum()
o = df.groupby(g)['text'].apply(list).str.join(' ')
o = o.reset_index()
o['options'] = s   o['options'].astype(str)
o.columns = ['Stems','All_4_options_Appended']
o

#686 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Solution by @Mozway

%%timeit

m = df['options'].eq('Stem')

out = (df.groupby(m.cumsum().astype(str).radd('Stem'))
         .agg(All_4_Options_Appended=('text', ';'.join))
         .rename_axis('Stems').reset_index()
       )

out

#1.44 ms ± 8.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)