Merging rows with same username in order

I have a chat dataframe which looks like this:

I'm trying to merge the rows where the same user has replied multiple times in a row, so that they become a single row, like this:

author_username    Content
Denise             I want to die so bad. I don’t feel the need to do anything but
                   with an exam coming up, she threw me away like trash. With all the 
                   pressure, I don’t want to live.
Kenton             Please stay strong, I can feel you. My test just ended next week,
                   back then i feel i don't have hope, and when pandemic first 
                   started. I lost contact With all my friends.
Denise             Oh
Kenton             But look at me now
Denise             I cant see you
Kenton             ? wdym?
Denise             I can't see you
Kenton             I know. That is a sentence that people use to make example of 
                   themself. So I use that sentence
Denise             Ok sry

I have tried

df[['content','author_username']].groupby(['author_username'])['content'].transform(lambda x: ','.join(x))

and this too

for index, row in df.iterrows():
  if index > 0 and row['author_username'] == df.loc[index - 1]['author_username']:
    df.loc[index]['concatenated_message'] = df.loc[index - 1]['concatenated_message'] + ' ' + row['content']
  else:
    df.loc[index]['concatenated_message'] = row['content']

but it doesn't produce the expected result.

I also tried

df3 = df2.groupby(['author_username']).agg({'content': ' '.join})

but it just groups all of a user's chats under one name instead of keeping them in order.

CodePudding user response:

Setup

print(df)

  author_username content
0           user1       x
1           user1       y
2           user2       z
3           user3       p
4           user3       q
5           user3       r
6           user2       s
7           user2       t

To merge the chats in order of appearance, compare each row's username with the previous row's to build a boolean mask that marks where the author changes. Taking the cumulative sum (cumsum) of this mask gives every run of consecutive chats by the same user its own block number. Then group the dataframe by these block numbers and aggregate:

m = df['author_username'] != df['author_username'].shift()
df.groupby(m.cumsum(), as_index=False).agg({'author_username': 'first', 'content': ' '.join})

Result

  author_username content
0           user1     x y
1           user2       z
2           user3   p q r
3           user2     s t
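
To see why this works, here is a minimal self-contained sketch that rebuilds the setup frame and prints the intermediate mask and block numbers. The name block_id is only illustrative, and this variant uses reset_index(drop=True) in place of as_index=False to discard the block numbers; treat it as one possible way to write the same idea, not the only one.

```python
import pandas as pd

# Rebuild the sample frame from the Setup section.
df = pd.DataFrame({
    "author_username": ["user1", "user1", "user2", "user3",
                        "user3", "user3", "user2", "user2"],
    "content": ["x", "y", "z", "p", "q", "r", "s", "t"],
})

# True wherever the author differs from the previous row.
# The first row is always True because shift() leaves NaN there.
m = df["author_username"] != df["author_username"].shift()

# Running count of the True values: rows in the same consecutive
# run share one number. "block_id" is a hypothetical name.
block_id = m.cumsum().rename("block")

# Group on the block numbers, keep the first username of each block
# and join its messages, then drop the block numbers from the index.
merged = (
    df.groupby(block_id)
      .agg({"author_username": "first", "content": " ".join})
      .reset_index(drop=True)
)

print(m.tolist())         # [True, False, True, True, False, False, True, False]
print(block_id.tolist())  # [1, 1, 2, 3, 3, 3, 4, 4]
print(merged)
```

Because user2 gets two different block numbers (2 and 4), its two separate runs stay separate, which is what grouping on author_username alone cannot do.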