How to group words into sentences by speaker in a pandas DataFrame

Time:08-10

Please consider the following example:

I have a DataFrame:

Index Speaker Word
0 spk_0 can
1 spk_0 you
2 spk_0 see
3 spk_0 my
4 spk_0 screen
5 spk_0 now
6 spk_0 ?
7 spk_1 yes
0 spk_1 ,
8 spk_1 now
9 spk_1 I
10 spk_1 can
11 spk_1 see
12 spk_1 your
13 spk_1 screen
14 spk_1 .
15 spk_0 Let
16 spk_0 me
17 spk_0 start
18 spk_0 then
19 spk_2 yes
20 spk_2 sure

I want to combine the Word column such that it should look like the following:

Index Speaker Sentence
0 spk_0 can you see my screen now ?
1 spk_1 yes , now I can see your screen .
2 spk_0 let me start then .
3 spk_2 Yes sure .

Can someone please help me find a solution to this problem? I have already tried groupby, but it didn't work.

CodePudding user response:

You can group by runs of consecutive identical values in the Speaker column. Compare each value with the previous (shifted) one, take the cumulative sum of the changes to get a run id, then aggregate the words with join:

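# Run id: increases by 1 every time Speaker differs from the previous row
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Group by speaker and run id, join the words, then drop the helper run-id level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)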
  Speaker                               Word
0   spk_0        can you see my screen now ?
1   spk_1  yes , now I can see your screen .
2   spk_0                  Let me start then
3   spk_2                           yes sure
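
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and names the output column Sentence, as the question asked. The names `run_id` and `result` are only illustrative, and grouping on the run id alone with named aggregation is a variation on the answer above, not the only way to do it.

```python
import pandas as pd

# Sample data rebuilt from the question (default 0..n-1 index).
df = pd.DataFrame({
    "Speaker": ["spk_0"] * 7 + ["spk_1"] * 9 + ["spk_0"] * 4 + ["spk_2"] * 2,
    "Word": ["can", "you", "see", "my", "screen", "now", "?",
             "yes", ",", "now", "I", "can", "see", "your", "screen", ".",
             "Let", "me", "start", "then",
             "yes", "sure"],
})

# True wherever the speaker differs from the previous row; the cumulative
# sum turns those change points into a run id shared by consecutive rows.
run_id = df["Speaker"].ne(df["Speaker"].shift()).cumsum().rename("run_id")

# Group on the run id only, keep the speaker of each run, and join its words.
result = (
    df.groupby(run_id, sort=False)
      .agg(Speaker=("Speaker", "first"), Sentence=("Word", " ".join))
      .reset_index(drop=True)
)
print(result)
```

Because the grouping key is the run id rather than the speaker alone, the two separate turns of spk_0 stay as two rows instead of being merged, which is why a plain groupby on Speaker does not give the desired result.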