Please consider the following example.
I have this DataFrame:
Index | Speaker | Word |
---|---|---|
0 | spk_0 | can |
1 | spk_0 | you |
2 | spk_0 | see |
3 | spk_0 | my |
4 | spk_0 | screen |
5 | spk_0 | now |
6 | spk_0 | ? |
7 | spk_1 | yes |
0 | spk_1 | , |
8 | spk_1 | now |
9 | spk_1 | I |
10 | spk_1 | can |
11 | spk_1 | see |
12 | spk_1 | your |
13 | spk_1 | screen |
14 | spk_1 | . |
15 | spk_0 | Let |
16 | spk_0 | me |
17 | spk_0 | start |
18 | spk_0 | then |
19 | spk_2 | yes |
20 | spk_2 | sure |
I want to combine the Word column into one sentence per consecutive run of the same speaker, so that the result looks like the following:
Index | Speaker | Sentence |
---|---|---|
0 | spk_0 | can you see my screen now ? |
1 | spk_1 | yes , now I can see your screen . |
2 | spk_0 | let me start then . |
3 | spk_2 | Yes sure . |
Can someone please help me find a solution to this problem? I already tried groupby, but it didn't work.
CodePudding user response:
You can group by consecutive values of the Speaker column. Create a helper key by comparing each value with the shifted (previous) value and taking the cumulative sum, then aggregate Word with join:
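# Give each run of consecutive identical speakers its own id
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Join the words of each run, then drop the helper level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)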
Speaker Word
0 spk_0 can you see my screen now ?
1 spk_1 yes , now I can see your screen .
2 spk_0 Let me start then
3 spk_2 yes sure
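To see why this works, it may help to run the same idea on a tiny frame and look at the helper key itself. The following is only a sketch: the sample data is shortened from the question, and the names `run` and `out` and the renaming of the result column to `Sentence` are my own additions, not part of the answer above.

```python
import pandas as pd

# Shortened, illustrative sample in the same shape as the question's data.
df = pd.DataFrame({
    "Speaker": ["spk_0", "spk_0", "spk_1", "spk_1", "spk_0", "spk_2"],
    "Word": ["can", "you", "yes", ".", "Let", "sure"],
})

# True wherever the speaker differs from the previous row; the cumulative
# sum turns those change points into an id per consecutive run.
# The name "run" is a hypothetical label chosen for this sketch.
g = df["Speaker"].ne(df["Speaker"].shift()).cumsum().rename("run")
print(g.tolist())  # [1, 1, 2, 2, 3, 4]

# Group by speaker and run id, join the words, drop the helper level,
# and name the joined column "Sentence" as in the desired output.
out = (
    df.groupby(["Speaker", g], sort=False)["Word"]
      .agg(" ".join)
      .droplevel("run")
      .reset_index(name="Sentence")
)
print(out)
```

A plain `groupby('Speaker')` would merge both `spk_0` blocks into one row, which is probably why the first attempt did not give the expected result; the run id keeps them apart.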