I have a data frame holding the lines of a transcribed conversation, where each speaker's turn is separated from the next by an empty line. I need to collapse the lines of each turn into a single row, but the number of lines per turn is irregular. How can I aggregate this data?
The data are like this:
Speech | Sep line |
---|---|
Was in Augoust | 0 |
Don't you remember? | 0 |
 | 1 |
Yes, i did | 0 |
It was a hot Saturday | 0 |
we were in the park | 0 |
 | 1 |
That's right | 0 |
it was a fun day | 0 |
I want the data to look like this:
speech |
---|
Was in Augoust, Don't you remember? |
Yes, i did. It was a hot Saturday, we were in the park |
That's right, it was a fun day |
CodePudding user response:
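Here's a way with dplyr. It assumes the columns are named `speech` and `sep_line`, with `sep_line` equal to 1 on the empty separator rows and 0 everywhere else: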
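library(dplyr)

df %>%
  # running count of separators: all lines of one turn share the same id
  mutate(group = cumsum(sep_line)) %>%
  # drop the empty separator rows
  filter(sep_line == 0) %>%
  group_by(group) %>%
  # join the lines of each turn into a single string
  summarise(speech = paste(speech, collapse = " ")) %>%
  select(speech)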
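To see why the `cumsum()` trick works, here is a minimal sketch of the same grouping idea in base R instead of dplyr. The data frame is rebuilt by hand from the sample in the question, so the column names `speech` and `sep_line` and the empty strings on the separator rows are assumptions about how the real data is stored.

```r
# Sample data reconstructed from the question (assumed layout:
# separator rows have an empty speech and sep_line == 1).
df <- data.frame(
  speech = c("Was in Augoust", "Don't you remember?", "",
             "Yes, i did", "It was a hot Saturday", "we were in the park", "",
             "That's right", "it was a fun day"),
  sep_line = c(0, 0, 1, 0, 0, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Each separator bumps the running total, so every line of a turn
# ends up with the same group id.
group <- cumsum(df$sep_line)

# Keep only the real speech lines, not the separators.
keep <- df$sep_line == 0

# Paste the lines of each group together, one string per turn.
joined <- tapply(df$speech[keep], group[keep], paste, collapse = " ")

result <- data.frame(speech = as.vector(joined), stringsAsFactors = FALSE)
print(result)
```

Changing `collapse = " "` to `collapse = ", "` joins the lines with commas, which is closer to the desired output shown in the question.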