I have a transcribed interview and the data is organized as follows:
[1,] "Interviewer"
[2,] "What is your favorite food?"
[3,] "Interviewee"
[4,] "I love to eat pizza"
[5,] "Interviewer"
[6,] "Cool. But have you ever tried eating salad?"
[7,] "Interviewee "
[8,] "Yeah..."
[9,] "Interviewer"
[10,] "I love salad, pizza is bad."
[11,] "Interviewee "
[12,] "I don't totally agree"
I would like to move the speaker labels out of the rows and into a separate categorical column, as in this example:
[,1] [,2]
[1,] "Interviewer" "What is your favorite food?"
[2,] "Interviewee" "I love to eat pizza"
[3,] "Interviewer" "Cool. But have you ever tried eating salad?"
[4,] "Interviewee" "Yeah..."
[5,] "Interviewer" "I love salad, pizza is bad."
[6,] "Interviewee" "I don't totally agree"
The interview is a conversation between two people. Does anyone know how to do this? Thanks in advance!
CodePudding user response:
Here is an alternative approach:
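library(tidyverse)
tibble(v1 = v1) %>%
  mutate(v2 = lead(v1)) %>%           # pair each element with the one that follows it
  filter(row_number() %% 2 == 1) %>%  # keep the odd rows, where v1 holds the speaker label
  as.matrix()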
v1 v2
[1,] "Interviewer" "What is your favorite food?"
[2,] "Interviewee" "I love to eat pizza"
[3,] "Interviewer" "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."
[5,] "Interviewer" "I love salad, pizza is bad."
[6,] "Interviewee " "I don't totally agree"
CodePudding user response:
We can create a grouping variable with grepl on the 'Interview' keyword, split on it, and rbind the pieces:
do.call(rbind, split(v1, cumsum(grepl("^Interview", v1))))
-output
[,1] [,2]
1 "Interviewer" "What is your favorite food?"
2 "Interviewee" "I love to eat pizza"
3 "Interviewer" "Cool. But have you ever tried eating salad?"
4 "Interviewee " "Yeah..."
5 "Interviewer" "I love salad, pizza is bad."
6 "Interviewee " "I don't totally agree"
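To see why the one-liner above works, it may help to run it one step at a time. The sketch below repeats the sample data from the "data" section and uses intermediate names (is_speaker, grp, pieces, res) that are only illustrative. It assumes every speaker label is followed by exactly one line of speech and that no utterance itself starts with "Interview"; if either assumption fails, the groups would no longer have two elements each.

```r
# Sample data copied from the "data" section of this answer
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# TRUE wherever an element starts with "Interview" (a speaker label)
is_speaker <- grepl("^Interview", v1)

# Running count of labels: each label and the line after it share one group id
grp <- cumsum(is_speaker)

# One list element per group, each holding c(speaker, utterance)
pieces <- split(v1, grp)

# Stack the pieces row-wise into a two-column character matrix
res <- do.call(rbind, pieces)
```

The row names 1 to 6 in the output come from the group ids that split uses as list names.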
If labels and utterances strictly alternate, then either use a recycled logical index to create two columns
cbind(v1[c(TRUE, FALSE)], v1[c(FALSE, TRUE)])
[,1] [,2]
[1,] "Interviewer" "What is your favorite food?"
[2,] "Interviewee" "I love to eat pizza"
[3,] "Interviewer" "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."
[5,] "Interviewer" "I love salad, pizza is bad."
[6,] "Interviewee " "I don't totally agree"
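The recycling index works because R repeats a short logical index along the whole vector, so c(TRUE, FALSE) picks positions 1, 3, 5, ... and c(FALSE, TRUE) picks positions 2, 4, 6, .... A small sketch of that, with the sample data repeated and with speaker and text as illustrative column names that are not part of the original answer:

```r
# Sample data copied from the "data" section of this answer
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# The pairing only makes sense if the vector has an even number of elements
stopifnot(length(v1) %% 2 == 0)

# c(TRUE, FALSE) is recycled along v1, selecting the odd positions (labels)
speakers <- v1[c(TRUE, FALSE)]

# c(FALSE, TRUE) is recycled along v1, selecting the even positions (utterances)
utterances <- v1[c(FALSE, TRUE)]

# Bind the two halves side by side; the column names are hypothetical
res <- cbind(speaker = speakers, text = utterances)
```

The same positions could be written explicitly as seq(1, length(v1), by = 2) and seq(2, length(v1), by = 2).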
Or use matrix, filling it row by row
matrix(v1, ncol = 2, byrow = TRUE)
[,1] [,2]
[1,] "Interviewer" "What is your favorite food?"
[2,] "Interviewee" "I love to eat pizza"
[3,] "Interviewer" "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."
[5,] "Interviewer" "I love salad, pizza is bad."
[6,] "Interviewee " "I don't totally agree"
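Note that some labels in the output above still carry a trailing space ("Interviewee "), so they would count as a separate category from "Interviewee". Since the question asks for a categorical column, one possible follow-up is to trim whitespace and convert the first column to a factor. This is only a sketch: the names m, df, speaker and text are made up for illustration, and trimming every element assumes that leading or trailing whitespace in the utterances is not meaningful.

```r
# Sample data copied from the "data" section of this answer
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Strip leading/trailing whitespace, then fill a two-column matrix row by row
m <- matrix(trimws(v1), ncol = 2, byrow = TRUE)

# Store the speaker as a factor (categorical) and keep the text as character
df <- data.frame(speaker = factor(m[, 1]),
                 text = m[, 2],
                 stringsAsFactors = FALSE)

# After trimming, only two categories remain
levels(df$speaker)
```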
data
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
"I love to eat pizza", "Interviewer",
"Cool. But have you ever tried eating salad?",
"Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
"Interviewee ", "I don't totally agree")