Converting rows into a categorical column using R

I have a transcribed interview and the data is organized as follows:

[1,]  "Interviewer"
[2,]  "What is your favorite food?"
[3,]  "Interviewee"
[4,]  "I love to eat pizza"
[5,]  "Interviewer"
[6,]  "Cool. But have you ever tried eating salad?"
[7,]  "Interviewee "
[8,]  "Yeah..."
[9,]  "Interviewer"
[10,] "I love salad, pizza is bad."
[11,] "Interviewee "
[12,] "I don't totally agree"

I would like to pull the speaker labels out of the rows and turn them into a categorical column, as in this example:

      [,1]                [,2]  
[1,]  "Interviewer"       "What is your favorite food?"
[2,]  "Interviewee"       "I love to eat pizza"
[3,]  "Interviewer"       "Cool. But have you ever tried eating salad?"
[4,]  "Interviewee"       "Yeah..."
[5,]  "Interviewer"       "I love salad, pizza is bad."
[6,]  "Interviewee"       "I don't totally agree"

The interview is a conversation between two people. Does anyone know how to do this? Thanks in advance!

CodePudding user response:

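Here is an alternative approach with dplyr: pair each element with the one that follows it using lead(), then keep only the odd-numbered rows (the speaker labels):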

library(tidyverse)

tibble(v1 = v1) %>% 
  mutate(v2 = lead(v1)) %>% 
  filter(row_number() %% 2 == 1) %>% 
  as.matrix()

     v1             v2                                           
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"

CodePudding user response:

We can create a grouping variable by taking the cumsum() of grepl() matches on the 'Interview' prefix, split the vector by that variable, and rbind the pieces:

do.call(rbind, split(v1, cumsum(grepl("^Interview", v1))))

-output

  [,1]           [,2]                                         
1 "Interviewer"  "What is your favorite food?"                
2 "Interviewee"  "I love to eat pizza"                        
3 "Interviewer"  "Cool. But have you ever tried eating salad?"
4 "Interviewee " "Yeah..."                                    
5 "Interviewer"  "I love salad, pizza is bad."                
6 "Interviewee " "I don't totally agree"
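To make the grouping step easier to follow, here is a minimal sketch that prints the intermediate grouping variable and then collapses each group into one row. The names is_label, turn_id, turns and result are made up for illustration, and the paste() step is an assumption of mine to cope with a speaker having more than one line in a row, which the sample data never does. Note that a speech line that itself starts with "Interview" would be misread as a label.

```r
# Same data as in the question, repeated so the block runs on its own.
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# TRUE wherever a line is a speaker label rather than speech.
is_label <- grepl("^Interview", v1)

# cumsum() turns the flags into a running turn counter: each label
# starts a new group and the speech after it inherits the same id.
turn_id <- cumsum(is_label)
turn_id
# [1] 1 1 2 2 3 3 4 4 5 5 6 6

# One list element per turn: the label first, then the speech line(s).
turns <- split(v1, turn_id)

# Hypothetical extension: join several speech lines of one turn into one string.
result <- data.frame(
  speaker = vapply(turns, function(x) trimws(x[1]), character(1)),
  speech  = vapply(turns, function(x) paste(x[-1], collapse = " "), character(1))
)
result
```

With the sample data every turn has exactly two elements, so this gives the same six rows as the rbind call above, only with the trailing space in "Interviewee " trimmed.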

If the labels and the speech strictly alternate, we can instead use a recycled logical index to pull out the odd and even elements as two columns:

cbind(v1[c(TRUE, FALSE)], v1[c(FALSE, TRUE)])
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"
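In case the recycled index looks cryptic, here is a small sketch on a toy vector (x is a made-up example, not part of the question's data). R repeats the short logical index along the whole vector, so c(TRUE, FALSE) keeps positions 1, 3, 5, ... and c(FALSE, TRUE) keeps positions 2, 4, 6, ...

```r
# Toy vector used only for illustration.
x <- c("a", "b", "c", "d", "e", "f")

# The length-2 logical index is recycled to the length of x.
odd  <- x[c(TRUE, FALSE)]   # elements at positions 1, 3, 5
even <- x[c(FALSE, TRUE)]   # elements at positions 2, 4, 6

odd
# [1] "a" "c" "e"
even
# [1] "b" "d" "f"

# The same selection written with explicit positions.
identical(odd, x[seq(1, length(x), by = 2)])
# [1] TRUE
```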

Or fill a two-column matrix row by row:

matrix(v1, ncol = 2, byrow = TRUE)
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"
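All of the outputs above are character matrices and still carry the trailing space in "Interviewee ". Since the question asks for a categorical column, a possible follow-up is to trim the labels and store them as a factor in a data frame. This is only a sketch: the names m, interview, speaker and speech are my own, and it assumes the vector has an even length with labels and speech strictly alternating.

```r
# Same data as in the question, repeated so the block runs on its own.
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Guard for the assumption that labels and speech come in pairs.
stopifnot(length(v1) %% 2 == 0)

# Column 1 holds the labels, column 2 the speech.
m <- matrix(v1, ncol = 2, byrow = TRUE)

interview <- data.frame(
  speaker = factor(trimws(m[, 1])),  # trim stray spaces, then make categorical
  speech  = m[, 2],
  stringsAsFactors = FALSE           # keep the speech as plain character
)

levels(interview$speaker)
# [1] "Interviewee" "Interviewer"
```

Without trimws(), "Interviewee" and "Interviewee " would end up as two separate factor levels.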

data

v1 <- c("Interviewer", "What is your favorite food?", "Interviewee", 
"I love to eat pizza", "Interviewer", 
"Cool. But have you ever tried eating salad?", 
"Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.", 
"Interviewee ", "I don't totally agree")