I am starting to analyze, in RStudio, an interview I conducted. As usual, the interview consists of the interviewer's questions and the subject's answers.
text <- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."
I would like to remove all of the interviewer's questions so that I can analyze only the subject's answers. I do not know how to proceed in R. In fact, I do not even know exactly what to Google.
I would appreciate all the help I can get. Thank you in advance.
CodePudding user response:
base R:
text <- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."
This gives you:
text
[1] "Interviewer: Hello, how are you?\nSubject: I am fine, thanks.\n\nInterviewer: What is your name?\nSubject: My name is Gerard."
where each \n is a newline character that you can split on with strsplit():
strsplit(text, '\n')[[1]] # strsplit returns a list
[1] "Interviewer: Hello, how are you?" "Subject: I am fine, thanks."
[3] "" "Interviewer: What is your name?"
[5] "Subject: My name is Gerard."
text2 <- strsplit(text, '\n')[[1]] # keep the character vector, not the list
text2[c(2,5)]
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
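The indices 2 and 5 are specific to this example. As a minimal sketch, assuming every answer line begins with the literal prefix Subject:, the same lines can be selected by pattern rather than by position. The names lines and answers are illustrative and not part of the answer above:

```r
text <- "Interviewer: Hello, how are you?\nSubject: I am fine, thanks.\n\nInterviewer: What is your name?\nSubject: My name is Gerard."

# Split into lines; strsplit() returns a list, so take its first element
lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

# Keep only the lines that start with "Subject:" (assumed speaker prefix)
answers <- lines[grepl("^Subject:", lines)]
answers
```

This also drops the empty line without a separate check, because an empty string does not match the pattern.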
CodePudding user response:
If your data is stored in a character vector text, as indicated in the question, you can try the following.
With as_tibble() we transform the vector into a tibble (more or less equivalent to a data frame), then we split it into rows on \n with separate_rows(), and finally we filter out the interviewer's lines and the empty rows:
library(dplyr)
library(tidyr)
text <- as_tibble(text) %>%
  separate_rows(value, sep = "\n") %>%
  filter(!grepl("Interviewer", value) & value != "") %>%
  pull(value)
text
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
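One thing to be aware of: grepl("Interviewer", value) matches the word anywhere in a line, so an answer that happens to mention the interviewer would be dropped as well. A small base R sketch of the difference, using a made-up third line that is not part of the original interview:

```r
# Hypothetical lines; the last one is invented to illustrate the pitfall
lines <- c("Interviewer: What is your name?",
           "Subject: My name is Gerard.",
           "Subject: I already told the Interviewer my name.")

# Unanchored: drops every line containing "Interviewer" anywhere
loose <- lines[!grepl("Interviewer", lines)]

# Anchored: drops only lines that start with the "Interviewer:" prefix
strict <- lines[!grepl("^Interviewer:", lines)]

loose
strict
```

If that case can occur in your transcript, the anchored pattern "^Interviewer:" could be used inside filter() in the same way.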
CodePudding user response:
An approach using strsplit and sub/gsub.
text_new <- gsub("\n", "", sub(".*(Subject: )", "\\1",
unlist(strsplit(text, "Interviewer: "))))
text_new[nchar(text_new) > 0]
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
- First, split the string on Interviewer: .
- Each resulting piece still begins with the question text, so remove everything up to Subject: with sub.
- Remove the remaining newlines with gsub.
- Finally, select the non-empty strings.
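The steps above can be sketched one at a time, which makes the intermediate results easier to inspect. The variable names pieces, trimmed, and flat are illustrative only; the calls are the same ones used in the nested expression:

```r
text <- "Interviewer: Hello, how are you?\nSubject: I am fine, thanks.\n\nInterviewer: What is your name?\nSubject: My name is Gerard."

# 1. Split on the interviewer prefix; the first piece is empty because
#    the text starts with "Interviewer: "
pieces <- unlist(strsplit(text, "Interviewer: "))

# 2. Drop everything before "Subject: " in each piece (the question text)
trimmed <- sub(".*(Subject: )", "\\1", pieces)

# 3. Remove the remaining newline characters
flat <- gsub("\n", "", trimmed)

# 4. Keep only the non-empty strings
text_new <- flat[nchar(flat) > 0]
text_new
```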