I have text data in a string variable that I want to arrange into TEXT, WHO and TIME variables in R.
The data is structured so it is possible to apply these rules:
- Grep text until "PersonA |" or "PersonB |" and add to TEXT variable
- Add PersonA/PersonB to WHO variable
- Add date to TIME variable
example_data <- "how are you? what, are u thinking about. anything! PersonA | 2020-03-20 3:49\nI'm fine thanks PersonB | 2020-03-20 3:49\nWhat are you doing? PersonA | 2020-03-20 3:50\nPlaying card PersonB | 2020-03-20 3:49\n"
# Desired output
TEXT <- c("how are you? what, are u thinking about. anything!", "I'm fine thanks", "What are you doing?", "Playing card")
WHO <- c("PersonA", "PersonB", "PersonA", "PersonB")
TIME <- c("2020-03-20 3:49", "2020-03-20 3:49", "2020-03-20 3:50", "2020-03-20 3:49")
output <- data.frame(TEXT, WHO, TIME)
output
CodePudding user response:
One option is to replace the delimiters with a single delimiter (;) and then read the result with read.csv from base R:
# Turn the space before "Person" and the " | " separator into ";", then parse as CSV
read.csv(text = gsub("\\s(?=Person)|\\s+\\|\\s+", ";", example_data, perl = TRUE),
         sep = ";", header = FALSE, col.names = c("TEXT", "WHO", "TIME"))
-output
                                                TEXT     WHO            TIME
1 how are you? what, are u thinking about. anything! PersonA 2020-03-20 3:49
2                                    I'm fine thanks PersonB 2020-03-20 3:49
3                                What are you doing? PersonA 2020-03-20 3:50
4                                       Playing card PersonB 2020-03-20 3:49
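If the message text could itself contain a semicolon, swapping delimiters may split a row into too many fields. As an alternative, here is a minimal sketch that captures the three parts directly with one regular expression via strcapture from the utils package. It assumes every non-empty line ends in "PersonA | <time>" or "PersonB | <time>"; the names lines and pattern are only illustrative.

```r
example_data <- "how are you? what, are u thinking about. anything! PersonA | 2020-03-20 3:49\nI'm fine thanks PersonB | 2020-03-20 3:49\nWhat are you doing? PersonA | 2020-03-20 3:50\nPlaying card PersonB | 2020-03-20 3:49\n"

# Split into one message per line and drop empty lines
lines <- strsplit(example_data, "\n", fixed = TRUE)[[1]]
lines <- lines[nzchar(lines)]

# Group 1: text (lazy, so it stops at the first speaker tag),
# group 2: speaker, group 3: everything after the "|"
pattern <- "^(.*?)\\s+(Person[AB])\\s+\\|\\s+(.*)$"

output <- strcapture(
  pattern, lines,
  proto = data.frame(TEXT = character(), WHO = character(),
                     TIME = character(), stringsAsFactors = FALSE),
  perl = TRUE
)
output
```

TIME stays a character column here. If real date-times are needed, something like as.POSIXct(output$TIME, format = "%Y-%m-%d %H:%M", tz = "UTC") should work, with the time zone being an assumption.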