Sorting text data into variables and rows in R

Time:05-25

I have text data in a string variable that I want to arrange into TEXT, WHO and TIME variables in R.

The data is structured so it is possible to apply these rules:

  1. Extract the text up to "PersonA |" or "PersonB |" and add it to the TEXT variable
  2. Add PersonA/PersonB to the WHO variable
  3. Add the date and time to the TIME variable

example_data <- "how are you? what, are u thinking about. anything! PersonA | 2020-03-20 3:49\nI'm fine thanks PersonB | 2020-03-20 3:49\nWhat are you doing? PersonA | 2020-03-20 3:50\nPlaying card PersonB | 2020-03-20 3:49\n"

# Desired output
TEXT <- c("how are you? what, are u thinking about. anything!", "I'm fine thanks", "What are you doing?", "Playing card")
WHO <- c("PersonA", "PersonB", "PersonA", "PersonB")
TIME <- c("2020-03-20 3:49", "2020-03-20 3:49", "2020-03-20 3:50", "2020-03-20 3:49")

output <- data.frame(TEXT, WHO, TIME)
output

CodePudding user response:

One option is to replace both separators (the whitespace before the person label and the " | " before the timestamp) with a single delimiter (;) and then read the result with read.csv from base R:

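# Turn the whitespace before "Person" and the " | " separator into ";",
# then parse the result as semicolon-delimited text.
read.csv(text = gsub("\\s+(?=Person)|\\s+\\|\\s+", ";", example_data,
   perl = TRUE), sep = ";", header = FALSE, col.names = c("TEXT", "WHO", "TIME"))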

Output:

                                                TEXT     WHO            TIME
1 how are you? what, are u thinking about. anything! PersonA 2020-03-20 3:49
2                                    I'm fine thanks PersonB 2020-03-20 3:49
3                                What are you doing? PersonA 2020-03-20 3:50
4                                       Playing card PersonB 2020-03-20 3:49
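
If you would rather follow the three rules from the question directly instead of going through read.csv, a capture-group approach is another possibility. The following is only a sketch: it assumes every line has the shape "<text> PersonA|PersonB | <timestamp>", and the names `lines`, `pattern` and `output2` are made up for this illustration. It uses strcapture from the utils package (available since R 3.4.0).

```r
example_data <- "how are you? what, are u thinking about. anything! PersonA | 2020-03-20 3:49\nI'm fine thanks PersonB | 2020-03-20 3:49\nWhat are you doing? PersonA | 2020-03-20 3:50\nPlaying card PersonB | 2020-03-20 3:49\n"

# One message per line; strsplit() does not return an empty element
# for the trailing newline.
lines <- strsplit(example_data, "\n", fixed = TRUE)[[1]]

# Group 1: the text before the person label (rule 1)
# Group 2: PersonA or PersonB (rule 2)
# Group 3: everything after " | ", i.e. the timestamp (rule 3)
pattern <- "^(.*?)\\s+(Person[AB])\\s+\\|\\s+(.*)$"

# The prototype data frame supplies the column names and types.
output2 <- strcapture(
  pattern, lines,
  proto = data.frame(TEXT = character(), WHO = character(),
                     TIME = character(), stringsAsFactors = FALSE),
  perl = TRUE
)
output2
```

Lines that do not match the pattern come back as a row of NA values, which makes malformed input easier to spot than with the delimiter replacement.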