How to rearrange continuous text with selected delimiters into columns

I have a vector called chapt1. I want to reorganize this vector into a data frame, df1, such that:

The integer in parentheses at the start of each verse is placed in the first column of df1.

The text that follows it is placed in the next column of the same row.

chapt1 <- "1. The Grand Opening (1) The black cat jumped over the lazy rabbit. (2) Salt has no taste (3) The grandmaster mentors his disciples (4) Generation of miracles. (5) Are we there yet: opening the first stage in the dungeon."

Expected result:

    1       The black cat jumped over the lazy rabbit.
    2       Salt has no taste
    3       The grandmaster mentors his disciples
    4       Generation of miracles.
    5       Are we there yet: opening the first stage in the dungeon.

Note: This is just a portion of the original file.

CodePudding user response：

We may use

library(stringr)
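# Extract each "(n) text" chunk, then strip the leading "(n) " marker.
# Note: [^.\\(]+ stops at the first period, so trailing periods are not kept.
trimws(str_remove(str_extract_all(chapt1, "\\(\\d+\\)[^.\\(]+")[[1]], "^\\(\\d+\\)\\s+"))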
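
The question asks for two columns, while the one-liner above returns only the verse text. A minimal sketch that also keeps the verse numbers could use `str_match_all` with two capture groups. It assumes the verse text never contains an opening parenthesis; the names `m`, `verse`, and `text` are illustrative and not part of the original answer.

```r
library(stringr)

chapt1 <- "1. The Grand Opening (1) The black cat jumped over the lazy rabbit. (2) Salt has no taste (3) The grandmaster mentors his disciples (4) Generation of miracles. (5) Are we there yet: opening the first stage in the dungeon."

# Group 1 captures the verse number, group 2 the text up to the next "("
# (assumption: verse text itself contains no "(")
m <- str_match_all(chapt1, "\\((\\d+)\\)\\s*([^\\(]+)")[[1]]

# Column 1 of m is the full match; columns 2 and 3 are the capture groups
df1 <- data.frame(
  verse = as.integer(m[, 2]),
  text  = str_squish(m[, 3]),
  stringsAsFactors = FALSE
)

df1
```

Because the pattern only starts matching at a parenthesized number, the chapter title "1. The Grand Opening" is skipped without any extra filtering step.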

CodePudding user response：

Here is another tidyverse approach: split the string into rows at each '\\(\\d\\)' marker, drop the first row (the chapter title), and use str_squish to trim leading and trailing whitespace:

library(tidyverse)

as_tibble(chapt1) %>% 
  separate_rows(value, sep='\\(\\d\\)') %>% 
  filter(row_number() > 1) %>% 
  mutate(value = str_squish(value))

  value                                                    
  <chr>                                                    
1 The black cat jumped over the lazy rabbit.               
2 Salt has no taste                                        
3 The grandmaster mentors his disciples                    
4 Generation of miracles.                                  
5 Are we there yet: opening the first stage in the dungeon.
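
The split-based idea above discards the verse numbers along with the delimiters. A rough base R sketch of the same split-and-drop-first logic that also recovers the numbers is shown here. It uses `[0-9]+` so that multi-digit verse numbers would match as well, which is an assumption about the full file; `pieces`, `markers`, `verse`, and `text` are illustrative names.

```r
chapt1 <- "1. The Grand Opening (1) The black cat jumped over the lazy rabbit. (2) Salt has no taste (3) The grandmaster mentors his disciples (4) Generation of miracles. (5) Are we there yet: opening the first stage in the dungeon."

# Split on the "(n)" markers; the first piece is the chapter title, so drop it
pieces <- strsplit(chapt1, "\\([0-9]+\\)")[[1]][-1]

# Pull out the markers themselves, in order of appearance
markers <- regmatches(chapt1, gregexpr("\\([0-9]+\\)", chapt1))[[1]]

# Strip the parentheses to get the numbers and pair them with the trimmed text
df1 <- data.frame(
  verse = as.integer(gsub("[()]", "", markers)),
  text  = trimws(pieces),
  stringsAsFactors = FALSE
)

df1
```

This relies on every marker being followed by some text, so that `pieces` and `markers` have the same length; a check such as `stopifnot(length(pieces) == length(markers))` may be worth adding before building the data frame.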