Splitting a paragraph into two paragraphs based on a particular word or word pair

Time:06-14

I have the following two-column data frame. I want to split the text column into two columns at the first occurrence of a particular word or word pair, in this case "unit #2". The result should have a column 2 containing the sentences before "unit #2" and a new column 3 containing the text starting from "unit #2".

report <- data.frame(Text = c("unit #1 stopped at a stop sign on a road. unit #1 was speeding. unit #2 travelling southbound  in lane #2 of 3 lanes. unit #2 couldn't react in time  and crashed into unit #1. unit #2 was unmindful.", 
                              "unit #1 stopped there. unit #1 was under influence of drug. unit #2 travelling northbound. unit #2 was not unmindful. unit #2 crashed into unit #1.", 
                              "unit #1 was going straight. unit #1 was not speeding. unit #2 travelling southbound  in lane #1 of 2 lanes. unit #2 couldn't react in time and crashed into unit #1. unit #2 was driving fast."), id = 1:3)

Answer:

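You could use tidyr's extract() with a non-greedy regex. The lazy .*? combined with the lookahead (?=unit #2) stops at the first "unit #2", and the second group (.*) captures everything from there to the end of the string: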

(Add remove = FALSE if you also want to keep the original Text column.)
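To see why the pattern splits at the first "unit #2" rather than the last, a small base R check may help. This is a sketch, not part of the original answer: it runs the same pattern through regexec() on a shortened, hypothetical sentence, and compares it with a greedy variant. It assumes perl = TRUE, because base R's default regex engine does not support lookaheads.

```r
# Shortened, hypothetical sentence containing "unit #2" twice
x <- "unit #1 stopped there. unit #2 travelling northbound. unit #2 crashed into unit #1."

# perl = TRUE is required: base R's default (TRE) engine has no lookahead support
lazy   <- regmatches(x, regexec("(.*?(?=unit #2))(.*)", x, perl = TRUE))[[1]]
greedy <- regmatches(x, regexec("(.*(?=unit #2))(.*)", x, perl = TRUE))[[1]]

# Element 1 is the whole match; elements 2 and 3 are the two capture groups
before <- lazy[2]
after  <- lazy[3]
before
#> [1] "unit #1 stopped there. "

# A greedy .* backtracks only as far as the LAST "unit #2"
greedy[2]
#> [1] "unit #1 stopped there. unit #2 travelling northbound. "
```

Note that the first group keeps the trailing space before "unit #2", which is also why column 2 in the output below ends with a space.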

library(tidyverse)

report <- data.frame(Text = c(
  "unit #1 stopped at a stop sign on a road. unit #1 was speeding. unit #2 travelling southbound  in lane #2 of 3 lanes. unit #2 couldn't react in time  and crashed into unit #1. unit #2 was unmindful.",
  "unit #1 stopped there. unit #1 was under influence of drug. unit #2 travelling northbound. unit #2 was not unmindful. unit #2 crashed into unit #1.",
  "unit #1 was going straight. unit #1 was not speeding. unit #2 travelling southbound  in lane #1 of 2 lanes. unit #2 couldn't react in time and crashed into unit #1. unit #2 was driving fast."
), id = 1:3)
  
df <- report |> 
  extract(Text, into = c("column 2", "column 3"), regex = "(.*?(?=unit #2))(.*)")

df

#>                                                           column 2
#> 1 unit #1 stopped at a stop sign on a road. unit #1 was speeding. 
#> 2     unit #1 stopped there. unit #1 was under influence of drug. 
#> 3           unit #1 was going straight. unit #1 was not speeding. 
#>                                                                                                                                   column 3
#> 1   unit #2 travelling southbound  in lane #2 of 3 lanes. unit #2 couldn't react in time  and crashed into unit #1. unit #2 was unmindful.
#> 2                                                  unit #2 travelling northbound. unit #2 was not unmindful. unit #2 crashed into unit #1.
#> 3 unit #2 travelling southbound  in lane #1 of 2 lanes. unit #2 couldn't react in time and crashed into unit #1. unit #2 was driving fast.
#>   id
#> 1  1
#> 2  2
#> 3  3

Created on 2022-06-14 by the reprex package (v2.0.1)
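If you would rather avoid the tidyverse dependency, the same split can be sketched in base R by locating the first "unit #2" with regexpr() and cutting the string there. This is an alternative, not the original answer's method. The helper name split_at_marker is hypothetical, and two behaviours are assumptions you may want to change: trailing whitespace in the first column is trimmed, and a row without the marker keeps all of its text in the first column with NA in the second (extract() would return NA in both).

```r
# Hypothetical helper: split each string at the FIRST occurrence of `marker`.
# fixed = TRUE treats the marker as a literal string, so "#" or "." need no escaping.
split_at_marker <- function(text, marker = "unit #2") {
  pos <- as.integer(regexpr(marker, text, fixed = TRUE))  # -1 where marker is absent
  found <- pos > 0
  before <- ifelse(found, substr(text, 1, pos - 1), text)
  after  <- ifelse(found, substr(text, pos, nchar(text)), NA_character_)
  data.frame(before = trimws(before), after = after, stringsAsFactors = FALSE)
}

# Hypothetical two-row example; the second row has no "unit #2"
report <- data.frame(
  Text = c("unit #1 stopped there. unit #2 crashed into unit #1.",
           "unit #1 was going straight."),
  id = 1:2,
  stringsAsFactors = FALSE
)

res <- cbind(report["id"], split_at_marker(report$Text))
res
```

Because regexpr() only reports the first match, later occurrences of "unit #2" stay in the second column, matching the non-greedy behaviour of the extract() pattern.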
