Extracting sentence from multiple paragraphs

Time:11-01

I need to extract the first sentence from every paragraph of a text. I also need to preserve the paragraph structure, so that each first sentence becomes its own paragraph.

I need to use R for this one.

I think I need a loop, but I don't know how to write one.

Thanks a lot, guys.

Answer:

Suppose that every sentence ends with "." and paragraphs are separated by "\n". For example,

dummy <- c("first sentence. blablabla.
       first sentence2. blablablabblah.")

Then, using stringr::str_split:

library(stringr)
sapply(str_split(dummy, "\n", simplify = TRUE), function(x) str_split(x, "\\.", simplify = TRUE)[1])

you get (the names are the original paragraphs, the values are their first sentences):

          first sentence. blablabla.            first sentence2. blablablabblah. 
                    "first sentence"                "           first sentence2" 
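The second result above keeps the leading spaces that the indented second line carried into the string. A minimal sketch that removes them, assuming the same "." and "\n" conventions: str_extract() keeps everything before the first "." and str_trim() strips the surrounding whitespace. The variable names paragraphs and first are only illustrative.

```r
library(stringr)

dummy <- c("first sentence. blablabla.
       first sentence2. blablablabblah.")

# Split the text into paragraphs on "\n"
paragraphs <- str_split(dummy, "\n")[[1]]

# Keep the text before the first "." and drop leading/trailing spaces
first <- str_trim(str_extract(paragraphs, "^[^.]*"))
print(first)
```

Unlike the sapply() version, this returns an unnamed character vector, which is usually easier to write back out.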

If your input is already a vector of paragraphs:

dummy <- c("first sentence. blablabla.", "first sentence2. blablablabblah.")
sapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1])

   first sentence. blablabla. first sentence2. blablablabblah. 
             "first sentence"                "first sentence2" 
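Splitting on "." drops the period and ignores sentences that end with "!" or "?". One possible variant, assuming a sentence ends at the first ".", "!" or "?", extracts the first sentence together with its terminal punctuation. Abbreviations such as "Mr." or decimal numbers would still cut a sentence short, and a paragraph with none of these characters gives NA.

```r
library(stringr)

dummy <- c("first sentence. blablabla.", "Is this the first? blablablabblah.")

# Match from the start of each paragraph up to and including the first
# ".", "!" or "?" (NA if the paragraph contains none of them)
first <- str_extract(dummy, "^[^.!?]*[.!?]")
print(first)
```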

Applied to your text:

dummy <- c("Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low. But, Mister Speaker, I appreciate the very constructive approach that you and other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families." , "So I hope we can work together this year on some priorities like criminal justice reform.So, who knows, we might surprise the cynics again.")

lapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1])

[[1]]
[1] "Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low"

[[2]]
[1] "So I hope we can work together this year on some priorities like criminal justice reform"

unlist(lapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1]))
[1] "Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low"
[2] "So I hope we can work together this year on some priorities like criminal justice reform" 
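Since the question asks about a loop, here is the same extraction written as an explicit for loop, followed by one way to restore the paragraph structure by joining the first sentences with a blank line between them. This is a sketch under the same assumption that sentences end with "."; first_sentences and out are illustrative names.

```r
library(stringr)

dummy <- c("Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low. But, Mister Speaker, I appreciate the very constructive approach that you and other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families.",
           "So I hope we can work together this year on some priorities like criminal justice reform.So, who knows, we might surprise the cynics again.")

# Pre-allocate the result, then fill it one paragraph at a time
first_sentences <- character(length(dummy))
for (i in seq_along(dummy)) {
  # Text before the first "." is taken as the first sentence
  first_sentences[i] <- str_split(dummy[i], "\\.", simplify = TRUE)[1]
}

# One first sentence per paragraph, paragraphs separated by a blank line
out <- paste(first_sentences, collapse = "\n\n")
writeLines(out)
```

To save the result instead of printing it, writeLines(out, "first_sentences.txt") writes it to a file; the file name is only a placeholder.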