How can I remove or subset certain context in a text?

Time:09-25

Here is my data:

data <- "line1\nline2\n\n\n\n\n         VICTIMS OF GUN VIOLENCE TO HOLD GUN TRAFFICKERS LIABLE\n\n  line3"

I want to extract the text between the five consecutive "\n" and the two consecutive "\n":

"VICTIMS OF GUN VIOLENCE TO HOLD GUN TRAFFICKERS LIABLE"

I tried:

library(stringr)
text <- str_split(data, "\n")
str_subset(text, ".*\n{5}\\s*(.*)\\s*\n{2}.*")

I get: Warning message: In stri_subset_regex(string, pattern, omit_na = TRUE, negate = negate, : argument is not an atomic vector; coercing

CodePudding user response:

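A base R option: use sub() with a capture group to keep only the text between the five "\n" and the two "\n". Apply it to the original string rather than to the result of str_split(), which returns a list (hence the warning) whose pieces no longer contain any "\n" for the pattern to match.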

sub('.*\n{5}\\s*(.*)\\s*\n{2}.*', '\\1', data)
#[1] "VICTIMS OF GUN VIOLENCE TO HOLD GUN TRAFFICKERS LIABLE"
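
One caveat with sub(): if the pattern does not match, it returns the input unchanged, which can be mistaken for a result. As a minimal sketch of an alternative, the capture group can be pulled out directly with regexec() and regmatches(), so that a non-match yields NA. The helper name extract_between, the lazy quantifier, and perl = TRUE are assumptions made for this illustration and are not part of the answer above.

```r
data <- "line1\nline2\n\n\n\n\n         VICTIMS OF GUN VIOLENCE TO HOLD GUN TRAFFICKERS LIABLE\n\n  line3"

# Hypothetical helper: returns the text captured between five "\n" and
# two "\n", or NA when the pattern is not found.
extract_between <- function(x) {
  # (.*?) is lazy, so trailing whitespace is left to the following \\s*
  pattern <- "\n{5}\\s*(.*?)\\s*\n{2}"
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element holds the full match first, then the capture group;
  # indexing an empty result with [2] gives NA
  vapply(m, function(parts) parts[2], character(1))
}

extract_between(data)
#[1] "VICTIMS OF GUN VIOLENCE TO HOLD GUN TRAFFICKERS LIABLE"
```

Because the helper is vectorised over x, it can also be applied to a character vector holding several documents, with NA marking those that lack the pattern.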