Extract all text before \n in R regex

Time:06-23

I want to extract all text in a string before the first "\n" appears.

Test string:

string <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"

The solution should extract "Stack Overflow".

Bonus points if it grabs the first word of the string and the last word before the "\n". Example:

string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

The solution should extract "Stack Com".

Answer:

If you want them as one string:

sub('(\\S+)[^\n]* (\\S+).*', '\\1 \\2', string2)
[1] "Stack Com"
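A minimal sketch applying the same `sub()` idea to both test strings from the question at once; `sub()` is vectorized, so no loop is needed. The names `inputs` and `result` are only for illustration. It relies on R's default (TRE) regex engine, in which `.` also matches newlines, so the trailing `.*` consumes everything after the first line.

```r
string  <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"
string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"
inputs  <- c(string, string2)

# (\\S+)  first run of non-space characters          -> \\1 (first word)
# [^\n]*  anything on the first line; '\n' inside this R string is a literal newline
#  (\\S+) last word before the newline               -> \\2
# .*      the rest of the string (TRE lets '.' match newlines)
result <- sub('(\\S+)[^\n]* (\\S+).*', '\\1 \\2', inputs)
print(result)
#> [1] "Stack Overflow" "Stack Com"
```

For the first string, the first line has only two words, so the first and last words together are simply "Stack Overflow".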

If you want them as separate strings:

stringr::str_match(string2, '(\\S+).* (\\S+)')[,-1]
[1] "Stack" "Com"
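If you would rather not depend on stringr, here is a base-R sketch of the same idea, assuming R 3.3.0 or later (when `regexec()` gained the `perl` argument). It reuses the answer's pattern; PCRE, like the ICU engine behind stringr, does not let `.` cross a newline, so the match stays on the first line. The names `m` and `parts` are illustrative.

```r
string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

# regexec() records the position of the whole match and of each capture group;
# perl = TRUE means '.' stops at the first newline, as in stringr/ICU.
m <- regexec('(\\S+).* (\\S+)', string2, perl = TRUE)

# regmatches() turns those positions into strings; element 1 is the full match,
# so drop it to keep only the two captured words.
parts <- regmatches(string2, m)[[1]][-1]
print(parts)
#> [1] "Stack" "Com"
```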

Answer:

It seems you want a regex solution. To answer your first question,

/(.*)/

will match everything before the first newline (\n), because in PCRE-style engines . does not match a newline (regex101 test).
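In R, whether `(.*)` stops at the first newline depends on the regex engine. A small sketch comparing the two engines on the question's first string (the names `first_line` and `whole` are illustrative): with `perl = TRUE` the behaviour matches the `/(.*)/` answer, while R's default TRE engine lets `.` match newlines too.

```r
string <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"

# PCRE (perl = TRUE): '.' does not match '\n', so the first match of (.*) is the first line
first_line <- regmatches(string, regexpr("(.*)", string, perl = TRUE))
print(first_line)
#> [1] "Stack Overflow"

# Default TRE engine: '.' also matches '\n', so (.*) swallows the whole string
whole <- regmatches(string, regexpr("(.*)", string))
print(identical(whole, string))
#> [1] TRUE
```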

To match the first and the last word of a single-line string, you can try

/([^ ]+).* (.*)$/

Someone can probably improve this to match only the first and last words before the first newline (regex101 test).
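One possible way to restrict the match to the text before the first newline, sketched in R with PCRE (`perl = TRUE`). It assumes each string contains at least one newline and that its first line has at least two space-separated words; otherwise the pattern does not match and `sub()` returns the input unchanged. The name `pattern` is illustrative.

```r
string  <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"
string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

# ^(\\S+)    first word
# .* (\\S+)  last word on the first line ('.' does not cross newlines in PCRE by default)
# \\n        the first newline
# (?s).*     everything after it; the inline (?s) lets '.' match newlines from here on
pattern <- "^(\\S+).* (\\S+)\\n(?s).*"

first_last <- sub(pattern, "\\1 \\2", c(string, string2), perl = TRUE)
print(first_last)
#> [1] "Stack Overflow" "Stack Com"
```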

Answer:

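Here is a trick using two nested gsub() calls: the inner one drops everything from the first newline onward, and the outer one replaces everything between the first and last whitespace with a single space.

> s <- c(string, string2)
> s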
[1] "Stack Overflow\nIs a great website for asking programming questions\nOther Info"
[2] "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

> gsub("\\s.*\\s", " ", gsub("\n.*", "", s))
[1] "Stack Overflow" "Stack Com"
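For comparison, a regex-free sketch of the same result using `strsplit()` with `fixed = TRUE`. This swaps in plain string splitting instead of regular expressions and assumes words on the first line are separated by single spaces and that each first line is non-empty; the names `first_lines` and `first_last` are illustrative.

```r
string  <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"
string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"
s <- c(string, string2)

# Keep only the piece before the first newline (fixed = TRUE: plain text, not a regex)
first_lines <- vapply(strsplit(s, "\n", fixed = TRUE), function(p) p[1],
                      character(1), USE.NAMES = FALSE)

# Split each first line on spaces and keep its first and last word
first_last <- vapply(strsplit(first_lines, " ", fixed = TRUE), function(w) {
  if (length(w) < 2) return(w[1])  # a single word has nothing to pair with
  paste(w[1], w[length(w)])
}, character(1), USE.NAMES = FALSE)

print(first_last)
#> [1] "Stack Overflow" "Stack Com"
```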