Extract all text before \n in R regex

I want to extract all text in a string before a "\n" appears.

Test string:

string <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"

Solution extracts "Stack Overflow"

Bonus points if it grabs the first word of the string and the last word before the "\n". Example:

string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

Solution extracts "Stack Com"

CodePudding user response：

If you want them as one string:

sub('(\\S+)[^\n]* (\\S+).*', '\\1 \\2', string2)
[1] "Stack Com"

If you want them as separate strings:

stringr::str_match(string2, '(\\S+).* (\\S+)')[,-1]
[1] "Stack" "Com"
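For comparison, the same result can be reached without a capturing regex by splitting the string. This is only a sketch in base R, assuming words on the first line are separated by single spaces; the names first_line and words are made up for illustration.

```r
string2 <- "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

# Keep only the text before the first newline.
first_line <- strsplit(string2, "\n", fixed = TRUE)[[1]][1]

# Split that line into words (assumes single spaces between words).
words <- strsplit(first_line, " ", fixed = TRUE)[[1]]

# First and last word as separate strings.
first_last <- c(words[1], words[length(words)])

# Or pasted back together as one string.
first_last_joined <- paste(first_last, collapse = " ")
```

If the first line has only one word, words[1] and words[length(words)] are the same element, so that word appears twice.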

CodePudding user response：

It seems you want a regex solution. To answer your first question,

/(.*)/

will match everything before the first end of line (\n), because in PCRE-style engines "." does not match a newline by default (regex101 test).
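One caveat when carrying this pattern over to R: in R's default regex engine (TRE) the dot also matches a newline, so a bare (.*) would capture the whole string. A minimal sketch of two ways around that, assuming the test string from the question; the variable names are only illustrative.

```r
string <- "Stack Overflow\nIs a great website for asking programming questions\nOther Info"

# Option 1: match a run of non-newline characters from the start of the string.
first_line <- regmatches(string, regexpr("^[^\n]*", string))

# Option 2: keep the dot but use PCRE (perl = TRUE), where "." stops at "\n".
first_line_pcre <- regmatches(string, regexpr("^.*", string, perl = TRUE))
```

Both variants should give the text before the first "\n". The negated character class is the more portable of the two, since it does not depend on how the engine treats the dot.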

To match the first and the last word in a one-liner you can try

/([^ ]+).* (.*)$/

This pattern does not yet restrict the match to the text before the first newline. Someone can probably improve on it so that it matches only the first and last word before the first occurrence of a newline. (regex101 test)

CodePudding user response：

Here is a trick using a double gsub:

> s
[1] "Stack Overflow\nIs a great website for asking programming questions\nOther Info"
[2] "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"

> gsub("\\s.*\\s", " ", gsub("\n.*", "", s))
[1] "Stack Overflow" "Stack Com"
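To see why the nested call works, it can help to run the two gsub steps separately. The sketch below uses the same vector s; the intermediate names first_lines and first_last are made up for illustration.

```r
s <- c(
  "Stack Overflow\nIs a great website for asking programming questions\nOther Info",
  "Stack Overflow Dot Com\nIS a great website for asking programming questions\nOther Info"
)

# Step 1 (inner gsub): remove the first newline and everything after it.
first_lines <- gsub("\n.*", "", s)

# Step 2 (outer gsub): replace everything from the first to the last
# whitespace character with a single space. A line with only one space
# has no such span, so it is left unchanged.
first_last <- gsub("\\s.*\\s", " ", first_lines)
```

Because step 2 needs at least two whitespace characters to match, a two-word first line such as "Stack Overflow" passes through untouched, which is why the first element keeps both words.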