Extract a string that spans across multiple lines - stringr

Time:03-07

I need to extract a string that spans multiple lines of an object.

The object:

> text <- paste("abc \nd \ne")
> cat(text)
abc 
d 
e

With str_extract_all I can extract all the text between ‘a’ and ‘c’, for example.

> str_extract_all(text, "a.*c")
[[1]]
[1] "abc"

With the function ‘regex’ and its argument ‘multiline’ set to TRUE, I can match on each line of the string. In this case, I can extract the first character of every line.

> str_extract_all(text, regex("^."))
[[1]]
[1] "a"

> str_extract_all(text, regex("^.", multiline = TRUE))
[[1]]
[1] "a" "d" "e"

But when I try to extract "every character between a and d" (a match that spans multiple lines), the output is "character(0)".

> str_extract_all(text, regex("a.*d", multiline = TRUE))
[[1]]
character(0)

The desired output is:

“abcd”

How can I get it with stringr?

CodePudding user response:

str_remove_all(text, "\\s\\n")
[1] "abcde"

OR

paste0(trimws(strsplit(text, "\\n")[[1]]), collapse="")
[1] "abcde"

CodePudding user response:

dplyr:

library(dplyr)
library(stringr)
data.frame(text) %>%
  mutate(new = lapply(str_extract_all(text, "\\w"), paste0, collapse = ""))
         text   new
1 abc \nd \ne abcde

Here we use the character class \\w, which matches only word characters (letters, digits, and underscore), so the spaces and the newline character \n are dropped.
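
To make the intermediate step visible, here is a minimal sketch of what the \\w extraction returns before the pieces are pasted together. It assumes the same `text` object as in the question; the name `chars` is only illustrative.

```r
library(stringr)

text <- "abc \nd \ne"

# Each match is a single word character; spaces and "\n" never match \\w.
chars <- str_extract_all(text, "\\w")[[1]]
chars
# [1] "a" "b" "c" "d" "e"

# Collapsing the matches gives one string without whitespace.
paste0(chars, collapse = "")
# [1] "abcde"
```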

Without dplyr (this still uses stringr's str_extract_all):

unlist(lapply(str_extract_all(text, "\\w"), paste0, collapse = ""))
[1] "abcde"

CodePudding user response:

We can use gsub:

gsub("[\r\n]|[[:blank:]]", "", text)
[1] "abcde"
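
The approaches above strip whitespace from the whole string rather than changing the original pattern. As a complement, here is a minimal sketch of why "a.*d" returned character(0) and how it could be made to match: `multiline = TRUE` only changes where ^ and $ match, while `.` still does not match a line break unless `dotall = TRUE` is set in `regex()`. The name `span` is only illustrative.

```r
library(stringr)

text <- "abc \nd \ne"

# dotall = TRUE lets `.` match line breaks, so the match can cross lines.
span <- str_extract(text, regex("a.*d", dotall = TRUE))
span
# [1] "abc \nd"

# Remove the spaces and newlines inside the match to get the desired output.
str_remove_all(span, "\\s")
# [1] "abcd"
```

Unlike the answers above, this stops at "d" instead of returning the whole string, which matches the output asked for in the question.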