I need to extract a string that spans multiple lines in an object.
The object:
> text <- paste("abc \nd \ne")
> cat(text)
abc
d
e
With str_extract_all I can extract all the text between ‘a’ and ‘c’, for example.
> str_extract_all(text, "a.*c")
[[1]]
[1] "abc"
Using the function ‘regex’ with the argument ‘multiline’ set to TRUE, I can match at the start of each line. In this case, I can extract the first character of every line.
> str_extract_all(text, regex("^."))
[[1]]
[1] "a"
> str_extract_all(text, regex("^.", multiline = TRUE))
[[1]]
[1] "a" "d" "e"
But when I try to extract "every character between a and d" (a regex that spans multiple lines), the output is "character(0)".
> str_extract_all(text, regex("a.*d", multiline = TRUE))
[[1]]
character(0)
The desired output is:
“abcd”
How can I get it with stringr?
CodePudding user response:
str_remove_all(text,"\\s\\n")
[1] "abcde"
OR
paste0(trimws(strsplit(text, "\\n")[[1]]), collapse="")
[1] "abcde"
CodePudding user response:
dplyr:
library(dplyr)
library(stringr)
data.frame(text) %>%
  mutate(new = lapply(str_extract_all(text, "\\w"), paste0, collapse = ""))
         text   new
1 abc \nd \ne abcde
Here we use the character class \\w, which matches only word characters and so includes neither the spaces nor the newline character \n.
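As a quick check of that claim, here is a minimal sketch that reuses the `text` object from the question. It shows the individual matches returned for `\\w` before they are pasted back together; the variable names `chars` and `joined` are only illustrative.

```r
library(stringr)

text <- "abc \nd \ne"

# \\w matches word characters only, so spaces and "\n" are never returned
chars <- str_extract_all(text, "\\w")[[1]]
chars

# Collapse the individual matches back into one string
joined <- paste0(chars, collapse = "")
joined
```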
base R (still using str_extract_all from stringr):
unlist(lapply(str_extract_all(text, "\\w"), paste0, collapse = ""))
[1] "abcde"
CodePudding user response:
We can use gsub:
gsub("[\r\n]|[[:blank:]]", "", text)
[1] "abcde"
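The approaches above strip whitespace from the whole string rather than matching across lines. If a pattern that actually spans lines is needed, one option is the `dotall` argument of stringr's `regex()`, which lets `.` match newline characters, whereas `multiline` only changes where `^` and `$` match. The sketch below assumes the desired "abcd" is the match from "a" to "d" with all whitespace removed; the names `spanned` and `result` are illustrative.

```r
library(stringr)

text <- "abc \nd \ne"

# dotall = TRUE lets "." match "\n", so the pattern can cross line breaks
spanned <- str_extract(text, regex("a.*d", dotall = TRUE))

# Assumption: the desired "abcd" has all whitespace (spaces and newlines) removed
result <- str_remove_all(spanned, "\\s")
result
```

Note that `.*` is greedy, so with more than one "d" in the input the match would run to the last one; `.*?` would stop at the first.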