I am trying to extract every run of four consecutive characters from a string, including runs that overlap. For example, for the string
'ghijklm'
the result should be a vector as follows:
'ghij' , 'hijk' , 'ijkl' , 'jklm'
Using a regular expression, I can extract the first four characters as follows:
library(stringr)
library(dplyr)
text1<-'ghijklm'
> text1 %>% stringr::str_match('\\w{4}')
     [,1]
[1,] "ghij"
But I do not know how to get the remaining substrings.
CodePudding user response:
We may use substring, which is vectorized over its start and end positions:
substring(text1, 1:4, 4:7)
-output
[1] "ghij" "hijk" "ijkl" "jklm"
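The start and end positions above are hard-coded for a seven-character string and a window of four. A minimal sketch of how the same idea could be generalized, assuming a single input string; the helper name sliding_substrings and its argument n are hypothetical and not part of base R:

```r
# Hypothetical helper: every overlapping substring of length n from one string.
sliding_substrings <- function(x, n = 4) {
  len <- nchar(x)
  # No window of length n fits in a shorter string.
  if (len < n) return(character(0))
  # Start positions 1, 2, ..., len - n + 1; each window ends n - 1 later.
  starts <- seq_len(len - n + 1)
  substring(x, starts, starts + n - 1)
}

sliding_substrings("ghijklm")
#> [1] "ghij" "hijk" "ijkl" "jklm"
```

Using seq_len rather than 1:(len - n + 1) avoids a descending sequence when the string is shorter than the window.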
If we want to use str_match, the _all suffix is needed to return multiple matches, and a regex lookaround will also help:
library(stringr)
str_match_all(text1, "(?=(\\w{4}))")[[1]][,2]
[1] "ghij" "hijk" "ijkl" "jklm"
Or, with a lookbehind instead of a lookahead:
str_match_all(text1, "(?<=(\\w{4}))")[[1]][,2]
[1] "ghij" "hijk" "ijkl" "jklm"
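To see why the lookaround matters, here is a small comparison, assuming stringr is installed. The eight-character string x is an illustrative input, not from the question. A plain pattern consumes the characters it matches, so the next match starts after it; a lookaround is zero-width, so a match can start at every position while the capture group inside it still records the four characters.

```r
library(stringr)

x <- "ghijklmn"  # illustrative input, one character longer than the question's

# Plain pattern: matches consume their characters, so they cannot overlap.
str_extract_all(x, "\\w{4}")[[1]]
#> [1] "ghij" "klmn"

# Lookahead with a capture group: zero-width match at each position,
# column 2 of the result holds the captured four characters.
str_match_all(x, "(?=(\\w{4}))")[[1]][, 2]
#> [1] "ghij" "hijk" "ijkl" "jklm" "klmn"
```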
CodePudding user response:
Another solution, based on purrr::map:
library(purrr)
z <- "ghijklm"
map(1:(nchar(z)-3), ~ substr(z, .x, .x + 3)) %>% unlist
#> [1] "ghij" "hijk" "ijkl" "jklm"