Extract all four characters from a string using regular expression in R

Time:11-19

I am trying to extract every four-character substring from a string. I need all runs of four consecutive characters, including the overlapping ones. For example, for the string

'ghijklm'

the result should be a vector as follows:

'ghij' , 'hijk' , 'ijkl' , 'jklm'

Using a regular expression I can extract the first four characters as follows:

library(stringr)
library(dplyr)
text1 <- 'ghijklm'
text1 %>% stringr::str_match('\\w{4}')
      [,1]  
[1,] "ghij"   

But I do not know how to get the remaining, overlapping matches.

CodePudding user response:

We may use substring, which is vectorized over its start and end positions

substring(text1, 1:4, 4:7)

-output

[1] "ghij" "hijk" "ijkl" "jklm"
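
The start and end positions above are hard-coded for a seven-character string. As a minimal sketch, the same idea can be wrapped in a small helper that derives the positions from the string length. The name sliding_substrings and its argument n are hypothetical, introduced here only for illustration, and the helper assumes x is a single string:

```r
# Hypothetical helper: all overlapping substrings of width n from one string.
sliding_substrings <- function(x, n = 4L) {
  len <- nchar(x)
  # A string shorter than the window has no substrings of that width.
  if (len < n) return(character(0))
  # One window starts at each position from 1 to len - n + 1.
  starts <- seq_len(len - n + 1L)
  substring(x, starts, starts + n - 1L)
}

sliding_substrings("ghijklm")
sliding_substrings("abc")
```

For 'ghijklm' this computes the same 1:4 and 4:7 positions used above; a string shorter than n gives an empty character vector.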

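If we want to use str_match, the _all variant is needed to return every match rather than only the first. A regex lookaround also helps: it consumes no characters, so the search can advance one position at a time while the capture group inside it records the four characters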

library(stringr)
str_match_all(text1, "(?=(\\w{4}))")[[1]][,2]
[1] "ghij" "hijk" "ijkl" "jklm"

Or

str_match_all(text1, "(?<=(\\w{4}))")[[1]][,2]
[1] "ghij" "hijk" "ijkl" "jklm"
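
The lookahead pattern above should also work without stringr. The following is a sketch in base R, assuming gregexpr is called with perl = TRUE so that lookarounds are supported and the capture positions are returned as attributes; the variable names are illustrative only:

```r
text1 <- "ghijklm"

# Each match of the lookahead is zero-length, so the useful information
# is in the capture group attributes rather than in the match itself.
m <- gregexpr("(?=(\\w{4}))", text1, perl = TRUE)[[1]]

# Start position and length of capture group 1 for every match.
starts <- attr(m, "capture.start")[, 1]
lens <- attr(m, "capture.length")[, 1]

result <- substring(text1, starts, starts + lens - 1)
result
```

This is more verbose than str_match_all, but it avoids the extra dependency and makes explicit that the captured text, not the zero-length match, carries the result.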

CodePudding user response:

Another solution, based on purrr::map:

library(purrr)

z <- "ghijklm"
map(1:(nchar(z) - 3), ~ substr(z, .x, .x + 3)) %>% unlist

#> [1] "ghij" "hijk" "ijkl" "jklm"