Regex expression to match every nth occurrence of a pattern

Time:03-17

Consider this string,

str = "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"

I'd like to separate the string at every nth occurrence of a pattern, here -:

f(str, n = 2)
[1] "abc-de" "fghi-j" "k-lm" "n-o"...

f(str, n = 3)
[1] "abc-de-fghi" "j-k-lm" "n-o-p" "qrst-u-vw"...

I know I could do it like this:

library(stringr)
spl <- str_split(str, "-")[[1]]
unname(sapply(split(spl, ceiling(seq_along(spl) / 2)), paste, collapse = "-"))
[1] "abc-de" "fghi-j" "k-lm"   "n-o"    "p-qrst" "u-vw"   "x-yz" 

But I'm looking for a shorter and cleaner solution.

What are the possibilities?

CodePudding user response:

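You could use str_extract_all with the pattern \w+(?:-\w+){0,2}, which matches runs of up to 3 words joined by 2 hyphens. Note that n below is the maximum number of hyphens kept inside each piece, so each piece holds up to n + 1 words: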

library(stringr)
str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
n <- 2
regex <- paste0("\\w+(?:-\\w+){0,", n, "}")
str_extract_all(str, regex)[[1]]

[1] "abc-de-fghi" "j-k-lm"      "n-o-p"       "qrst-u-vw"   "x-yz"

n <- 3
regex <- paste0("\\w+(?:-\\w+){0,", n, "}")
str_extract_all(str, regex)[[1]]

[1] "abc-de-fghi-j" "k-lm-n-o"      "p-qrst-u-vw"   "x-yz"
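
To get the f(str, n) behaviour asked for in the question (n words per piece, so a cut at every nth hyphen), the upper bound of the quantifier would need to be n - 1 rather than n. A minimal sketch of that, assuming base R's regmatches/gregexpr in place of stringr so it runs without extra packages; the function name split_every_nth is made up for illustration:

```r
# Hypothetical helper (name is an assumption, not from the answer above):
# split `x` at every nth hyphen by extracting runs of up to n words.
split_every_nth <- function(x, n) {
  stopifnot(n >= 2)
  # \w+ matches one word; (?:-\w+){0,n-1} allows up to n - 1 further words
  regex <- paste0("\\w+(?:-\\w+){0,", n - 1, "}")
  # perl = TRUE so the non-capturing group is handled by PCRE
  regmatches(x, gregexpr(regex, x, perl = TRUE))[[1]]
}

str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
split_every_nth(str, 2)
split_every_nth(str, 3)
```

Because \w also matches digits and underscores, this only works as intended when the pieces between hyphens are made of word characters.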

CodePudding user response:

Another approach: first split the string on every occurrence of the pattern, then paste the pieces back together in groups of n, using the same pattern variable as the collapse string.

str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
n <- 3
pattern <- "-"

ans <- unlist(strsplit(str, pattern))
sapply(split(ans, 
             ceiling(seq_along(ans)/n)), 
       paste0, collapse = pattern)
# "abc-de-fghi" "j-k-lm" "n-o-p" "qrst-u-vw" "x-yz"
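
One thing to watch with the split-and-paste approach: strsplit treats the pattern as a regular expression, while collapse uses it literally, so a separator such as "." would split on every character. A possible way around that, sketched here as a small function with the made-up name join_every_nth, is to split with fixed = TRUE:

```r
# Hypothetical helper (name is an assumption): split on a literal separator,
# then re-join the pieces in groups of n using the same separator.
join_every_nth <- function(x, n, sep = "-") {
  # fixed = TRUE: treat `sep` as plain text, not as a regex
  parts <- strsplit(x, sep, fixed = TRUE)[[1]]
  # group index 1,1,...,2,2,... with n pieces per group
  groups <- ceiling(seq_along(parts) / n)
  unname(vapply(split(parts, groups), paste, character(1), collapse = sep))
}

str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
join_every_nth(str, 3)
join_every_nth("a.b.c.d.e", 2, sep = ".")
```

vapply is used instead of sapply only so the result is guaranteed to be a character vector; unname drops the group labels that split adds.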