Consider this string,
str = "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
I'd like to separate the string at every nth occurrence of a pattern, here -
:
f(str, n = 2)
[1] "abc-de" "fghi-j" "k-lm" "n-o"...
f(str, n = 3)
[1] "abc-de-fghi" "j-k-lm" "n-o-p" "qrst-u-vw"...
I know I could do it like this:
spl <- str_split(str, "-", )[[1]]
unname(sapply(split(spl, ceiling(seq(spl) / 2)), paste, collapse = "-"))
[1] "abc-de" "fghi-j" "k-lm" "n-o" "p-qrst" "u-vw" "x-yz"
But I'm looking for a shorter and cleaner solution
What are the possibilities?
CodePudding user response:
You could use str_extract_all
with the pattern \w (?:-\w ){0,2}
, for instance to find terms with 3 words and 2 hyphens:
str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
n <- 2
regex <- paste0("\\w (?:-\\w ){0,", n, "}")
str_extract_all(str, regex)[[1]]
[1] "abc-de-fghi" "j-k-lm" "n-o-p" "qrst-u-vw" "x-yz"
n <- 3
regex <- paste0("\\w (?:-\\w ){0,", n, "}")
str_extract_all(str, regex)[[1]]
[1] "abc-de-fghi-j" "k-lm-n-o" "p-qrst-u-vw" "x-yz"
CodePudding user response:
another approach: First split on every split-pattern found, then paste/collapse into groups of n-length, using the split-pattern-variable as collapse character.
str <- "abc-de-fghi-j-k-lm-n-o-p-qrst-u-vw-x-yz"
n <- 3
pattern <- "-"
ans <- unlist(strsplit(str, pattern))
sapply(split(ans,
ceiling(seq_along(ans)/n)),
paste0, collapse = pattern)
# "abc-de-fghi" "j-k-lm" "n-o-p" "qrst-u-vw" "x-yz"