I am trying to wrap my head around recursion. My recursive R function is supposed to split a long string into chunks of n characters each, but it returns only two chunks (plus an empty string) instead of the full set. I know there are other ways to do this, but I am specifically trying to find a recursive solution. Any help is appreciated, thanks in advance.
# Sample dataset
x <- paste0(rep(letters, 10000), collapse = "")
split_group <- function(x, n = 10) {
  if (nchar(x) < n) {
    return(x)
  } else {
    beginning <- substring(x, 1, n)
    remaining <- substring(x, (n + 1), (n + 1) + (n - 1))
    c(beginning, split_group(remaining, n))
  }
}
split_group(x = x, n = 10)
# Returns: "abcdefghij" "klmnopqrst" ""
CodePudding user response:
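1) Use <= instead of < in the base case, and fix remaining so that it takes everything from position n + 1 onward rather than only the next n characters.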
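split_group <- function(x, n = 10) {
  # Base case: at most n characters are left, so this is the last chunk
  if (nchar(x) <= n) x
  else {
    beginning <- substring(x, 1, n)
    # Everything after the first n characters (no end position given)
    remaining <- substring(x, n + 1)
    c(beginning, split_group(remaining, n))
  }
}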
x <- substring(paste(letters, collapse = ""), 1, 24)
split_group(x, 2)
## [1] "ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx"
split_group(x, 5)
## [1] "abcde" "fghij" "klmno" "pqrst" "uvwx"
split_group(x, 6)
## [1] "abcdef" "ghijkl" "mnopqr" "stuvwx"
split_group(x, 10)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
split_group(x, 23)
## [1] "abcdefghijklmnopqrstuvw" "x"
split_group(x, 24)
## [1] "abcdefghijklmnopqrstuvwx"
split_group(x, 25)
## [1] "abcdefghijklmnopqrstuvwx"
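To see why the original version stopped after two chunks, it can help to compare the two ways of computing remaining on a short input. This is only an illustrative sketch; the names s, original_rest and fixed_rest are made up for it. Note that substring() has a default last of 1000000L, so "to the end" holds for strings shorter than that.

```r
s <- paste(letters[1:24], collapse = "")  # "abcdefghijklmnopqrstuvwx"
n <- 10

# Original: keeps only characters n + 1 through 2 * n, so the tail is dropped
original_rest <- substring(s, n + 1, (n + 1) + (n - 1))
original_rest
## [1] "klmnopqrst"

# Fixed: no end position, so the rest of the string is passed to the next call
fixed_rest <- substring(s, n + 1)
fixed_rest
## [1] "klmnopqrstuvwx"
```

With the original expression, the second call receives exactly n characters, fails the < n test, and then recurses on an empty string, which is where the trailing "" comes from.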
2) Here are some approaches without recursion. The first (using gsubfn) is the shortest, but the second is the simplest and uses only base R. The third also uses only base R, and the fourth uses zoo.
library(gsubfn)
strapply(x, "(.{1,10})", simplify = c)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
ix <- seq(1, nchar(x), 10)
substring(x, ix, ix + 10 - 1)
## [1] "abcdefghij" "klmnopqrst" "uvwx"
sapply(seq(1, nchar(x), 10), function(i) substring(x, i, i + 10 - 1))
## [1] "abcdefghij" "klmnopqrst" "uvwx"
library(zoo)
s <- strsplit(x, "")[[1]]
rollapply(s, 10, by = 10, paste0, collapse = "", partial = TRUE, align = "left")
## [1] "abcdefghij" "klmnopqrst" "uvwx"
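If only the chunks are needed, the vectorized substring approach can be wrapped in a small helper. The sketch below assumes a single string as input; split_fixed is a hypothetical name, not a function from any package. It may also be the safer choice for the 260,000-character sample in the question, because one recursive call per chunk means about 26,000 nested calls, which is likely to exceed R's default nesting limit (see expressions in ?options).

```r
# split_fixed is a hypothetical helper name used only for this sketch
split_fixed <- function(x, n = 10) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  if (nchar(x) == 0) return(x)          # seq(1, 0, by = n) would error, so handle "" first
  starts <- seq(1, nchar(x), by = n)    # first position of each chunk
  substring(x, starts, starts + n - 1)  # substring() is vectorized over first and last
}

x24 <- paste(letters[1:24], collapse = "")
split_fixed(x24, 10)
## [1] "abcdefghij" "klmnopqrst" "uvwx"

# The large sample from the question, with no recursion involved
big <- paste0(rep(letters, 10000), collapse = "")
length(split_fixed(big, 10))
## [1] 26000
```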
CodePudding user response:
A base R option would be
x1 <- strsplit(x, "(?<=.{10})(?=.)", perl = TRUE)[[1]]
Output:
> head(x1, 10)
[1] "abcdefghij" "klmnopqrst" "uvwxyzabcd" "efghijklmn" "opqrstuvwx" "yzabcdefgh" "ijklmnopqr" "stuvwxyzab" "cdefghijkl" "mnopqrstuv"
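For readers unfamiliar with the pattern: (?<=.{10}) is a lookbehind that requires ten characters before the split point, and (?=.) is a lookahead that requires at least one character after it, so the split happens between characters and nothing is removed. A small check on a 24-character string is sketched below; x24 is a name chosen only for this example, and the sketch assumes the string contains no newlines, since . does not match them by default.

```r
x24 <- paste(letters[1:24], collapse = "")

# perl = TRUE is needed because lookbehind and lookahead are PCRE features
chunks <- strsplit(x24, "(?<=.{10})(?=.)", perl = TRUE)[[1]]
chunks
## [1] "abcdefghij" "klmnopqrst" "uvwx"
```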