Use Recursion in R to Split a String Into Chunks

Time:12-18

I am trying to wrap my head around the idea of recursion. However, when I apply my recursive R function, it does not split the string into the desired number of chunks; it returns only two chunks plus an empty string. My goal is to split a long string into multiple smaller strings of size n. I am sure there are other ways to do this, but I am trying to find a recursive solution. Any help is appreciated. Thanks in advance.

# Sample dataset
x <- paste0(rep(letters, 10000), collapse = "")

split_group <- function(x, n = 10) {
    if (nchar(x) < n) {
        return(x)
    } else {
        beginning <- substring(x, 1, n)
        remaining <- substring(x, (n + 1), (n + 1) + (n - 1))
        c(beginning, split_group(remaining, n))
    }
}

split_group(x = x, n = 10)

# Returns:  "abcdefghij" "klmnopqrst" ""  

CodePudding user response:

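1) Use <= instead of < and fix remaining so that it holds the whole rest of the string (everything from character n + 1 onward), not just the next n characters. With <, a string whose length is an exact multiple of n ends with a trailing empty string.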

split_group <- function(x, n = 10) {
    if (nchar(x) <= n) x
    else {
        beginning <- substring(x, 1, n)
        remaining <- substring(x, n + 1)
        c(beginning, split_group(remaining, n))
    }
}

x <- substring(paste(letters, collapse = ""), 1, 24)

split_group(x, 2)
##  [1] "ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx"

split_group(x, 5)
## [1] "abcde" "fghij" "klmno" "pqrst" "uvwx" 

split_group(x, 6)
## [1] "abcdef" "ghijkl" "mnopqr" "stuvwx"

split_group(x, 10)
## [1] "abcdefghij" "klmnopqrst" "uvwx"      

split_group(x, 23)
## [1] "abcdefghijklmnopqrstuvw" "x"                      

split_group(x, 24)
## [1] "abcdefghijklmnopqrstuvwx"

split_group(x, 25)
## [1] "abcdefghijklmnopqrstuvwx"
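
To see why the version in the question stops after two chunks, it may help to compare the two ways of computing remaining on the 24-character test string. This is only an illustrative sketch; the names original_remaining and fixed_remaining are made up here and do not appear in the question or the answer.

```r
x <- paste(letters[1:24], collapse = "")
n <- 10

# Expression from the question: keeps only characters n + 1 through 2 * n,
# so everything past the second chunk is discarded before recursing.
original_remaining <- substring(x, n + 1, (n + 1) + (n - 1))

# Corrected expression: keeps everything from character n + 1 to the end.
fixed_remaining <- substring(x, n + 1)

nchar(original_remaining)  # 10
nchar(fixed_remaining)     # 14
```

Because the original expression passes on at most n characters, the next recursive call has nothing left to pass on after taking its own chunk, which is where the empty string in the question's output comes from.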

2) Here are some approaches without recursion. The first is the shortest, but the second is the simplest and uses only base R. The third also uses only base R, and the fourth uses zoo.

library(gsubfn)
strapply(x, "(.{1,10})", simplify = c)
## [1] "abcdefghij" "klmnopqrst" "uvwx"      

ix <- seq(1, nchar(x), 10)
substring(x, ix, ix + 10 - 1)
## [1] "abcdefghij" "klmnopqrst" "uvwx"      

sapply(seq(1, nchar(x), 10), function(i) substring(x, i, i + 10 - 1))
## [1] "abcdefghij" "klmnopqrst" "uvwx"    

library(zoo)
s <- strsplit(x, "")[[1]]
rollapply(s, 10, by = 10, paste0, collapse = "", partial = TRUE, align = "left")
## [1] "abcdefghij" "klmnopqrst" "uvwx"      
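
As a rough sketch, the vectorized base R approach could be wrapped in a small helper and applied to the full sample string from the question. The function name chunk_string and the empty-string guard are assumptions added for illustration. A non-recursive version like this may be the safer choice for the original data, since the recursive function would need about 26,000 nested calls there and is likely to run into R's recursion limits.

```r
# Hypothetical helper (not from the thread): split one string into
# pieces of at most n characters using vectorized substring().
chunk_string <- function(x, n = 10) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  # seq(1, 0, by = n) would fail, so handle the empty string up front.
  if (nchar(x) == 0) return(x)
  starts <- seq(1, nchar(x), by = n)
  substring(x, starts, starts + n - 1)
}

# Sample data from the question: 260,000 characters.
x_long <- paste0(rep(letters, 10000), collapse = "")
chunks <- chunk_string(x_long, 10)

length(chunks)                                   # 26000
identical(paste(chunks, collapse = ""), x_long)  # TRUE
```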

CodePudding user response:

A base R option would be

x1 <- strsplit(x, "(?<=.{10})(?=.)", perl = TRUE)[[1]]

Output:

> head(x1, 10)
 [1] "abcdefghij" "klmnopqrst" "uvwxyzabcd" "efghijklmn" "opqrstuvwx" "yzabcdefgh" "ijklmnopqr" "stuvwxyzab" "cdefghijkl" "mnopqrstuv"