Split string by vector of numbers

Time:11-05

I have strings of unequal length such as:

  x <- c("11333333444A", "3aaa0085hb", "&ffvyß")

I want to break the strings in x into substrings based on the numeric information stored in a vector y:

  y <- c(2, 8, 11, 12)

to obtain the first 2 characters, then the characters up to position 8, then up to position 11, and finally up to position 12:

 11, 333333, 444, A
 3a, aa0085, hb
 &f, fvyß

I've tried str_locate_all, str_which, and other functions from stringr, but could not figure out a solution.

CodePudding user response:

With the split positions given, an easier option is separate from tidyr, whose sep argument accepts a numeric vector of character positions to split at:

library(tibble)
library(stringr)
library(tidyr)
tibble(x) %>%
   separate(x, into = str_c('col', seq_along(y)), sep = y)

-output

# A tibble: 3 × 4
  col1  col2   col3  col4 
  <chr> <chr>  <chr> <chr>
1 11    333333 "444" "A"  
2 3a    aa0085 "hb"  ""   
3 &f    fvyß   ""    ""   

Or use read.fwf in base R, with the widths computed as the differences between successive positions:

read.fwf(textConnection(x), widths = c(y[1], diff(y)))

-output

  V1     V2   V3   V4
1 11 333333  444    A
2 3a aa0085   hb <NA>
3 &f   fvyß <NA> <NA>
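
If the pieces are wanted as a list of character vectors, as in the question's expected output, rather than as columns, a minimal base R sketch with substring may also work. The names starts and pieces are made up for illustration, and x and y are repeated from the question so the snippet runs on its own.

```r
x <- c("11333333444A", "3aaa0085hb", "&ffvyß")
y <- c(2, 8, 11, 12)

# Each piece starts one character after the previous end position.
starts <- c(1, head(y, -1) + 1)

# substring() is vectorised over first/last; ranges beyond the end of a
# string come back as "", so those empty pieces are dropped.
pieces <- lapply(x, function(s) {
  out <- substring(s, starts, y)
  out[nzchar(out)]
})

pieces
```

Each element of pieces then holds only the substrings that actually exist for that string, so shorter strings yield shorter vectors instead of being padded with "" or NA.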