I have strings of unequal length such as:
x <- c("11333333444A", "3aaa0085hb", "&ffvyß")
I want to break the strings in x into substrings based on the positions stored in a numeric vector y:
y <- c(2, 8, 11, 12)
to obtain the first 2 characters, then the following characters up to position 8, then up to position 11, and finally up to position 12:
11, 333333, 444, A
3a, aa0085, hb
&f, fvyß
I've tried to use str_locate_all, str_which, and others from stringr, but could not figure out a solution.
Answer:
With the split positions given, an easier option is separate from tidyr, where sep can also be a numeric vector of character positions to split at:
library(tibble)
library(stringr)
library(tidyr)
tibble(x) %>%
  separate(x, into = str_c('col', seq_along(y)), sep = y)
-output
# A tibble: 3 × 4
  col1  col2   col3  col4
  <chr> <chr>  <chr> <chr>
1 11    333333 "444" "A"
2 3a    aa0085 "hb"  ""
3 &f    fvyß   ""    ""
Or use base R with read.fwf and specify the widths by taking the difference (diff) of the position index:
read.fwf(textConnection(x), widths = c(y[1], diff(y)))
  V1     V2   V3   V4
1 11 333333  444    A
2 3a aa0085   hb <NA>
3 &f   fvyß <NA> <NA>
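If a list of pieces per string is preferred over a rectangular result (as in the desired output of the question), the same positions can probably be applied directly with substring. This is only a minimal sketch: split_at is a hypothetical helper name, not part of any package, and it assumes y is sorted in increasing order and that empty pieces past the end of a string should be dropped.

```r
x <- c("11333333444A", "3aaa0085hb", "&ffvyß")
y <- c(2, 8, 11, 12)

# Widths as passed to read.fwf: the first cut position,
# then the gaps between consecutive cut positions.
widths <- c(y[1], diff(y))   # 2 6 3 1

# Each piece starts one character after the previous cut position.
starts <- c(1, head(y, -1) + 1)

# Hypothetical helper: cut one string at the given start/end positions
# and drop pieces that lie entirely beyond the end of the string.
split_at <- function(s, starts, ends) {
  pieces <- substring(s, starts, ends)
  pieces[nzchar(pieces)]
}

res <- lapply(x, split_at, starts = starts, ends = y)
```

Here res is a list with one character vector per element of x, so shorter strings simply yield fewer pieces instead of "" or NA padding.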