I would like to insert a list of substrings (word_list) into a string (text) at specific positions (idx_list):
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
idx_list = c(5,16,30,50)
word_list = c("AAA", "BBB", "CCC", "DDD")
I know there are multiple functions (gsub, stri_sub, etc.) that I could use in a loop. However, this gets quite slow on large corpora. Is there a more efficient solution, maybe a vectorized one?
CodePudding user response:
I think it's important to start from the last (highest idx_list) position first, since otherwise all later positions need to be shifted after each insertion. (That is certainly not hard, but going backwards seems easier.)
#      0         1         2         3         4         5         6         7         8         9         a         b         c
#      0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
idx_list = c(5,16,30,50)
word_list = c("AAA", "BBB", "CCC", "DDD")
The work:
ord <- order(-idx_list)
Reduce(function(S, R) {
  paste0(substring(S, 1, R[[1]] - 1), R[[2]], substring(S, R[[1]], nchar(S)))
}, Map(list, idx_list[ord], word_list[ord]), text)
# [1] "LoreAAAm ipsum dolBBBor sit amet, cCCConsectetur adipiscinDDDg elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
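For comparison, the left-to-right alternative mentioned above (shifting every later position by the width of what has already been inserted) can be sketched as follows. The helper name insert_forward is hypothetical and not part of the original answer; it assumes positions refer to the original, unmodified string.

```r
# Hypothetical helper: insert left to right, shifting each position by the
# total number of characters inserted before it.
insert_forward <- function(text, idx, words) {
  ord <- order(idx)
  idx <- idx[ord]
  words <- words[ord]
  # offset for insertion i = combined width of insertions 1..(i - 1)
  shifted <- idx + c(0, head(cumsum(nchar(words)), -1))
  for (i in seq_along(shifted)) {
    text <- paste0(
      substring(text, 1, shifted[i] - 1),
      words[i],
      substring(text, shifted[i])
    )
  }
  text
}

text <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
idx_list <- c(5, 16, 30, 50)
word_list <- c("AAA", "BBB", "CCC", "DDD")
result <- insert_forward(text, idx_list, word_list)
```

This gives the same string as the backwards approach, at the cost of the extra offset bookkeeping.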
Walk-through:
ord is just the decreasing order, so for instance
word_list[ord]
# [1] "DDD" "CCC" "BBB" "AAA"
Because we're going to use Reduce (explanation in a second), we need the combination of idx_list[1] and word_list[1] to be in one argument, not individual; for this, we combine them using Map(list, ...), which "zips" them together into a single list, each element containing the character position and the string to insert:
str(Map(list, idx_list[ord], word_list[ord]))
# List of 4
#  $ :List of 2
#   ..$ : num 50
#   ..$ : chr "DDD"
#  $ :List of 2
#   ..$ : num 30
#   ..$ : chr "CCC"
#  $ :List of 2
#   ..$ : num 16
#   ..$ : chr "BBB"
#  $ :List of 2
#   ..$ : num 5
#   ..$ : chr "AAA"
(This can be used with an arbitrary number of arguments.)
Because we need to insert a string, then insert another string into the result of the first, the base function Reduce will work well here. The first arg is a function that accepts two arguments: the result from the previous call, and the next element from the Map'd argument.
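To see the order in which Reduce applies the insertions, one can ask it to keep the intermediate results. This is a minimal sketch using the same inputs as above; insert_one is just a name given here to the anonymous function from the answer.

```r
text <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
idx_list <- c(5, 16, 30, 50)
word_list <- c("AAA", "BBB", "CCC", "DDD")
ord <- order(-idx_list)

# S is the string built so far, R is one list(position, word) pair
insert_one <- function(S, R) {
  paste0(substring(S, 1, R[[1]] - 1), R[[2]], substring(S, R[[1]], nchar(S)))
}

# accumulate = TRUE keeps the initial text plus the result after each insertion
steps <- Reduce(insert_one, Map(list, idx_list[ord], word_list[ord]), text,
                accumulate = TRUE)
print(steps)
```

The first element is the untouched text, and each following element has one more word inserted, starting with "DDD" at position 50 and ending with "AAA" at position 5.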
CodePudding user response:
An adaptation of Zach Foster's answer, condensed into a single function:
reposition_str <- function(string, inject, index) {
  # sort the insertions by position
  inject <- inject[order(index)]
  index <- sort(index)
  # expand string: cut it into the pieces between insertion points
  split <- substr(rep(string, length(index) + 1),
    start = c(1, index),
    stop = c(index - 1, nchar(string))
  )
  # odd slots hold the text pieces, even slots the injected words
  ord1 <- 2 * seq_along(split) - 1
  ord2 <- 2 * seq_along(inject)
  paste(c(split, inject)[order(c(ord1, ord2))], collapse = "")
}
text <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
idx_list <- c(5, 16, 30, 50)
word_list <- c("AAA", "BBB", "CCC", "DDD")
reposition_str(text, word_list, idx_list)
#> [1] "LoreAAAm ipsum dolBBBor sit amet, cCCConsectetur adipiscinDDDg elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
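The interleaving step at the end of reposition_str may be easier to follow on a toy input. In this sketch, pieces and fillers are made-up stand-ins for the split text segments and the injected words.

```r
pieces <- c("a", "b", "c")  # stand-in for the text segments
fillers <- c("X", "Y")      # stand-in for the words to inject

ord1 <- 2 * seq_along(pieces) - 1  # odd slots: 1, 3, 5
ord2 <- 2 * seq_along(fillers)     # even slots: 2, 4

# order() turns the slot numbers into the permutation that interleaves
# the two vectors
interleaved <- c(pieces, fillers)[order(c(ord1, ord2))]
combined <- paste(interleaved, collapse = "")
print(combined)
# [1] "aXbYc"
```

Because every piece and word is prepared in one vectorized substr call and joined by a single paste, no loop over the insertions is needed.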
And a benchmark for speed comparisons:
bench::mark(reposition = {reposition_str(text, word_list, idx_list)})
# A tibble: 1 x 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    memory               time               gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    <list>               <list>             <list>
1 reposition   69.5us   74.5us    12303.    79.7KB     8.32  5918     4      481ms <chr [1]> <Rprofmem [218 x 3]> <bench_tm [5,922]> <tibble [5,922 x~