I have a sequence s
where I expect each proceeding value to be either the same as the previous one or 1.
s = c(1,1,1,1,2,2,2,-2,3,3,4,8,8,8,9,5,5,12,6)
What I want:
1,1,1,1,2,2,2,3,3,4,5,5,6
I've solved this with the following code:
counter = 2
repeat{
if(s[counter] == s[counter-1] | s[counter] == s[counter-1] 1){
counter = counter 1
} else{
s = s[-counter]
}
if(counter >= length(s)) break
}
which however appears quite 'dirty' and inefficient. Is there a computationally less time-consuming solution?
CodePudding user response:
I can't see an easy vector based solution, but a normal for loop with preallocation could help here
s = c(1,1,1,1,2,2,2,-2,3,3,4,8,8,8,9,5,5,12,6)
increasing_seq <- function(x) {
keep <- logical(length(x))
current <- x[1]
for (i in seq_along(x)) {
if (x[i] == current) {
keep[i] <- TRUE
} else if (x[i] == current 1) {
current <- current 1
keep[i] <- TRUE
}
}
x[keep]
}
increasing_seq(s)
# [1] 1 1 1 1 2 2 2 3 3 4 5 5 6
Here we avoid recreating the s
vector of different sizes. It's normally re-allocation that's slow, not looping.
CodePudding user response:
This can be done with Reduce
:
Reduce(function(prev, this)
c(prev, if (any(this %in% (prev[length(prev)] 0:1))) this),
s)
# [1] 1 1 1 1 2 2 2 3 3 4 5 5 6
This can't be vectorized since the calc on one position relies on the results of the previous calculations, but this is compact and (imo) readable.
FYI, if performance is your primary metric, then it appears sindri_baldur's Rcpp and MrFlick's current answers win out by a landslide:
bench::mark(
sindri_baldur = {
keep = vector(length = length(s))
keep[1] = TRUE
last_keep = 1L
for (counter in 2:length(s)) {
if ((s[counter] - s[last_keep]) %in% c(0, 1)) {
last_keep = counter
keep[counter] = TRUE
}
}
s[keep]
},
sindri_baldur_rcpp = s[foo(s)],
r2evans = {
Reduce(function(prev, this)
c(prev, if (any(this %in% (prev[length(prev)] 0:1))) this),
s)
},
MrFlick = increasing_seq(s))
# # A tibble: 4 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 sindri_baldur 2.62ms 3.05ms 317. 37.81KB 15.5 143 7 450.5ms <dbl [13]> <Rprofmem~ <bench_~ <tibble ~
# 2 sindri_baldur_rcpp 1us 1.4us 533120. 2.49KB 0 10000 0 18.8ms <dbl [13]> <Rprofmem~ <bench_~ <tibble ~
# 3 r2evans 28.5us 36.9us 23454. 24.67KB 18.8 9992 8 426ms <dbl [13]> <Rprofmem~ <bench_~ <tibble ~
# 4 MrFlick 2.1us 2.5us 345770. 0B 0 10000 0 28.9ms <dbl [13]> <Rprofmem~ <bench_~ <tibble ~
CodePudding user response:
Edit
R concept translated to Rcpp:
Rcpp::cppFunction('LogicalVector foo(NumericVector s) {
int n = s.size();
Rcpp::LogicalVector keep(n);
keep[0] = 1;
int last = 0;
for (int i = 1; i < n; i ) {
if (s[i] - s[last] == 0) {
keep[i] = 1;
} else if (s[i] - s[last] == 1) {
keep[i] = 1;
last = i;
}
}
return keep;
}')
s[foo(s)]
# [1] 1 1 1 1 2 2 2 3 3 4 5 5 6
Original solution (very similiar but inferior to MrFlick's):
Another slightly more efficient R loop. However, if efficiency is important Rcpp might be a good choice.
keep = vector(length = length(s))
keep[1] = TRUE
last_keep = 1L
for (counter in 2:length(s)) {
if ((s[counter] - s[last_keep]) %in% c(0, 1)) {
last_keep = counter
keep[counter] = TRUE
}
}
s[keep]
# [1] 1 1 1 1 2 2 2 3 3 4 5 5 6