How to cumsum the elements of a vector under certain condition in R?

Time:02-11

My objective is to compute a cumulative sum over the elements of a vector and assign the running total to each element, resetting the cumulative sum whenever a certain condition is met.

For example:

vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

Now, suppose that the condition to reset the cumulative sum is that the next element has a different sign.

Then the desired output is:

vector_B <- c(1, 2, -1, -2, -3, 1, -1, -2, 1, -1)

How can I achieve this?

CodePudding user response:

A base R option with Reduce

> Reduce(function(x, y) ifelse(x * y > 0, x + y, y), vector_A, accumulate = TRUE)
 [1]  1  2 -1 -2 -3  1 -1 -2  1 -1

or using ave with cumsum

> ave(vector_A, cumsum(c(1, diff(sign(vector_A)) != 0)), FUN = cumsum)
 [1]  1  2 -1 -2 -3  1 -1 -2  1 -1
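
To see why the ave call works, it may help to build the grouping vector step by step. The following is only a sketch; the intermediate names (s, changed, group_id, result) are introduced here for illustration and are not part of the answer above.

```r
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

# Sign of each element: 1 for positive, -1 for negative
s <- sign(vector_A)

# TRUE wherever the sign differs from the previous element
changed <- diff(s) != 0

# Prepend 1 so the first element opens group 1; every TRUE then
# starts a new group when the values are accumulated
group_id <- cumsum(c(1, changed))
# group_id: 1 1 2 2 2 3 4 4 5 6

# cumsum is applied within each group, and ave puts the results
# back in the original order
result <- ave(vector_A, group_id, FUN = cumsum)
```

Each run of equal signs gets its own id, so the cumulative sum restarts exactly where the sign flips.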

CodePudding user response:

Using ave:

ave(vector_A, data.table::rleid(sign(vector_A)), FUN = cumsum)
#  [1]  1  2 -1 -2 -3  1 -1 -2  1 -1

A formula version of accumulate:

purrr::accumulate(vector_A, ~ ifelse(sign(.x) == sign(.y), .x + .y, .y))
#  [1]  1  2 -1 -2 -3  1 -1 -2  1 -1

CodePudding user response:

You can use a custom function instead of cumsum and accumulate results using e.g. purrr::accumulate:

library(purrr)
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

purrr::accumulate(vector_A, function(a,b) {
  if (sign(a) == sign(b))
    a + b
  else
    b
  })

[1]  1  2 -1 -2 -3  1 -1 -2  1 -1

or if you want to avoid any branch:

purrr::accumulate(vector_A, function(a, b) { b + a * (sign(a) == sign(b)) })

[1]  1  2 -1 -2 -3  1 -1 -2  1 -1
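
For readers who want to see what the fold is doing, here is a rough base R equivalent written as an explicit loop. The function name cumsum_reset_on_sign is hypothetical, made up for this sketch. It also assumes the input contains no zeros, so the sign of the running total always matches the sign of the current run.

```r
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

# Hypothetical helper, not from any package: running sum that
# restarts whenever the sign of the next element changes
cumsum_reset_on_sign <- function(x) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    # The comparison yields TRUE/FALSE, which acts as 1/0 when
    # multiplied, so the previous total is kept or dropped
    same_sign <- sign(out[i - 1]) == sign(x[i])
    out[i] <- x[i] + out[i - 1] * same_sign
  }
  out
}

result <- cumsum_reset_on_sign(vector_A)
```

This is the same arithmetic as the branch-free accumulate call, just with the carried value made visible as out[i - 1].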

CodePudding user response:

The approach that comes to mind is to find the runs (rle()) defined by the condition (sign()) in the data, apply cumsum() on each run separately (tapply()), and then concatenate the results back into a vector (unlist()). Something like this:

vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

run_length <- rle(sign(vector_A))$lengths
run_id <- rep(seq_along(run_length), run_length)

unlist(tapply(vector_A, run_id, cumsum), use.names = FALSE)
#>  [1]  1  2 -1 -2 -3  1 -1 -2  1 -1

To wrap the process up a bit, I'd put the computation of the grouping factor (run index) in a function. The grouped computation can then be done with existing tools, like tapply() above, or a creative ave(), or, in the context of data frames, a group_by() and summarise() with dplyr.

run_index <- function(x) {
  with(rle(x), rep(seq_along(lengths), lengths))
}

ave(vector_A, run_index(sign(vector_A)), FUN = cumsum)
#>  [1]  1  2 -1 -2 -3  1 -1 -2  1 -1
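
For the data frame case, a dplyr version might look roughly like the sketch below. It assumes dplyr is installed, and the data frame df and its column names (value, run, running) are made up for illustration. Note that it uses mutate() rather than summarise() inside the groups, since the goal here is one running total per row, not one value per group.

```r
library(dplyr)

# Same helper as above, repeated so the example is self-contained
run_index <- function(x) {
  with(rle(x), rep(seq_along(lengths), lengths))
}

# Illustrative data frame holding the example vector
df <- data.frame(value = c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1))

result <- df %>%
  mutate(run = run_index(sign(value))) %>%  # id of each same-sign run
  group_by(run) %>%
  mutate(running = cumsum(value)) %>%       # cumsum restarts per run
  ungroup()
```

The running column should then hold the same values as vector_B in the question.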