Home > Blockchain >  Keep row as soon as cumulative value reaches a certain threshold R
Keep row as soon as cumulative value reaches a certain threshold R

Time:10-17

I have a dataframe where I would like to keep a row as soon as the cumulative value of a column reaches a certain level. The dataset could look like this:

set.seed(0)
n <- 10
dat <- data.frame(id=1:n, 
                  
                  group=rep(LETTERS[1:2], n/2),
                  age=sample(18:30, n, replace=TRUE),
                  type=factor(paste("type", 1:n)),
                  x=abs(rnorm(n)))
dat


   id group age    type           x
1   1     A  26  type 1 0.928567035
2   2     B  21  type 2 0.294720447
3   3     A  24  type 3 0.005767173
4   4     B  18  type 4 2.404653389
5   5     A  19  type 5 0.763593461
6   6     B  30  type 6 0.799009249
7   7     A  24  type 7 1.147657009
8   8     B  28  type 8 0.289461574
9   9     A  19  type 9 0.299215118
10 10     B  28 type 10 0.411510833

Where I want to keep a row as soon as the cumulative value of x reaches a threshold (e.g. 1), starting to count again as soon as a row was retained. Which would result in the following output:

   id group age    type           x
2   2     B  21  type 2 0.294720447
4   4     B  18  type 4 2.404653389
6   6     B  30  type 6 0.799009249
7   7     A  24  type 7 1.147657009
10 10     B  28 type 10 0.411510833

I am trying to get a dplyr based solution but can't seem to figure it out. Any tips?

CodePudding user response:

You can use purrr::accumulate to compute the cumsum with threshold, then use dplyr::slice_tail to get the last value before the cumsum cuts the threshold:

library(dplyr)
library(purrr)
dat %>% 
  group_by(a = cumsum(x == accumulate(x, ~ ifelse(.x <= 1, .x   .y, .y)))) %>% 
  slice_tail(n = 1)

#      id group   age type        x    gp
# 1     2 B        21 type 2  0.295     1
# 2     4 B        18 type 4  2.40      2
# 3     6 B        30 type 6  0.799     3
# 4     7 A        24 type 7  1.15      4
# 5    10 B        28 type 10 0.412     5

Another option is to use MESS::cumsumbinning, which might be more friendly to use:

library(MESS)
library(dplyr)
dat %>% 
  group_by(a = cumsumbinning(x, 1, cutwhenpassed = T)) %>% 
  slice_tail(n = 1)

CodePudding user response:

Mael beat me with the cumsumbinning() from the MESS-package...

Here is a data.table option using that function:

library(MESS)
library(data.table)
setDT(dat)[, .SD[.N], by = MESS::cumsumbinning(x, 1, cutwhenpassed = TRUE)]
#    MESS id group age    type
# 1:    1  2     B  21  type 2
# 2:    2  4     B  18  type 4
# 3:    3  6     B  30  type 6
# 4:    4  7     A  24  type 7
# 5:    5 10     B  28 type 10
  • Related