I have a dataframe where I would like to keep a row as soon as the cumulative value of a column reaches a certain level. The dataset could look like this:
set.seed(0)
n <- 10
dat <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
age=sample(18:30, n, replace=TRUE),
type=factor(paste("type", 1:n)),
x=abs(rnorm(n)))
dat
id group age type x
1 1 A 26 type 1 0.928567035
2 2 B 21 type 2 0.294720447
3 3 A 24 type 3 0.005767173
4 4 B 18 type 4 2.404653389
5 5 A 19 type 5 0.763593461
6 6 B 30 type 6 0.799009249
7 7 A 24 type 7 1.147657009
8 8 B 28 type 8 0.289461574
9 9 A 19 type 9 0.299215118
10 10 B 28 type 10 0.411510833
Where I want to keep a row as soon as the cumulative value of x reaches a threshold (e.g. 1), starting to count again as soon as a row was retained. Which would result in the following output:
id group age type x
2 2 B 21 type 2 0.294720447
4 4 B 18 type 4 2.404653389
6 6 B 30 type 6 0.799009249
7 7 A 24 type 7 1.147657009
10 10 B 28 type 10 0.411510833
I am trying to get a dplyr based solution but can't seem to figure it out. Any tips?
CodePudding user response:
You can use purrr::accumulate
to compute the cumsum with threshold, then use dplyr::slice_tail
to get the last value before the cumsum cuts the threshold:
library(dplyr)
library(purrr)
dat %>%
group_by(a = cumsum(x == accumulate(x, ~ ifelse(.x <= 1, .x .y, .y)))) %>%
slice_tail(n = 1)
# id group age type x gp
# 1 2 B 21 type 2 0.295 1
# 2 4 B 18 type 4 2.40 2
# 3 6 B 30 type 6 0.799 3
# 4 7 A 24 type 7 1.15 4
# 5 10 B 28 type 10 0.412 5
Another option is to use MESS::cumsumbinning
, which might be more friendly to use:
library(MESS)
library(dplyr)
dat %>%
group_by(a = cumsumbinning(x, 1, cutwhenpassed = T)) %>%
slice_tail(n = 1)
CodePudding user response:
Mael beat me with the cumsumbinning()
from the MESS
-package...
Here is a data.table
option using that function:
library(MESS)
library(data.table)
setDT(dat)[, .SD[.N], by = MESS::cumsumbinning(x, 1, cutwhenpassed = TRUE)]
# MESS id group age type
# 1: 1 2 B 21 type 2
# 2: 2 4 B 18 type 4
# 3: 3 6 B 30 type 6
# 4: 4 7 A 24 type 7
# 5: 5 10 B 28 type 10