I want to make groups according to the accumulated sum. in my tiny example, the two first numbers in the column value sums to exactly 100 and is group A. The three next numbers sum to less than 100 and go to group B, while the last number exceeds 100 and gets its own group C - and so forth.
input <- data.frame(id=c(1:6),
value =c(99, 1, 33, 33, 33, 150))
input
desired_output <- data.frame(id=c(1:6),
value =c(99, 1, 33, 33, 33, 150),
group= c("A", "A", "B", "B", "B", "C"))
desired_output
Thank you
CodePudding user response:
Two possible one-liners, with purrr::accumulate
and with MESS::cumsumbinning
:
purrr::accumulate
library(tidyverse)
group_by(input, grp = LETTERS[cumsum(value == accumulate(value, ~ ifelse(.x .y <= 100, .x .y, .y)))])
MESS::cumsumbinning
library(dplyr)
group_by(input, grp = LETTERS[MESS::cumsumbinning(value, 100)])
output
# A tibble: 6 x 3
# Groups: grp [3]
id value grp
<int> <dbl> <chr>
1 1 99 A
2 2 1 A
3 3 33 B
4 4 33 B
5 5 33 B
6 6 150 C
CodePudding user response:
desired_output <- data.frame(id=c(1:6),
value =c(99, 1, 33, 33, 33, 150),
group= c("A", "A", "B", "B", "B", "C"))
library(tidyverse)
desired_output %>%
group_by(grp_new = cumsum(value - 1) %/% 100) %>%
mutate(grp_new = LETTERS[cur_group_id()]) %>%
ungroup()
#> # A tibble: 6 x 4
#> id value group grp_new
#> <int> <dbl> <chr> <chr>
#> 1 1 99 A A
#> 2 2 1 A A
#> 3 3 33 B B
#> 4 4 33 B B
#> 5 5 33 B B
#> 6 6 150 C C
Created on 2022-04-15 by the reprex package (v2.0.1)