Group according to cumulative sums-CodePudding

I want to make groups according to the accumulated sum. in my tiny example, the two first numbers in the column value sums to exactly 100 and is group A. The three next numbers sum to less than 100 and go to group B, while the last number exceeds 100 and gets its own group C - and so forth.

input <- data.frame(id=c(1:6),
                   value =c(99, 1,  33, 33, 33, 150))
input

desired_output <- data.frame(id=c(1:6),
              value =c(99, 1,  33, 33, 33, 150),
              group= c("A", "A", "B", "B", "B", "C"))

desired_output

Thank you

CodePudding user response：

Two possible one-liners, with purrr::accumulate and with MESS::cumsumbinning:

purrr::accumulate

library(tidyverse)
group_by(input, grp = LETTERS[cumsum(value == accumulate(value, ~ ifelse(.x   .y <= 100, .x   .y, .y)))])

MESS::cumsumbinning

library(dplyr)
group_by(input, grp = LETTERS[MESS::cumsumbinning(value, 100)])

output

# A tibble: 6 x 3
# Groups:   grp [3]
     id value grp  
  <int> <dbl> <chr>
1     1    99 A    
2     2     1 A    
3     3    33 B    
4     4    33 B    
5     5    33 B    
6     6   150 C

CodePudding user response：

desired_output <- data.frame(id=c(1:6),
                             value =c(99, 1,  33, 33, 33, 150),
                             group= c("A", "A", "B", "B", "B", "C"))

library(tidyverse)

desired_output %>% 
  group_by(grp_new = cumsum(value - 1) %/% 100) %>% 
  mutate(grp_new = LETTERS[cur_group_id()]) %>% 
  ungroup() 
#> # A tibble: 6 x 4
#>      id value group grp_new
#>   <int> <dbl> <chr> <chr>  
#> 1     1    99 A     A      
#> 2     2     1 A     A      
#> 3     3    33 B     B      
#> 4     4    33 B     B      
#> 5     5    33 B     B      
#> 6     6   150 C     C

^{Created on 2022-04-15 by the reprex package (v2.0.1)}