Conditional cumulative sum and grouping in R

Time:01-02

I am trying to create a group variable based on the cumulative sum of another variable. I want to put a constraint on the cumulative sum: if it would go beyond a limit (15000000), the group variable should change and the sum should start again from the current row. Here is the code I am working on:

myDat <- data.frame(Seg  = c("A", "B", "C", "D", "F", "G", "H"),
                    Freq = c(4558848, 10926592, 15783936, 8266496, 7729349, 13234562, 9873456))

# Bin the running total of Freq into 15000000-wide buckets
myDat$csum <- ceiling(ave(myDat$Freq, FUN = cumsum) / 15000000)

# Seg     Freq csum
# A  4558848    1
# B 10926592    2
# C 15783936    3
# D  8266496    3
# F  7729349    4
# G 13234562    5
# H  9873456    5

myDat1 <- aggregate(Freq~csum, data=myDat, FUN = sum)

# csum     Freq
# 1  4558848
# 2 10926592
# 3 24050432
# 4  7729349
# 5 23108018
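A quick way to see why this happens: with no grouping factor, ave(myDat$Freq, FUN = cumsum) is simply cumsum(myDat$Freq), and ceiling() then bins that overall running total into 15000000-wide buckets. The running total is never reset, so nothing limits what an individual bucket adds up to. The sketch below (base R only; the names limit, group_totals and over are illustrative, not from the question) recomputes the buckets and flags the ones that go over the limit.

```r
# Question's data (same values as myDat above)
myDat <- data.frame(Seg  = c("A", "B", "C", "D", "F", "G", "H"),
                    Freq = c(4558848, 10926592, 15783936, 8266496,
                             7729349, 13234562, 9873456))
limit <- 15000000

# ceiling() bins the overall running total into limit-wide buckets;
# the running total itself is never reset
myDat$csum <- ceiling(cumsum(myDat$Freq) / limit)

# Total Freq per bucket, and the buckets whose total breaks the limit
group_totals <- tapply(myDat$Freq, myDat$csum, sum)
over <- names(group_totals)[group_totals > limit]
print(group_totals)
print(over)  # "3" "5"
```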

Some of the groups (3 and 5) exceed the 15000000 limit. Can anyone help me with this code?

# Desired results:

# Seg     Freq csum  Desired csum
# A  4558848    1    1  
# B 10926592    2    2
# C 15783936    3    3
# D  8266496    3    4
# F  6229349    4    4
# G 13234562    4    5
# H  9873456    5    6
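A minimal base R sketch of the rule described above, assuming the intent is: keep adding rows to the current group while the group's total stays at or below 15000000, and open a new group otherwise. group_by_limit() is a hypothetical helper, not from any package. Note that the desired results table lists F as 6229349, whereas myDat has 7729349; with the data as defined, no two adjacent rows fit under the limit, so every row gets its own group.

```r
# Hypothetical helper (not from any package): assigns a group id to each
# element of x so that each group's total stays <= limit where possible
group_by_limit <- function(x, limit) {
  group   <- integer(length(x))
  running <- 0
  g       <- 1L
  for (i in seq_along(x)) {
    # Open a new group if adding x[i] would push the total over the limit
    # (the first row always starts group 1)
    if (i > 1 && running + x[i] > limit) {
      g       <- g + 1L
      running <- 0
    }
    running  <- running + x[i]
    group[i] <- g
  }
  group
}

freq <- c(4558848, 10926592, 15783936, 8266496, 7729349, 13234562, 9873456)
group_by_limit(freq, 15000000)
# [1] 1 2 3 4 5 6 7

# Same data, but with F = 6229349 as in the desired results table
freq_desired <- replace(freq, 5, 6229349)
group_by_limit(freq_desired, 15000000)
# [1] 1 2 3 4 4 5 6
```

A row that on its own exceeds the limit (C here) still gets a group of its own, since it cannot be split. To attach the result to the data frame: myDat$group <- group_by_limit(myDat$Freq, 15000000).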

Answer:

I believe you want cumsum(Freq > 1e7), which starts a new group at every row whose Freq is greater than 1e7.

with(myDat, aggregate(list(Freq=Freq), list(csum=cumsum(Freq > 1e7) + 1), sum))
#   csum     Freq
# 1    1  4558848
# 2    2 10926592
# 3    3 31779781
# 4    4 23108018

transform(myDat, csum=cumsum(Freq > 1e7) + 1)
#   Seg     Freq csum
# 1   A  4558848    1
# 2   B 10926592    2
# 3   C 15783936    3
# 4   D  8266496    3
# 5   F  7729349    3
# 6   G 13234562    4
# 7   H  9873456    4
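To see what this grouping does: Freq > 1e7 gives a logical vector, and cumsum() counts each TRUE as 1, so the group number steps up at every large row regardless of the running total. The sketch below repeats that step and checks the group totals against the question's 15000000 limit (the names big, grp and totals are illustrative); as the aggregate output above already shows, groups 3 and 4 still exceed it.

```r
myDat <- data.frame(Seg  = c("A", "B", "C", "D", "F", "G", "H"),
                    Freq = c(4558848, 10926592, 15783936, 8266496,
                             7729349, 13234562, 9873456))

# TRUE counts as 1, so the counter steps up at every row with Freq > 1e7
big <- myDat$Freq > 1e7
grp <- cumsum(big) + 1
print(grp)  # 1 2 3 3 3 4 4

# Group totals, compared with the 15000000 limit from the question
totals <- tapply(myDat$Freq, grp, sum)
print(totals > 15000000)  # groups 3 and 4 are over the limit
```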

Data:

myDat <- structure(list(Seg = c("A", "B", "C", "D", "F", "G", "H"), Freq = c(4558848, 
10926592, 15783936, 8266496, 7729349, 13234562, 9873456)), class = "data.frame", row.names = c(NA, 
-7L))

Answer:

I was able to find an answer to it (credit to the link).

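library(dplyr)  # mutate() and %>%
library(purrr)  # accumulate()
myDat %>% mutate(cumsum_15 = accumulate(Freq, ~ifelse(.x + .y <= 15000000, .x + .y, .y)),  # running sum, restarted when it would exceed the limit
                 group_15 = cumsum(Freq == cumsum_15))  # new group wherever the running sum restarted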
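If you would rather avoid the purrr dependency, base R's Reduce() with accumulate = TRUE can build the same restarting running sum. This is a sketch under the same assumptions as the answer above (limit of 15000000, positive Freq values); if the running total could ever be 0 (possible with zero or negative values), running == freq could also be TRUE on a row that did not restart, and an explicit reset flag would be safer.

```r
freq  <- c(4558848, 10926592, 15783936, 8266496, 7729349, 13234562, 9873456)
limit <- 15000000

# Running sum that restarts at the current value whenever adding it
# would exceed the limit (same rule as the accumulate() call above)
running <- Reduce(function(acc, x) if (acc + x <= limit) acc + x else x,
                  freq, accumulate = TRUE)

# A row opens a new group when the running sum was restarted there,
# i.e. when the running sum equals that row's own Freq
group <- cumsum(running == freq)

data.frame(freq, running, group)
```

With the question's data every row restarts the sum, so group runs from 1 to 7.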