Home > database >  How to use cumsum to get the cumulative sum of a freq table - R
How to use cumsum to get the cumulative sum of a freq table - R

Time:03-04

I have the following frequency table (code copy below). This frequency table shows the monday of a given week and the number of observations on that given week.

print(helper_table)
          Week  n
 1: 10/04/2021 42
 2: 10/11/2021 18
 3: 10/18/2021 40
 4: 10/25/2021 33
 5: 11/01/2021 29
 6: 11/08/2021 27
 7: 11/15/2021 43
 8: 11/22/2021 41
 9: 11/29/2021 17
10: 12/06/2021 27
11: 12/13/2021 27
12: 12/20/2021 26
13: 12/27/2021 13
14: 01/03/2022 10
15: 01/10/2022 15
16: 01/17/2022 13
17: 01/24/2022 15
18: 01/31/2022 20
19: 02/07/2022 30
20: 02/14/2022 14
21: 02/21/2022 20
22: 02/28/2022  7

But my goal is to have another column that shows the cumulative frequency. I'm trying this code but its just duplicating the n column. At thought that maybe because the n column was a integer that was preventing it from working but I did convert it to numeric.

helper_table$n <- as.numeric(helper_table$n)

helper_table %>%  
  group_by(Week) %>%  
  #arrange(desc(x1)) %>%
  mutate(cumsum=cumsum(n))

And this resulted in the following output:

# A tibble: 22 x 3
# Groups:   Week [22]
   Week           n cumsum
   <chr>      <dbl>  <dbl>
 1 10/04/2021    42     42
 2 10/11/2021    18     18
 3 10/18/2021    40     40
 4 10/25/2021    33     33
 5 11/01/2021    29     29
 6 11/08/2021    27     27
 7 11/15/2021    43     43
 8 11/22/2021    41     41
 9 11/29/2021    17     17
10 12/06/2021    27     27
# ... with 12 more rows

Data:

data.table::data.table(
        Week = c("10/04/2021","10/11/2021","10/18/2021",
                 "10/25/2021","11/01/2021","11/08/2021","11/15/2021",
                 "11/22/2021","11/29/2021","12/06/2021","12/13/2021","12/20/2021",
                 "12/27/2021","01/03/2022","01/10/2022","01/17/2022",
                 "01/24/2022","01/31/2022","02/07/2022","02/14/2022","02/21/2022",
                 "02/28/2022"),
           n = c(42,18,40,33,29,27,43,41,17,27,27,
                 26,13,10,15,13,15,20,30,14,20,7)
)

CodePudding user response:

Perhaps changing:

mutate(cumsum=cumsum(n))

By

mutate(cumsum=cumsum(1:n))

CodePudding user response:

As pointed out by Gregor Thomas and Waldi in the comments, removing the grouping variable achieves the required result:

helper_table %>%  
   mutate(cumsum=cumsum(n))
 #         Week  n cumsum
 #1: 10/04/2021 42     42
 #2: 10/11/2021 18     60
 #3: 10/18/2021 40    100
 #4: 10/25/2021 33    133
 #5: 11/01/2021 29    162
 #...
  • Related