Home > Enterprise >  How to calculate cumulative values within a column
How to calculate cumulative values within a column

Time:03-27

I'm trying to calculate the cumulative time among several grades.

Here's how my original df looks like:

df = data.frame(id = c(1,1,1,1,2,2,2,2),
                group = c(0,0,0,0,1,1,1,1),
                grade = c(0,1,2,3,0,1,3,4),
                time = c(10,7,4,1,20,17,14,11))

Here's what I'm expecting as the result df1:

df1 = df %>%
  pivot_wider(
    names_from = "grade",
    names_prefix = "grade_",
    values_from = "time") %>%
  replace(is.na(.), 0) %>%
  
  mutate(grade_1 = grade_1   grade_2   grade_3   grade_4,
         grade_2 = grade_2   grade_3   grade_4,
         grade_3 = grade_3   grade_4) %>%
  
  pivot_longer(
    cols = 3:7,
    names_to = "grade",
    names_prefix = "grade_",
    values_to = "time") 

My method works, but I want it to be more flexible. When I have more grades in the df, I don't need to manually add grade_x = grade_1 grade_2 grade_3 ...

Thank you!

CodePudding user response:

One option would be to rearrange the grade column (excluding the first row, where grade == 0, then do cumsum so that it is in reverse. Then, we can re-arrange back in the desired order.

library(tidyverse)

results <- df %>%
  group_by(id, grp2 = ifelse(grade == 0, 0, 1)) %>%
  arrange(id, desc(grade)) %>%
  mutate(time = cumsum(time)) %>%
  ungroup(grp2) %>%
  select(-grp2) %>%
  arrange(id, grade) %>%
  ungroup

Output

     id group grade  time
  <dbl> <dbl> <dbl> <dbl>
1     1     0     0    10
2     1     0     1    12
3     1     0     2     5
4     1     0     3     1
5     2     1     0    20
6     2     1     1    42
7     2     1     3    25
8     2     1     4    11

If you need each group to have the same number of rows as in your desired output, then you can use complete:

df %>%
  tidyr::complete(id, grade) %>% 
  group_by(id) %>% 
  fill(group, .direction ="downup") %>% 
  replace(is.na(.), 0) %>% 
  group_by(id, grp2 = ifelse(grade == 0, 0, 1)) %>% 
  arrange(id, desc(grade)) %>%
  mutate(time = cumsum(time)) %>%
  ungroup(grp2) %>%
  select(-grp2) %>%
  arrange(id, grade) %>%
  ungroup

Output

      id grade group  time
   <dbl> <dbl> <dbl> <dbl>
 1     1     0     0    10
 2     1     1     0    12
 3     1     2     0     5
 4     1     3     0     1
 5     1     4     0     0
 6     2     0     1    20
 7     2     1     1    42
 8     2     2     1    25
 9     2     3     1    25
10     2     4     1    11

Or if you want to pivot back and forth then you could do something like this:

output <- df %>%
  pivot_wider(
    names_from = "grade",
    names_prefix = "grade_",
    values_from = "time") %>%
  replace(is.na(.), 0) %>%
  select(id, group, grade_0, last_col():grade_1)

results2 <- output %>% 
  select(-c(id, group, grade_0)) %>% 
  rowwise()%>% 
  do(data.frame(t(cumsum(unlist(.))))) %>% 
  bind_cols(select(output, id, group, grade_0), .) %>% 
  pivot_longer(
    cols = 3:7,
    names_to = "grade",
    names_prefix = "grade_",
    values_to = "time")

CodePudding user response:

1st Try:

for cumulative sums across a variable, we can group_by and use cumsum() : No need to specify grades, etc. You can do more aggregations if needed.

df%>%
  group_by(grade)%>%
  mutate(Cum_Time = cumsum(time))%>%arrange(grade)

     id group grade  time Cum_Time
  <dbl> <dbl> <dbl> <dbl>    <dbl>
1     1     0     0    10       10
2     2     1     0    20       30
3     1     0     1     7        7
4     2     1     1    17       24
5     1     0     2     4        4
6     1     0     3     1        1
7     2     1     3    14       15
8     2     1     4    11       11
  •  Tags:  
  • r
  • Related