I'm trying to calculate the cumulative time among several grades.
Here's how my original df
looks like:
df = data.frame(id = c(1,1,1,1,2,2,2,2),
group = c(0,0,0,0,1,1,1,1),
grade = c(0,1,2,3,0,1,3,4),
time = c(10,7,4,1,20,17,14,11))
Here's what I'm expecting as the result df1
:
df1 = df %>%
pivot_wider(
names_from = "grade",
names_prefix = "grade_",
values_from = "time") %>%
replace(is.na(.), 0) %>%
mutate(grade_1 = grade_1 grade_2 grade_3 grade_4,
grade_2 = grade_2 grade_3 grade_4,
grade_3 = grade_3 grade_4) %>%
pivot_longer(
cols = 3:7,
names_to = "grade",
names_prefix = "grade_",
values_to = "time")
My method works, but I want it to be more flexible. When I have more grades in the df, I don't need to manually add grade_x = grade_1 grade_2 grade_3 ...
Thank you!
CodePudding user response:
One option would be to rearrange the grade
column (excluding the first row, where grade == 0
, then do cumsum
so that it is in reverse. Then, we can re-arrange back in the desired order.
library(tidyverse)
results <- df %>%
group_by(id, grp2 = ifelse(grade == 0, 0, 1)) %>%
arrange(id, desc(grade)) %>%
mutate(time = cumsum(time)) %>%
ungroup(grp2) %>%
select(-grp2) %>%
arrange(id, grade) %>%
ungroup
Output
id group grade time
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 10
2 1 0 1 12
3 1 0 2 5
4 1 0 3 1
5 2 1 0 20
6 2 1 1 42
7 2 1 3 25
8 2 1 4 11
If you need each group to have the same number of rows as in your desired output, then you can use complete
:
df %>%
tidyr::complete(id, grade) %>%
group_by(id) %>%
fill(group, .direction ="downup") %>%
replace(is.na(.), 0) %>%
group_by(id, grp2 = ifelse(grade == 0, 0, 1)) %>%
arrange(id, desc(grade)) %>%
mutate(time = cumsum(time)) %>%
ungroup(grp2) %>%
select(-grp2) %>%
arrange(id, grade) %>%
ungroup
Output
id grade group time
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 10
2 1 1 0 12
3 1 2 0 5
4 1 3 0 1
5 1 4 0 0
6 2 0 1 20
7 2 1 1 42
8 2 2 1 25
9 2 3 1 25
10 2 4 1 11
Or if you want to pivot back and forth then you could do something like this:
output <- df %>%
pivot_wider(
names_from = "grade",
names_prefix = "grade_",
values_from = "time") %>%
replace(is.na(.), 0) %>%
select(id, group, grade_0, last_col():grade_1)
results2 <- output %>%
select(-c(id, group, grade_0)) %>%
rowwise()%>%
do(data.frame(t(cumsum(unlist(.))))) %>%
bind_cols(select(output, id, group, grade_0), .) %>%
pivot_longer(
cols = 3:7,
names_to = "grade",
names_prefix = "grade_",
values_to = "time")
CodePudding user response:
1st Try:
for cumulative sums across a variable, we can group_by and use cumsum() : No need to specify grades, etc. You can do more aggregations if needed.
df%>%
group_by(grade)%>%
mutate(Cum_Time = cumsum(time))%>%arrange(grade)
id group grade time Cum_Time
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 10 10
2 2 1 0 20 30
3 1 0 1 7 7
4 2 1 1 17 24
5 1 0 2 4 4
6 1 0 3 1 1
7 2 1 3 14 15
8 2 1 4 11 11