I have a data.table with two columns. I would like to compute a running percentage, row by row, that resets for each value of sub.
library(data.table)
sub <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C")
n <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)
df <- data.table(sub, n)
sub n
1: A 0
2: A 1
3: A 1
4: A 1
5: B 0
6: B 1
7: B 0
8: B 1
9: C 0
10: C 1
11: C 0
12: C 1
What I need it to look like:
sub n rolling_percentage
1: A 0 0.00
2: A 1 0.50
3: A 1 0.67
4: A 1 0.75
5: B 0 0.00
6: B 1 0.50
7: B 0 0.33
8: B 1 0.50
9: C 0 0.00
10: C 1 0.50
11: C 0 0.33
12: C 1 0.50
What I have tried:
df$rolling_percentage <- 100 * cumsum(df$n) / sum(df$n)
But that gives me this:
sub n rolling_percentage
1: A 0 0.00000
2: A 1 14.28571
3: A 1 28.57143
4: A 1 42.85714
5: B 0 42.85714
6: B 1 57.14286
7: B 0 57.14286
8: B 1 71.42857
9: C 0 71.42857
10: C 1 85.71429
11: C 0 85.71429
12: C 1 100.00000
I need it to reset every time the df$sub changes.
Answer:
You can divide the cumulative sum of n by the row number within each group.
library(data.table)
df[, rolling_percentage := cumsum(n)/seq_len(.N), sub]
df
# sub n rolling_percentage
# 1: A 0 0.00
# 2: A 1 0.50
# 3: A 1 0.67
# 4: A 1 0.75
# 5: B 0 0.00
# 6: B 1 0.50
# 7: B 0 0.33
# 8: B 1 0.50
# 9: C 0 0.00
#10: C 1 0.50
#11: C 0 0.33
#12: C 1 0.50
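For reference, here is a self-contained sketch of the same approach that rebuilds the table from the question. The explicit `by = sub`, the rounding to two decimals, and the extra `rolling_pct_100` column are additions for illustration and are not part of the original answer.

```r
library(data.table)

# Rebuild the example table from the question
df <- data.table(
  sub = rep(c("A", "B", "C"), each = 4),
  n   = c(0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)

# Running mean of n within each sub:
# cumulative sum divided by the row index inside the group
df[, rolling_percentage := round(cumsum(n) / seq_len(.N), 2), by = sub]

# Hypothetical extra column: the same quantity on a 0-100 scale
df[, rolling_pct_100 := 100 * cumsum(n) / seq_len(.N), by = sub]

print(df)
```

Because both `cumsum(n)` and `seq_len(.N)` are evaluated per group, the running value restarts at the first row of every `sub`.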
dplyr has a dedicated function, cummean(), for this purpose.
library(dplyr)
df %>% group_by(sub) %>% mutate(rolling_percentage = cummean(n))
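If neither package is available, the same per-group running mean can probably be had in base R with `ave()`. This is a sketch under the assumption that a plain data.frame is acceptable; the name `df_base` is made up for this example.

```r
# Base R version: no packages required
df_base <- data.frame(
  sub = rep(c("A", "B", "C"), each = 4),
  n   = c(0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)

# ave() applies FUN to n separately within each level of sub
# and returns the results in the original row order
df_base$rolling_percentage <- ave(
  df_base$n, df_base$sub,
  FUN = function(x) cumsum(x) / seq_along(x)
)

print(df_base)
```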