In R, how to get a percentage of a vector row by row [duplicate]

I have a data.table with two columns. I would like to compute a rolling (cumulative) percentage row by row that resets for each value of sub.

library(data.table)
sub <-c("A","A","A","A","B","B","B","B","C","C","C","C")
n<-c(0,1,1,1,0,1,0,1,0,1,0,1)

df <- data.table(sub,n )

    sub n
 1:   A 0
 2:   A 1
 3:   A 1
 4:   A 1
 5:   B 0
 6:   B 1
 7:   B 0
 8:   B 1
 9:   C 0
10:   C 1
11:   C 0
12:   C 1

What I need it to look like:

    sub n rolling_percentage
 1:   A 0               0.00
 2:   A 1               0.50
 3:   A 1               0.66
 4:   A 1               0.75
 5:   B 0               0.00
 6:   B 1               0.50
 7:   B 0               0.33
 8:   B 1               0.50
 9:   C 0               0.00
10:   C 1               0.50
11:   C 0               0.33
12:   C 1               0.50

What I have tried: df$rolling_percentage = 100*cumsum(df$n)/sum(df$n), but that gives me this:

    sub n rolling_percentage
 1:   A 0            0.00000
 2:   A 1           14.28571
 3:   A 1           28.57143
 4:   A 1           42.85714
 5:   B 0           42.85714
 6:   B 1           57.14286
 7:   B 0           57.14286
 8:   B 1           71.42857
 9:   C 0           71.42857
10:   C 1           85.71429
11:   C 0           85.71429
12:   C 1          100.00000

I need it to reset every time df$sub changes.

Answer:

You can divide the cumulative sum of n by the row number within each group. Grouping by sub restarts both the cumulative sum and the row counter whenever sub changes.

library(data.table)
df[, rolling_percentage := cumsum(n)/seq_len(.N), sub]
df

#    sub n rolling_percentage
# 1:   A 0               0.00
# 2:   A 1               0.50
# 3:   A 1               0.67
# 4:   A 1               0.75
# 5:   B 0               0.00
# 6:   B 1               0.50
# 7:   B 0               0.33
# 8:   B 1               0.50
# 9:   C 0               0.00
#10:   C 1               0.50
#11:   C 0               0.33
#12:   C 1               0.50

dplyr provides a dedicated function, cummean(), for this purpose.

library(dplyr)
df %>% group_by(sub) %>% mutate(rolling_percentage = cummean(n))
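
If neither package is at hand, the same per-group running mean can probably be sketched in base R with ave(), which applies a function separately within each level of a grouping vector. This is a minimal sketch, not part of the original answer: df2 is a hypothetical name chosen to avoid overwriting df, and the round(..., 2) step is an assumption added only to match the two-decimal display shown above. It also assumes the rows are already in the desired order within each group.

```r
# Rebuild the example data as a plain data.frame (df2 is a hypothetical name)
sub <- c("A","A","A","A","B","B","B","B","C","C","C","C")
n   <- c(0,1,1,1,0,1,0,1,0,1,0,1)
df2 <- data.frame(sub, n)

# ave() runs FUN on n within each level of sub, so the cumulative sum
# and the row counter both restart whenever sub changes
df2$rolling_percentage <- ave(df2$n, df2$sub,
                              FUN = function(x) cumsum(x) / seq_along(x))

# Round for display only (assumption: two decimals, as in the question)
df2$rolling_percentage <- round(df2$rolling_percentage, 2)
df2
```

Multiply by 100 before rounding if a percentage on the 0 to 100 scale is wanted rather than a proportion.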