I have a data frame like this:
Team | GF |
---|---|
A | 3 |
B | 5 |
A | 2 |
A | 3 |
B | 1 |
B | 6 |
Looking for output like this (just an additional column):
Team | x | avg(X) |
---|---|---|
A | 3 | 0 |
B | 5 | 0 |
A | 2 | 3 |
A | 3 | 2.5 |
B | 1 | 5 |
B | 6 | 3 |
avg(x) is the average of all previous instances of x where Team is the same. I have the following R code which gets the overall average, however I'm looking for the "step-wise" average.
new_df <- df %>% group_by(Team) %>% summarise(avg_x = mean(x))
Is there a way to vectorize this while only evaluating the previous rows on each "iteration"?
CodePudding user response:
You want the cummean()
function from dplyr
, combined with lag()
:
df %>% group_by(Team) %>% mutate(avg_x = replace_na(lag(cummean(x)), 0))
Producing the following:
# A tibble: 6 × 3
# Groups: Team [2]
Team x avg_x
<chr> <dbl> <dbl>
1 A 3 0
2 B 5 0
3 A 2 3
4 A 3 2.5
5 B 1 5
6 B 6 3
As required.
Edit 1:
As @Ritchie Sacramento pointed out, the following is cleaner and clearer:
df %>% group_by(Team) %>% mutate(avg_x = lag(cummean(x), default = 0))