Home > Mobile >  Stepwise column sum in data frame based on another column in R
Stepwise column sum in data frame based on another column in R

Time:04-28

I have a data frame like this:

Team GF
A 3
B 5
A 2
A 3
B 1
B 6

Looking for output like this (just an additional column):

Team x avg(X)
A 3 0
B 5 0
A 2 3
A 3 2.5
B 1 5
B 6 3

avg(x) is the average of all previous instances of x where Team is the same. I have the following R code which gets the overall average, however I'm looking for the "step-wise" average.

    new_df <- df %>% group_by(Team) %>% summarise(avg_x = mean(x))

Is there a way to vectorize this while only evaluating the previous rows on each "iteration"?

CodePudding user response:

You want the cummean() function from dplyr, combined with lag():

df %>% group_by(Team) %>% mutate(avg_x = replace_na(lag(cummean(x)), 0))

Producing the following:

# A tibble: 6 × 3
# Groups:   Team [2]
  Team      x avg_x
  <chr> <dbl> <dbl>
1 A         3   0
2 B         5   0
3 A         2   3
4 A         3   2.5
5 B         1   5
6 B         6   3

As required.

Edit 1:

As @Ritchie Sacramento pointed out, the following is cleaner and clearer:

df %>% group_by(Team) %>% mutate(avg_x = lag(cummean(x), default = 0))
  • Related