library(tidyverse)
a = tibble(x = runif(1000,0,10),
           t = rpois(1000,4)
) %>% arrange(t)
I want a column l that, for each row, averages the values of x whose t is smaller than that row's t.
Expected result:
for x[t=0], l = NaN
for x[t=1], l = mean(x[t<1])
for x[t=2], l = mean(x[t<2])
etc.
Code that does not work:
a %>%
  mutate(
    l = mean(x[a$t < .$t])
  ) -> a
Now this would work:
for (i in c(1:1000)) {
  a$l[i] = mean(a$x[a$t < a$t[i]])
}
But it is not a mutate. I'd like a mutate so I can apply it to groups etc.
To make the issue concrete: imagine that, for every row, you have to average all the x values recorded before that row's date, and you have to do it inside a mutate.
I think that purrr may be necessary, but I hate it.
CodePudding user response:
You can use map_dbl with mutate:
library(tidyverse)
f <- function(lim) mean(a$x[a$t < lim])
a %>% mutate(l = map_dbl(t, f))
Testing against the OP's solution:
res <- a %>% mutate(l = map_dbl(t, f))
l <- vector(mode = "numeric", length = 1000)
for (i in c(1:1000)) l[i] = mean(a$x[a$t < a$t[i]])
assertthat::are_equal(res$l, l) # TRUE
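Since the question asks for something that also works on groups, here is a minimal sketch of the same idea with the function defined inside mutate, so that x and t refer to the columns of the current data (or current group) rather than to the global a. The grouping column g is hypothetical and only there to illustrate grouped use; the data setup repeats the one from the question.

```r
library(dplyr)
library(purrr)

set.seed(123)
a <- tibble(x = runif(1000, 0, 10),
            t = rpois(1000, 4)) %>%
  arrange(t)

# The anonymous function is created inside mutate(), so x and t are looked up
# in the data being mutated, not in the global `a`.
res <- a %>%
  mutate(l = map_dbl(t, function(lim) mean(x[t < lim])))

# Hypothetical grouping column `g` (not in the original data), for illustration:
# within each group, l only averages x values from that same group.
res_g <- a %>%
  mutate(g = rep(c("A", "B"), length.out = n())) %>%
  group_by(g) %>%
  mutate(l = map_dbl(t, function(lim) mean(x[t < lim]))) %>%
  ungroup()
```

Rows with the smallest t get NaN here, as in the loop, because mean() of an empty vector is NaN.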
CodePudding user response:
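For each t value you can calculate the mean of x, then take the lag of the cumulative mean of those per-t means. Note that this is a mean of group means: it equals the mean of all x with a smaller t only when every t value has the same number of rows, so treat it as an approximation otherwise.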
library(dplyr)
a %>%
  group_by(t) %>%
  summarise(l = mean(x)) %>%
  mutate(l = lag(cummean(l)))
#       t     l
#   <int> <dbl>
# 1     0 NA
# 2     1  5.33
# 3     2  5.45
# 4     3  5.36
# 5     4  5.26
# 6     5  5.16
# 7     6  5.10
# 8     7  5.07
# 9     8  5.12
#10     9  4.96
#11    10  4.98
#12    11  5.15
#13    12  4.93
If you want to keep the original number of rows in the dataframe, add %>% left_join(a, by = 't') to the above answer.
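If the result has to match the question's loop exactly, one possible variant is to accumulate per-t sums and row counts instead of per-t means, so that each t value is weighted by how many rows it has. This is a sketch rather than part of the answer above; the helper names per_t, s and n are made up for illustration, and the data setup repeats the one from the question.

```r
library(dplyr)

set.seed(123)
a <- tibble(x = runif(1000, 0, 10),
            t = rpois(1000, 4)) %>%
  arrange(t)

per_t <- a %>%
  group_by(t) %>%
  summarise(s = sum(x), n = n()) %>%          # per-t sum of x and row count
  mutate(l = lag(cumsum(s) / cumsum(n))) %>%  # mean of all x with a strictly smaller t
  select(t, l)

# Attach the per-t value to every original row; row order of `a` is kept.
res <- a %>% left_join(per_t, by = "t")
```

The smallest t gets NA here (from lag) where the loop gives NaN; otherwise the values should agree with the loop up to floating-point error.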
data
set.seed(123)
a = tibble(x = runif(1000,0,10),
           t = rpois(1000,4)
) %>% arrange(t)