Mutate a dynamic subset of variable

Time:12-12

library(dplyr)

a = tibble(x = runif(1000,0,10),
           t = rpois(1000,4)
) %>% arrange(t)

I want a column l that, for each row, holds the mean of x over all rows whose t is strictly smaller than that row's t.

Expected result:

for x[t=0], l = NaN

for x[t=1], l = mean(x[t<1])

for x[t=2], l = mean(x[t<2])

etc.

Code that does not work:

a %>%
  mutate(
  l = mean(x[a$t < .$t])
  ) -> a
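A small illustration of why this attempt yields NaN everywhere, assuming (as appears to be the case with the magrittr pipe) that `.$t` here is the whole column, the same as `a$t`. The comparison is then element by element between a vector and itself, so it is never TRUE and the subset is always empty. The vectors below are made-up toy values, not the data from the question:

```r
# Toy stand-ins for a$t and a$x (hypothetical values)
t <- c(0, 1, 1, 2)
x <- c(10, 20, 30, 40)

# A vector compared with itself element-wise is never strictly smaller
cmp <- t < t

# Subsetting with all-FALSE gives an empty vector, and its mean is NaN
m <- mean(x[cmp])
```

What is needed instead is a comparison of the whole column against one value of t at a time, which is what the for loop does.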

Now this code would work:

for (i in c(1:1000)) {
  a$l[i] = mean(a$x[a$t < a$t[i]])
}

But it is not a mutate. I'd like a mutate so that I can apply it to groups, etc.

To better understand the issue: imagine that you have to average all the x values before a given date. Now do this dynamically, in a mutate.

I think that purrr may be necessary but I hate it.

CodePudding user response:

You can use map_dbl inside mutate:

library(tidyverse)

f <- function(lim) mean(a$x[a$t < lim])

a %>% mutate(l = map_dbl(t, f))

Testing against the OP's solution:

res <- a %>% mutate(l = map_dbl(t, f))
       
l <- vector(mode = "numeric", length = 1000)     
for (i in c(1:1000)) l[i] = mean(a$x[a$t < a$t[i]])

assertthat::are_equal(res$l, l) # TRUE
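Since the OP wants to apply this to groups: the helper f above reads the global a, so it would ignore any group_by. A possible variant, as a sketch only, defines the function inside mutate so that x and t resolve to the current group's columns. The grouping column g is hypothetical and was added here purely for illustration:

```r
library(dplyr)
library(purrr)

set.seed(123)
a <- tibble(
  x = runif(1000, 0, 10),
  t = rpois(1000, 4),
  g = sample(c("u", "v"), 1000, replace = TRUE)  # hypothetical grouping column
) %>% arrange(t)

# Inside mutate(), x and t refer to the columns of the current group,
# so for each t value (lim) the mean only covers rows of that group.
res <- a %>%
  group_by(g) %>%
  mutate(l = map_dbl(t, function(lim) mean(x[t < lim]))) %>%
  ungroup()
```

Rows with the smallest t in their group get NaN, as in the loop. Like the loop, this does one pass over the group per row, so it may be slow on large data.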

CodePudding user response:

For each value of t, you can calculate the mean of x, and then take the lagged cumulative mean of those per-t means.

library(dplyr)

a %>%
  group_by(t) %>%
  summarise(l = mean(x)) %>%
  mutate(l = lag(cummean(l)))

#       t     l
#   <int> <dbl>
# 1     0 NA   
# 2     1  5.33
# 3     2  5.45
# 4     3  5.36
# 5     4  5.26
# 6     5  5.16
# 7     6  5.10
# 8     7  5.07
# 9     8  5.12
#10     9  4.96
#11    10  4.98
#12    11  5.15
#13    12  4.93

If you want to keep the original number of rows in the data frame, add %>% left_join(a, by = 't') to the above answer.
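One caveat: a cumulative mean of per-t means gives every t value equal weight, whereas the OP's loop averages over rows, so the two agree only when every t has the same number of rows. A sketch of a row-weighted variant, using running sums and counts instead of cummean (the names s, n, by_t and res are chosen here for illustration):

```r
library(dplyr)

set.seed(123)
a <- tibble(x = runif(1000, 0, 10), t = rpois(1000, 4)) %>% arrange(t)

by_t <- a %>%
  group_by(t) %>%
  summarise(s = sum(x), n = n()) %>%
  # running total and running count over strictly smaller t, hence lag()
  mutate(l = lag(cumsum(s)) / lag(cumsum(n))) %>%
  select(t, l)

# Bring the per-t result back to one row per original observation
res <- a %>% left_join(by_t, by = "t")
```

The smallest t gets NA here rather than the NaN produced by the loop; for every other row the value should match the loop's result up to floating-point error.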

data

set.seed(123)
a = tibble(x = runif(1000,0,10),
           t = rpois(1000,4)
) %>% arrange(t)