Leave-one-out means by group in R

Imagine a table of individuals observed over time in different firms. For every individual, I'm trying to compute the mean wage of their co-workers (i.e. the mean wage in their firm at time t, excluding the individual themselves). I have working code using data.table in R, but I'm wondering whether there is a better, more efficient way of doing this:

library(data.table)

# i = individual, t = period, f = firm, w = wage
foo <- data.table(
  i = rep(1:6, each = 2),
  t = rep(1:2, 6),
  f = rep(1:2, each = 6),
  w = 1:12
)

foo[, x := mean(foo[t == .BY$t & f == foo[i == .BY$i & t == .BY$t]$f & i != .BY$i]$w), by = .(i, t)]

CodePudding user response：

We can calculate the LOO mean directly: sum all the wages, subtract the current row wage, divide by the number of rows minus 1.

foo[, loow := (sum(w) - w) / (.N - 1), by = .(f, t)]
#     i t f  w  x loow
#  1: 1 1 1  1  4    4
#  2: 1 2 1  2  5    5
#  3: 2 1 1  3  3    3
#  4: 2 2 1  4  4    4
#  5: 3 1 1  5  2    2
#  6: 3 2 1  6  3    3
#  7: 4 1 2  7 10   10
#  8: 4 2 2  8 11   11
#  9: 5 1 2  9  9    9
# 10: 5 2 2 10 10   10
# 11: 6 1 2 11  8    8
# 12: 6 2 2 12  9    9
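
The closed-form expression above returns NaN for a firm-period with a single worker (0 / 0) and NA for a whole group if any wage in it is missing. A minimal sketch of one way to handle both cases follows. The data `bar` and the helper `loo_mean` are hypothetical names made up for this illustration, and treating a missing wage as "not a co-worker observation" is an assumption, not something the question specifies.

```r
library(data.table)

# Hypothetical data: firm 1 has three workers (one wage missing), firm 2 has one.
bar <- data.table(
  i = 1:4,
  t = 1L,
  f = c(1, 1, 1, 2),
  w = c(10, 20, NA, 40)
)

# Hypothetical helper: leave-one-out mean that skips missing wages.
loo_mean <- function(w) {
  ok <- !is.na(w)
  total <- sum(w[ok])              # sum of observed wages in the group
  n <- sum(ok)                     # number of observed wages in the group
  own <- ifelse(ok, w, 0)          # a missing own wage has nothing to subtract
  n_others <- n - ok               # observed wages excluding the row itself
  out <- (total - own) / n_others
  out[n_others == 0] <- NA_real_   # no observed co-workers: return NA, not NaN
  out
}

bar[, loow := loo_mean(w), by = .(f, t)]
bar
```

With this choice, the worker whose own wage is missing still gets the mean of the observed co-worker wages, and the lone worker in firm 2 gets NA.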

CodePudding user response：

I don't know if this is any easier to read, but here's an approach with pmap:

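library(dplyr)
library(purrr)

# For each row, filter the full table down to the co-workers (same t and f,
# different i) and average their wages.
# Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick();
# the call is kept here as originally posted.
foo %>%
  mutate(x = pmap_dbl(cur_data(), ~ cur_data() %>%
                                      filter(i != ..1, t == ..2, f == ..3) %>%
                                      pull(w) %>%
                                      mean))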
#     i t f  w  x
#  1: 1 1 1  1  4
#  2: 1 2 1  2  5
#  3: 2 1 1  3  3
#  4: 2 2 1  4  4
#  5: 3 1 1  5  2
#  6: 3 2 1  6  3
#  7: 4 1 2  7 10
#  8: 4 2 2  8 11
#  9: 5 1 2  9  9
# 10: 5 2 2 10 10
# 11: 6 1 2 11  8
# 12: 6 2 2 12  9

CodePudding user response：

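Maybe this: within each firm and period, drop the individual's own wage and average the rest. Because match() returns only the first match, this assumes each individual appears at most once per firm and period.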

foo[, V1 := sapply(i, function(x) mean(w[-match(x,i)])) , by=.(f, t)]
#     i t f  w V1
#  1: 1 1 1  1  4
#  2: 1 2 1  2  5
#  3: 2 1 1  3  3
#  4: 2 2 1  4  4
#  5: 3 1 1  5  2
#  6: 3 2 1  6  3
#  7: 4 1 2  7 10
#  8: 4 2 2  8 11
#  9: 5 1 2  9  9
# 10: 5 2 2 10 10
# 11: 6 1 2 11  8
# 12: 6 2 2 12  9
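
Since the question is about efficiency, it can help to check that the vectorised formula and an explicit drop-one-row loop agree on a larger table before comparing their speed. The sketch below uses simulated data. The table `big`, its column `slot`, and the sizes are hypothetical choices for illustration only.

```r
library(data.table)

set.seed(1)

# Hypothetical larger panel: 200 firms, 2 periods, 5 worker slots per firm-period.
big <- CJ(f = 1:200, t = 1:2, slot = 1:5)
big[, w := runif(.N, 10, 100)]   # simulated wages

# Closed-form leave-one-out mean: one sum per group, then vectorised arithmetic.
big[, loo_fast := (sum(w) - w) / (.N - 1), by = .(f, t)]

# Explicit version: drop each row in turn and average the remaining wages.
big[, loo_slow := sapply(seq_len(.N), function(j) mean(w[-j])), by = .(f, t)]

# The two columns are expected to match up to floating-point tolerance.
agree <- isTRUE(all.equal(big$loo_fast, big$loo_slow))
agree
```

Wrapping each assignment in system.time() is one way to compare the two on your own data. The closed-form version does a constant amount of work per row, while the explicit version does work proportional to the group size for every row.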