Imagine a table of individuals observed over time in different firms. I'm trying to compute, for every individual, the mean wage of their co-workers (i.e., the mean wage in their firm at time t, excluding them). I have working code using data.table in R, but I'm wondering whether there is a better, more efficient way of doing this:
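library(data.table)

# Toy panel: individual i, period t, firm f, wage w
foo <- data.table(
  i = rep(1:6, each = 2),
  t = rep(1:2, 6),
  f = rep(1:2, each = 6),
  w = 1:12
)
# For each (i, t): look up i's firm at t, then average w over all other
# individuals in that firm at t (subsets the full table twice per group)
foo[, x := mean(foo[t == .BY$t & f == foo[i == .BY$i & t == .BY$t]$f & i != .BY$i]$w), by = .(i, t)]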
Answer:
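We can calculate the leave-one-out (LOO) mean directly: within each firm-period group, sum all the wages, subtract the current row's wage, and divide by the number of rows minus 1. (A worker who is alone in their firm-period gets 0/0 = NaN.)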
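In symbols (notation introduced here only for illustration): let $S_{ft}$ be the total wage and $n_{ft}$ the headcount of firm $f$ at time $t$. The co-worker mean for individual $i$ is then

$$\bar{w}_{-i,ft} = \frac{S_{ft} - w_{it}}{n_{ft} - 1}$$

For example, firm 1 at t = 1 employs individuals 1, 2 and 3 with wages 1, 3 and 5, so $S_{11} = 9$ and $n_{11} = 3$, and individual 1 gets $(9 - 1)/(3 - 1) = 4$, which matches column x in the output.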
foo[, loow := (sum(w) - w) / (.N - 1), by = .(f, t)]
# i t f w x loow
# 1: 1 1 1 1 4 4
# 2: 1 2 1 2 5 5
# 3: 2 1 1 3 3 3
# 4: 2 2 1 4 4 4
# 5: 3 1 1 5 2 2
# 6: 3 2 1 6 3 3
# 7: 4 1 2 7 10 10
# 8: 4 2 2 8 11 11
# 9: 5 1 2 9 9 9
# 10: 5 2 2 10 10 10
# 11: 6 1 2 11 8 8
# 12: 6 2 2 12 9 9
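The one-liner above can be packaged as a helper and checked against a brute-force loop on larger data. This is a minimal sketch: the helper name add_loo_mean() and the simulation sizes are made up, and returning NA (rather than NaN) for workers with no co-workers is a design choice of this sketch, not part of the original answer.

```r
library(data.table)

# Hypothetical helper: leave-one-out mean wage within each (f, t) group,
# added by reference as column `loow`. Workers alone in their firm-period
# get NA instead of the 0/0 = NaN the plain formula would produce.
add_loo_mean <- function(dt) {
  dt[, loow := if (.N > 1L) (sum(w) - w) / (.N - 1L) else NA_real_,
     by = .(f, t)]
  invisible(dt)
}

# Made-up simulated panel: 200 individuals x 5 periods, 20 firms
set.seed(1)
sim <- data.table(i = rep(1:200, each = 5), t = rep(1:5, times = 200))
sim[, f := sample(1:20, .N, replace = TRUE)]
sim[, w := round(rnorm(.N, mean = 50, sd = 10), 2)]

add_loo_mean(sim)

# Brute-force reference: explicitly drop row k and average the others
sim[, ref := {
  n <- .N
  vapply(seq_len(n), function(k) if (n > 1L) mean(w[-k]) else NA_real_,
         numeric(1))
}, by = .(f, t)]

# Both columns should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(sim$loow, sim$ref)))
```

Because := modifies the table in place, the helper returns it invisibly only so it can be chained.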
Answer:
I don't know if this is any easier to read, but here's an approach with purrr's pmap_dbl():
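library(dplyr); library(purrr)
# For every row, keep rows in the same period (t) and firm (f) but with a
# different individual (i), then average their wages.
# ..1, ..2, ..3 are the current row's i, t and f (columns are passed in order).
# Note: cur_data() is deprecated in favour of pick() as of dplyr 1.1.0.
foo %>%
  mutate(x = pmap_dbl(cur_data(), ~ cur_data() %>%
    filter(i != ..1, t == ..2, f == ..3) %>%
    pull(w) %>%
    mean))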
# i t f w x
# 1: 1 1 1 1 4
# 2: 1 2 1 2 5
# 3: 2 1 1 3 3
# 4: 2 2 1 4 4
# 5: 3 1 1 5 2
# 6: 3 2 1 6 3
# 7: 4 1 2 7 10
# 8: 4 2 2 8 11
# 9: 5 1 2 9 9
#10: 5 2 2 10 10
#11: 6 1 2 11 8
#12: 6 2 2 12 9
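For comparison, the leave-one-out formula from the first answer can also be written in dplyr without a per-row filter, so the data is not re-scanned once per row. A minimal sketch, assuming a plain tibble copy of the question's data (the name foo_tbl is made up):

```r
library(dplyr)

# Same toy data as in the question, as a tibble (no data.table needed here)
foo_tbl <- tibble(
  i = rep(1:6, each = 2),
  t = rep(1:2, 6),
  f = rep(1:2, each = 6),
  w = 1:12
)

# Vectorised leave-one-out mean within each firm-period group:
# (group total - own wage) / (group size - 1)
foo_tbl <- foo_tbl %>%
  group_by(f, t) %>%
  mutate(x = (sum(w) - w) / (n() - 1)) %>%
  ungroup()
```

With dplyr 1.1.0 or later, the group_by()/ungroup() pair can be replaced by mutate(..., .by = c(f, t)).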
Answer:
Maybe this:
# Within each (f, t) group, drop the row matching each individual and average the rest
foo[, V1 := sapply(i, function(x) mean(w[-match(x, i)])), by = .(f, t)]
# i t f w V1
# 1: 1 1 1 1 4
# 2: 1 2 1 2 5
# 3: 2 1 1 3 3
# 4: 2 2 1 4 4
# 5: 3 1 1 5 2
# 6: 3 2 1 6 3
# 7: 4 1 2 7 10
# 8: 4 2 2 8 11
# 9: 5 1 2 9 9
# 10: 5 2 2 10 10
# 11: 6 1 2 11 8
# 12: 6 2 2 12 9
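The same leave-one-out idea also works in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. A minimal sketch on a data.frame copy of the question's data (the name df is illustrative):

```r
# Base-R sketch (no data.table or dplyr): ave() splits w by (f, t),
# applies FUN to each piece, and puts the results back in row order.
df <- data.frame(
  i = rep(1:6, each = 2),
  t = rep(1:2, 6),
  f = rep(1:2, each = 6),
  w = 1:12
)

# Leave-one-out mean: (group total - own wage) / (group size - 1)
df$x <- ave(df$w, df$f, df$t,
            FUN = function(v) (sum(v) - v) / (length(v) - 1))
```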