In R, I need the average of the first and second values, then the first and third values, and so on, then the average of the second and first values, continuing up to the 96th value, because that is how many values my file has (in total, I need 9216 such averages). It would be good to do this in an automated way, e.g. in a loop.
CodePudding user response:
Could you at least supply what a sample output would look like for the example you described? Your description was hard to follow with "etc ..." in it.
CodePudding user response:
You should always provide reproducible data, for example:
set.seed(42)
N <- 96
X <- round(runif(N, 1000, 9999))
str(X)
# num [1:96] 9232 9433 3575 8473 6775 ...
Now you want the mean of all possible ordered pairs, including each value paired with itself (96 x 96 = 9216 pairs):
pairs <- expand.grid(seq(N), seq(N))
str(pairs)
# 'data.frame': 9216 obs. of 2 variables:
# $ Var1: int 1 2 3 4 5 6 7 8 9 10 ...
# $ Var2: int 1 1 1 1 1 1 1 1 1 1 ...
# - attr(*, "out.attrs")=List of 2
# ..$ dim : int [1:2] 96 96
# ..$ dimnames:List of 2
# .. ..$ Var1: chr [1:96] "Var1= 1" "Var1= 2" "Var1= 3" "Var1= 4" ...
# .. ..$ Var2: chr [1:96] "Var2= 1" "Var2= 2" "Var2= 3" "Var2= 4" ...
Now compute the means:
# For each row of pairs, average the two values of X at those indices
X.mn <- apply(pairs, 1, function(x) mean(c(X[x[1]], X[x[2]])))
str(X.mn)
# num [1:9216] 9232 9332 6404 8852 8004 ...
summary(X.mn)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1002 4349 5707 5709 7154 9899
Once you know how to use vectorization, you can often eliminate the need for a loop.
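As a rough illustration of that last point, here is a minimal sketch of two vectorized ways to get the same 9216 means without apply(). It reuses the simulated X from above; the names M, X.mn2 and X.mn3 are only illustrative and are not part of the original answer.

```r
# Same reproducible data as above
set.seed(42)
N <- 96
X <- round(runif(N, 1000, 9999))

# outer() applies "+" to every ordered pair (i, j); dividing by 2 gives the mean.
# M[i, j] is the mean of X[i] and X[j].
M <- outer(X, X, "+") / 2
dim(M)

# Flatten column by column. This matches the row order of
# expand.grid(seq(N), seq(N)), where the first index varies fastest.
X.mn2 <- as.vector(M)

# Alternative: index X directly with the two columns of the pairs table.
pairs <- expand.grid(seq(N), seq(N))
X.mn3 <- (X[pairs$Var1] + X[pairs$Var2]) / 2

# Compare with the apply() version from above
X.mn <- apply(pairs, 1, function(x) mean(c(X[x[1]], X[x[2]])))
all.equal(X.mn, X.mn2)
all.equal(X.mn, X.mn3)
```

The matrix form can be handy if you want to look up the mean for a specific pair, e.g. M[2, 5] for the second and fifth values, while the flat vectors line up row by row with pairs.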