Extension/Optimisation of code: from one to several iterations-CodePudding

For a matrix of pairwise distances pdm (symmetric), where each row/column represents a point, and a vector of distances r, I will do th following for each point

# some small toy data 
# note. real data is bigger, e.g. ~15k points.
pdm <- matrix(data = c(0, 4, 3,
                       4, 0, 2, 
                       3, 2, 0), 
              nrow = 3, ncol = 3)
r <- seq(0, 5, .5)
length(r)
#> [1] 11

# index m correspondens to order of points. 
m <- c(1, 2, 3)

# change format 
pdml <- as.list(as.data.frame(pdm))

# ---- 1
# procedure for first point (1) 
a <- list()
for(i in seq_along(r)) {
  a[[i]] <- ifelse(0 < pdml[[1]] & pdml[[1]] <= r[i], 1, 0) 
  a[[i]] <- which(a[[i]] != 0)
  # if-statement is needed since which() produces annoying integer(0) entries
  if(identical(a[[i]], integer(0))) a[[i]] <- 0
  a[[i]] <- sum(m[1] * m[a[[i]]])
}

# change format 
do.call(rbind, a)
#>       [,1]
#>  [1,]    0
#>  [2,]    0
#>  [3,]    0
#>  [4,]    0
#>  [5,]    0
#>  [6,]    0
#>  [7,]    3
#>  [8,]    3
#>  [9,]    5
#> [10,]    5
#> [11,]    5

# ---- 2 
# procedure for second point (2), 
# ... adaption: pdml[[2]] and m[2]

^{Created on 2022-08-09 by the reprex package (v2.0.1)}

Desired Output After the calculation is done, I would like to calc. the average for each distance $r_i$ across all points.

Can somebody please provide a solution to extend my approach to all points or by showing an alternative, which cerntainly is more efficient? Also, any recommendation on how to improve my question is much appreciated.

Note, if it makes things easier, it is, of course, also an option to use the upper/lower half of pdm only.

CodePudding user response：

If you precalculate the possible products you want to sum with tcrossprod(m), you can simplify the calculation to a couple of matrix operations:

# Input data
m <- c(1, 2, 3)
d <- matrix(
  data = c(
    0, 4, 3,
    4, 0, 2,
    3, 2, 0
  ),
  nrow = 3,
  ncol = 3
)

r <- seq(0, 5) # Reduced for simplicity

# Possible summands
v <- tcrossprod(m) * (d != 0)
v
#>      [,1] [,2] [,3]
#> [1,]    0    2    3
#> [2,]    2    0    6
#> [3,]    3    6    0

# The calculation
a <- sapply(r, \(r) colSums(v * (d <= r)))
a
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    0    0    3    5    5
#> [2,]    0    0    6    6    8    8
#> [3,]    0    0    6    9    9    9

And since you said you then wanted the mean for each distance, over points:

colMeans(a)
#> [1] 0.000000 0.000000 4.000000 6.000000 7.333333 7.333333

A slightly more obscure but potentially faster way to find a would be with 3-d arrays:

colSums(outer(v, rep(1, length(r))) * outer(d, r, `<=`))
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    0    0    3    5    5
#> [2,]    0    0    6    6    8    8
#> [3,]    0    0    6    9    9    9