I've created a data frame and a function that I would like to run through the elements of a list object. I would like to output a matrix from the results of the function for each of the list elements.

For example for each of the elements I would like to get this outpu:

	d1	d2	d3
d1	ED(d1, d1)	ED(d2, d1)	ED(d3, d1)
d2	ED(d2, d1)	ED(d2, d2)	ED(d3, d2)
d3	ED(d3, d1)	ED(d2, d3)	ED(d3, d3)

where ED() is the function that I created, and the parameters are the columns in the data frames contained in the list elements.

I know I would use lapply to obtain this output, but I'm unsure of how to get to it.

set.seed(123)


d1 = sample.int(50, 27)
d2 = sample.int(50, 27)
d3 = sample.int(50, 27)
year <- c(1990:1998)
site <- c(rep("a", 9), rep("b", 9), rep("c", 9))


ED = function(x,y){
  #x and y are vectors of spp abundances
  #they must be the same length!
  if(length(x)!=length(y)) stop("Bad abundances!")
  out =  sqrt(sum((x-y)^2))
  out
}

df <- data.frame(site, year, d1 = d1, d2 = d2, d3 = d3)

l <- split(df, df$site)

CodePudding user response：

Here are three ways.
The first two use an auxiliary function in order to make the code more readable. To compute all distances ED between all vectors a double sapply loop is needed.

1. `split/lapply`

set.seed(123)

d1 = sample.int(50, 27)
d2 = sample.int(50, 27)
d3 = sample.int(50, 27)
year <- 1990:1998
site <- c(rep("a", 9), rep("b", 9), rep("c", 9))


ED = function(x,y){
  #x and y are vectors of spp abundances
  #they must be the same length!
  if(length(x)!=length(y)) stop("Bad abundances!")
  out =  sqrt(sum((x-y)^2))
  out
}

df <- data.frame(site, year, d1 = d1, d2 = d2, d3 = d3)

aux <- function(x) {
  x <- as.list(x)
  sapply(x, \(.x) sapply(x, ED, y = .x))
}

l <- split(df[-2], df$site)
lapply(l, \(x) aux(x[-1]))
#> $a
#>          d1       d2       d3
#> d1  0.00000 67.59438 74.17547
#> d2 67.59438  0.00000 55.79426
#> d3 74.17547 55.79426  0.00000
#> 
#> $b
#>          d1       d2       d3
#> d1  0.00000 64.09368 66.21178
#> d2 64.09368  0.00000 52.61179
#> d3 66.21178 52.61179  0.00000
#> 
#> $c
#>          d1       d2       d3
#> d1  0.00000 52.99057 53.45091
#> d2 52.99057  0.00000 42.17819
#> d3 53.45091 42.17819  0.00000

^{Created on 2022-10-15 with reprex v2.0.2}

2. Split and apply with `by`

With bythere is no need to split the data set first.

by(df, df$site, \(subdf) aux(subdf[-(1:2)]))
#> df$site: a
#>          d1       d2       d3
#> d1  0.00000 67.59438 74.17547
#> d2 67.59438  0.00000 55.79426
#> d3 74.17547 55.79426  0.00000
#> ------------------------------------------------------------ 
#> df$site: b
#>          d1       d2       d3
#> d1  0.00000 64.09368 66.21178
#> d2 64.09368  0.00000 52.61179
#> d3 66.21178 52.61179  0.00000
#> ------------------------------------------------------------ 
#> df$site: c
#>          d1       d2       d3
#> d1  0.00000 52.99057 53.45091
#> d2 52.99057  0.00000 42.17819
#> d3 53.45091 42.17819  0.00000

^{Created on 2022-10-15 with reprex v2.0.2}

3. `by` and `dist`

The function ED computes the euclidian distance between two vectors. Use R's built-in function dist and there is no need for ED nor aux.

by(df, df$site, \(subdf) dist(t(subdf[-(1:2)]), diag = TRUE, upper = TRUE))
#> df$site: a
#>          d1       d2       d3
#> d1  0.00000 67.59438 74.17547
#> d2 67.59438  0.00000 55.79426
#> d3 74.17547 55.79426  0.00000
#> ------------------------------------------------------------ 
#> df$site: b
#>          d1       d2       d3
#> d1  0.00000 64.09368 66.21178
#> d2 64.09368  0.00000 52.61179
#> d3 66.21178 52.61179  0.00000
#> ------------------------------------------------------------ 
#> df$site: c
#>          d1       d2       d3
#> d1  0.00000 52.99057 53.45091
#> d2 52.99057  0.00000 42.17819
#> d3 53.45091 42.17819  0.00000

^{Created on 2022-10-15 with reprex v2.0.2}

CodePudding user response：

You can do outer operation and since the parameters are columns you must vectorize x y (i do it with mapply)

v <- c("d1","d2","d3")

lapply(l, function(li) {
  outer(v,v, function(x,y) {
    mapply(x,y, FUN = function(xi,yi) 
      ED(li[,xi],li[,yi]), SIMPLIFY = T) 
    })
  })

1. split/lapply

2. Split and apply with by

3. by and dist

1. `split/lapply`

2. Split and apply with `by`

3. `by` and `dist`