I've created a data frame and a function that I would like to run through the elements of a list object. I would like to output a matrix from the results of the function for each of the list elements.
For example for each of the elements I would like to get this outpu:
d1 | d2 | d3 | |
---|---|---|---|
d1 | ED(d1, d1) | ED(d2, d1) | ED(d3, d1) |
d2 | ED(d2, d1) | ED(d2, d2) | ED(d3, d2) |
d3 | ED(d3, d1) | ED(d2, d3) | ED(d3, d3) |
where ED()
is the function that I created, and the parameters are the columns in the data frames contained in the list elements.
I know I would use lapply
to obtain this output, but I'm unsure of how to get to it.
set.seed(123)
d1 = sample.int(50, 27)
d2 = sample.int(50, 27)
d3 = sample.int(50, 27)
year <- c(1990:1998)
site <- c(rep("a", 9), rep("b", 9), rep("c", 9))
ED = function(x,y){
#x and y are vectors of spp abundances
#they must be the same length!
if(length(x)!=length(y)) stop("Bad abundances!")
out = sqrt(sum((x-y)^2))
out
}
df <- data.frame(site, year, d1 = d1, d2 = d2, d3 = d3)
l <- split(df, df$site)
CodePudding user response:
Here are three ways.
The first two use an auxiliary function in order to make the code more readable. To compute all distances ED
between all vectors a double sapply
loop is needed.
1. split/lapply
set.seed(123)
d1 = sample.int(50, 27)
d2 = sample.int(50, 27)
d3 = sample.int(50, 27)
year <- 1990:1998
site <- c(rep("a", 9), rep("b", 9), rep("c", 9))
ED = function(x,y){
#x and y are vectors of spp abundances
#they must be the same length!
if(length(x)!=length(y)) stop("Bad abundances!")
out = sqrt(sum((x-y)^2))
out
}
df <- data.frame(site, year, d1 = d1, d2 = d2, d3 = d3)
aux <- function(x) {
x <- as.list(x)
sapply(x, \(.x) sapply(x, ED, y = .x))
}
l <- split(df[-2], df$site)
lapply(l, \(x) aux(x[-1]))
#> $a
#> d1 d2 d3
#> d1 0.00000 67.59438 74.17547
#> d2 67.59438 0.00000 55.79426
#> d3 74.17547 55.79426 0.00000
#>
#> $b
#> d1 d2 d3
#> d1 0.00000 64.09368 66.21178
#> d2 64.09368 0.00000 52.61179
#> d3 66.21178 52.61179 0.00000
#>
#> $c
#> d1 d2 d3
#> d1 0.00000 52.99057 53.45091
#> d2 52.99057 0.00000 42.17819
#> d3 53.45091 42.17819 0.00000
Created on 2022-10-15 with reprex v2.0.2
2. Split and apply with by
With by
there is no need to split the data set first.
by(df, df$site, \(subdf) aux(subdf[-(1:2)]))
#> df$site: a
#> d1 d2 d3
#> d1 0.00000 67.59438 74.17547
#> d2 67.59438 0.00000 55.79426
#> d3 74.17547 55.79426 0.00000
#> ------------------------------------------------------------
#> df$site: b
#> d1 d2 d3
#> d1 0.00000 64.09368 66.21178
#> d2 64.09368 0.00000 52.61179
#> d3 66.21178 52.61179 0.00000
#> ------------------------------------------------------------
#> df$site: c
#> d1 d2 d3
#> d1 0.00000 52.99057 53.45091
#> d2 52.99057 0.00000 42.17819
#> d3 53.45091 42.17819 0.00000
Created on 2022-10-15 with reprex v2.0.2
3. by
and dist
The function ED
computes the euclidian distance between two vectors. Use R's built-in function dist
and there is no need for ED
nor aux
.
by(df, df$site, \(subdf) dist(t(subdf[-(1:2)]), diag = TRUE, upper = TRUE))
#> df$site: a
#> d1 d2 d3
#> d1 0.00000 67.59438 74.17547
#> d2 67.59438 0.00000 55.79426
#> d3 74.17547 55.79426 0.00000
#> ------------------------------------------------------------
#> df$site: b
#> d1 d2 d3
#> d1 0.00000 64.09368 66.21178
#> d2 64.09368 0.00000 52.61179
#> d3 66.21178 52.61179 0.00000
#> ------------------------------------------------------------
#> df$site: c
#> d1 d2 d3
#> d1 0.00000 52.99057 53.45091
#> d2 52.99057 0.00000 42.17819
#> d3 53.45091 42.17819 0.00000
Created on 2022-10-15 with reprex v2.0.2
CodePudding user response:
You can do outer operation and since the parameters are columns you must vectorize x y (i do it with mapply)
v <- c("d1","d2","d3")
lapply(l, function(li) {
outer(v,v, function(x,y) {
mapply(x,y, FUN = function(xi,yi)
ED(li[,xi],li[,yi]), SIMPLIFY = T)
})
})