Using lapply over a function with multiple factors in R

In the data frame df, I have a function f that resamples x with replacement (x.sample), recomputes y from it (y.sample), and returns the correlation between y.sample and x.sample. I then run 999 randomizations to get the expected correlation per for each randomization. I am not sure whether the lapply here is written correctly, or whether it really applies f (the function that returns per) separately to each of the 4 species in sp. What is an easy way to verify this, given that per is the only quantity I'm calculating for the 4 species?

set.seed(111)
library(truncnorm)
x <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
v <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
y <- ((v/x^2) - (1/x))
sp <- rep(c("A","B","C","D"), each = 72)

df <- data.frame(v,x,y,sp)

library(data.table)
setDT(df)
# function: resample x with replacement, recompute y, return cor(y, x) (= per)
f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  cor(y.sample, x.sample)
}

set.seed(1234)
# 999 randomizations for each species
result = rbindlist(
  lapply(1:999, \(i) df[,.(est = f(x,v)), sp][, i:=i])
)
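One quick way to verify the structure of `result` is to assert what it should look like: one row per species per replicate, every replicate index present exactly four times, and a correlation that stays in [-1, 1] but varies across replicates. The sketch below is self-contained, so it rebuilds a toy version of `df`; it assumes `runif()` as a stand-in for `truncnorm::rtruncnorm()` (these checks don't depend on the distribution) and names the replicate count `n_rep`, which is not in the original code.

```r
library(data.table)

# Toy stand-in for the question's df (runif instead of truncnorm::rtruncnorm)
set.seed(111)
df <- data.table(
  x  = runif(288, 0.5, 10),
  v  = runif(288, 0.5, 10),
  sp = rep(c("A", "B", "C", "D"), each = 72)
)

f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  cor(y.sample, x.sample)
}

n_rep <- 999L  # hypothetical name for the number of randomizations
set.seed(1234)
result <- rbindlist(
  lapply(seq_len(n_rep), function(i) df[, .(est = f(x, v)), by = sp][, i := i])
)

# 1. One row per species per replicate
stopifnot(nrow(result) == 4L * n_rep)
stopifnot(all(result[, .N, by = "sp"]$N == n_rep))
# 2. Every replicate index appears once per species (4 times)
stopifnot(all(result[, .N, by = "i"]$N == 4L))
# 3. est is a correlation, and it changes from one randomization to the next
stopifnot(all(abs(result$est) <= 1))
stopifnot(all(result[, sd(est), by = "sp"]$V1 > 0))
```

If any `stopifnot()` fails, the lapply is not producing one estimate per species per replicate.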

Answer:

I'm not super familiar with data.table, so the notation in the lapply() call looks foreign, but after playing around with it, I think it is doing what you intend.
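For anyone else who finds the notation unfamiliar: `df[, .(est = f(x, v)), sp]` splits the rows by `sp`, calls `f()` once on each species' 72 values of `x` and `v`, and returns one row per species; `[, i := i]` then adds a column holding the replicate number. A rough base-R equivalent is sketched below on toy data (`runif()` stands in for `rtruncnorm()`); the helper `one_rep()` is a hypothetical name, not part of the question's code.

```r
# Toy data with the same layout as df (runif stands in for rtruncnorm)
set.seed(111)
df <- data.frame(
  x  = runif(288, 0.5, 10),
  v  = runif(288, 0.5, 10),
  sp = rep(c("A", "B", "C", "D"), each = 72)
)

f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  cor(y.sample, x.sample)
}

# One replicate: call f() once per species on that species' rows only,
# then tag the rows with the replicate number i (what [, i := i] does)
one_rep <- function(i, data) {
  groups <- split(data, factor(data$sp, levels = unique(data$sp)))
  est <- vapply(groups, function(g) f(g$x, g$v), numeric(1))
  data.frame(sp = names(est), est = unname(est), i = i)
}

set.seed(1234)
result_base <- do.call(rbind, lapply(1:999, one_rep, data = df))
str(result_base)  # 3996 rows (4 species x 999 replicates), 3 columns
```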

One way to check is to code it differently and visually compare results:

library(dplyr)
out_list <- list()
for (i in 1:999) {
  # one replicate: f() evaluated once per species, one row per species
  out_list[[i]] <- df %>%
    group_by(sp) %>%
    summarise(est = f(x, v), .groups = "drop")
}

out_df <- bind_rows(out_list, .id = "id")

library(ggplot2)
# left: dplyr loop; right: data.table lapply from the question
p1 <- ggplot() +
  geom_histogram(data = out_df, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
  facet_wrap(~sp)
p2 <- ggplot() +
  geom_histogram(data = result, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
  facet_wrap(~sp)

gridExtra::grid.arrange(p1, p2, ncol = 2)
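The histograms can only show that the two approaches give similar distributions; because each run draws different random samples, the values won't match one for one. A stricter check, sketched here on toy data (`runif()` in place of `rtruncnorm()`), is to reset the seed before one data.table replicate and again before a hand-written loop over the species, then compare the estimates exactly. This assumes data.table evaluates `j` once per group, in order of first appearance of `sp` (how `by`, unlike `keyby`, orders groups), and that nothing else draws random numbers in between; if the comparison fails, that assumption or the grouping is the first thing to look at.

```r
library(data.table)

# Toy data with the same layout as the question's df
set.seed(111)
dt <- data.table(
  x  = runif(288, 0.5, 10),
  v  = runif(288, 0.5, 10),
  sp = rep(c("A", "B", "C", "D"), each = 72)
)

f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  cor(y.sample, x.sample)
}

# One replicate the data.table way
set.seed(42)
dt_est <- dt[, .(est = f(x, v)), by = sp]

# The same replicate by hand, species in order of first appearance
set.seed(42)
manual_est <- vapply(unique(dt$sp), function(s) {
  f(dt$x[dt$sp == s], dt$v[dt$sp == s])
}, numeric(1))

# Both TRUE if f() was applied once per species, to that species' rows only
isTRUE(all.equal(dt_est$est, unname(manual_est)))
identical(dt_est$sp, names(manual_est))
```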