In the data frame df, I have a function f that calculates the correlation between x.sample and y.sample and stores it in per. I am then trying to run 999 randomizations, calculating per for each randomization within each of the 4 sp groups. I am not sure whether the lapply call is written correctly and whether it is actually applying f, and therefore computing per, in each group. What is an easy way to verify that per, and not some other quantity, is what gets calculated for each of the 4 sp groups?
set.seed(111)
library(truncnorm)
x <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
v <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
y <- ((v/x^2) - (1/x))
sp <- rep(c("A","B","C","D"), each = 72)
df <- data.frame(v,x,y,sp)
library(data.table)
setDT(df)
# function to compute the correlation for one resample of x
f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  per <- cor(y.sample, x.sample)
  per  # return the correlation visibly
}
set.seed(1234)
# 999 randomizations for each species
result = rbindlist(
lapply(1:999, \(i) df[,.(est = f(x,v)), sp][, i:=i])
)
Answer:
I'm not super familiar with data.table, so the notation in the lapply() call looks foreign, but after playing around with it, I think it is doing what you intend.
One way to check is to code it differently and visually compare results:
library(dplyr)
out_list <- list()
for(i in 1:999){
df2 <- df %>%
group_by(sp) %>%
mutate(est = f(x, v)) %>%
select(sp, est) %>%
distinct()
out_list[[i]] <- df2
}
out_df <- bind_rows(out_list, .id = "id")
library(ggplot2)
# histograms of the dplyr-based estimates, one panel per species
p1 <- ggplot() +
  geom_histogram(data = out_df, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
  facet_wrap(~sp)
# histograms of the data.table-based estimates from the question
p2 <- ggplot() +
  geom_histogram(data = result, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
  facet_wrap(~sp)
gridExtra::grid.arrange(p1, p2, ncol = 2)
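If you would rather not rely on eyeballing histograms, a rough numeric check is also possible. The sketch below is only one way to do it, under a few assumptions: it uses toy data drawn with runif() instead of truncnorm (to avoid the extra dependency), and the names dt, f_counted, calls and rep_id are made up for illustration. It wraps f in a counter so you can see that f really is called for every species in every randomization, then checks the shape of the result.

```r
library(data.table)

# Toy stand-in for the question's df (assumption: runif instead of rtruncnorm)
set.seed(111)
dt <- data.table(
  x  = runif(288, 1, 10),
  v  = runif(288, 1, 10),
  sp = rep(c("A", "B", "C", "D"), each = 72)
)

# Same function as in the question
f <- function(x, v) {
  x.sample <- sample(x, length(x), replace = TRUE)
  y.sample <- (v / x.sample^2) - (1 / x.sample)
  cor(y.sample, x.sample)
}

# Hypothetical wrapper that counts how often f is actually called
calls <- 0L
f_counted <- function(x, v) {
  calls <<- calls + 1L
  f(x, v)
}

n_rep <- 999
set.seed(1234)
result <- rbindlist(
  lapply(seq_len(n_rep), function(i) dt[, .(est = f_counted(x, v)), by = sp][, rep_id := i])
)

# One row per species per randomization
print(result[, .N, by = sp])
# Number of times f ran; expected to be about n_rep * 4
print(calls)
# Correlations must lie in [-1, 1]
print(range(result$est))
```

If the per-species row counts all equal the number of randomizations and the call counter is in line with n_rep times the number of species, the lapply() call is applying f within each sp group as intended.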