I know there is a lot on this topic, but I still could not figure it out myself. I have a dataset with various dependent (here: flux) and independent/ explanatory variables (expl). My goal is to iterate through all dependent with each explanatory variable using different models. In the end, I thought to pipe it through purr::glance(), so I could filter for significant p-values and compare the BICs of models.
Here comes a part of my dataset:
library(tidyverse)
library(broom)
dat <- structure(list(management = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "c"
), class = "factor"), flux1 = c(61.51, 73.3, 66.74, 117.75, 74.69,
74.12, 67.28, 78.67, 63.11, 82.74, 101.46, 80.48, 81.47, 56.76,
74.34), flux2 = c(0.72, 8.93, -0.2, 3.9, 0.05, 1.39, -0.69, 2.19,
3.31, 1.43, 8.57, 2.42, 2.56, 1.91, 10.41), flux3 = c(-52.97,
-59, -52.75, -44.95, -52.96, -62.68, -73.13, -120.33, -82.07,
-53.11, -36.01, -75.93, -84.01, -80.23, -127.61), expl1 = c(32.98,
76.13, 70.19, 106.29, 118.66, 102.61, 70.77, 45.93, 74.63, 90.48,
43.34, 98.7, 92.5, 73.68, 83.62), expl2 = c(8.9, 9, 9.03, 10.03,
9.75, 9.85, 9.15, 9.28, 8.97, 9.25, 9.72, 9.32, 9.65, 8.82, 9.1
), expl3 = c(10.12, 11.62, 11.18, 12.9, 14.05, 13.93, 10.8, 10.85,
10.62, 11.97, 11.03, 7.38, 11.55, 12.1, 7.9)), row.names = c(NA,
-15L), class = "data.frame")
So here is what I could get to work so far, inspired by this post:
a <- dat %>%
gather(key = "expl", value = "expl_value", c(expl1:expl3)) %>%
group_split(expl) %>%
map_df(~lm(log(flux1) ~ expl_value, data = .x) %>%
glance() %>%
mutate(expl_group = unique(.x$expl)))
I could get it to iterate through the explanatory variables but not the dependent variables. And, I could only make it work for one model. I further made a list of models, like they did here and tried to map through that list but could not get it to work:
models <- function(.x) {
linear <- lm(flux1 ~ expl_value, data = .x)
exponential <- lm(log(flux1 ) ~ expl_value, data = .x)
logarithmic <- lm(flux1 ~ log(expl_value), data = .x)
power <- lm(log(flux1) ~ log(expl_value), data = .x)
lst(linear, exponential, logarithmic, power)
}
Actually, I also want to expand that list of models by adding management
as a factor, but I can figure that out later. I would appreciate your help and am also open to other suggestions on how to try out various different models and compare their BICs.
Edit: As Stefano pointed out my fluxes can include also negative values (not only flux3). So I would need something like if flux <= 0, then add min(flux) before log-transformation.
Cheers, Anike
CodePudding user response:
Here is a possible solution
library(dplyr)
library(tidyr)
library(purrr)
models <- function(.x) {
linear <- lm(flux_value ~ expl_value, data = .x)
exponential <- lm(log(flux_value ) ~ expl_value, data = .x)
logarithmic <- lm(flux_value ~ log(expl_value), data = .x)
power <- lm(log(flux_value) ~ log(expl_value), data = .x)
list(linear = linear ,
exponential = exponential,
logarithmic = logarithmic,
power = power) |>
map_dfr(broom::glance, .id = "model")
}
dat |>
pivot_longer(values_to = "flux_value",
names_to = "flux",
c(flux1:flux3)) |>
group_by(flux) |>
mutate(flux_value = if(any(flux_value < 0))
{
flux_value - min(flux_value) 1e-6
}
else
flux_value) |>
ungroup() |>
pivot_longer(values_to = "expl_value",
names_to = "expl",
c(expl1:expl3)) |>
group_by(expl, flux) |>
group_modify(~models(.x))
## A tibble: 36 × 15
## Groups: expl, flux [9]
# expl flux model r.squared adj.r.squared sigma statistic p.value df
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 expl1 flux1 linear 0.0572 -0.0153 15.7 0.789 0.391 1
# 2 expl1 flux1 exponenti… 0.0670 -0.00478 0.188 0.933 0.352 1
# 3 expl1 flux1 logarithm… 0.0365 -0.0376 15.9 0.492 0.495 1
# 4 expl1 flux1 power 0.0448 -0.0287 0.190 0.609 0.449 1
# 5 expl1 flux2 linear 0.0167 -0.0589 3.55 0.221 0.646 1
# 6 expl1 flux2 exponenti… 0.00333 -0.0733 4.08 0.0435 0.838 1
# 7 expl1 flux2 logarithm… 0.00764 -0.0687 3.57 0.100 0.757 1
# 8 expl1 flux2 power 0.000465 -0.0764 4.09 0.00605 0.939 1
# 9 expl1 flux3 linear 0.00756 -0.0688 26.9 0.0991 0.758 1
#10 expl1 flux3 exponenti… 0.000325 -0.0766 4.81 0.00422 0.949 1
## … with 26 more rows, and 6 more variables: logLik <dbl>, AIC <dbl>,
## BIC <dbl>, deviance <dbl>, df.residual <int>, nobs <int>