Using as.formula(paste()) in a loop

I am trying to perform the Brown–Forsythe test from the onewaytests package in a loop. The data come from a .sav file, and I am building a new data frame from two of its variables to use in the test. On the as.formula(paste0()) line I get this error: Error in class(ff) <- "formula" : attempt to set an attribute on NULL. Any ideas?

j = c("independent.var1", "independent.var2")

for (i in j) {  
    vari = as.formula(paste0("data1$",i))
    data2 <- data.frame(variable = vari, edu = factor(data1$dep.var))
    bf <- bf.test(variable ~ edu, data = data2)
}
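For reference, as.formula() parses a string that describes a formula (such as "y ~ x"), so the pasted string must contain a ~, and "data1$independent.var1" does not. A minimal base-R sketch of the two things the loop seems to need, selecting a column by name and building a formula from a name, using a hypothetical toy data1 (the values and the edu column are made up for illustration):

```r
# Hypothetical toy data standing in for data1 (values are made up);
# edu is already a factor here, so no conversion is needed
data1 <- data.frame(
  independent.var1 = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3),
  independent.var2 = c(10, 12, 9, 14, 11, 13),
  edu              = factor(c("low", "low", "mid", "mid", "high", "high"))
)
j <- c("independent.var1", "independent.var2")

for (i in j) {
  # A column is selected by its name with [[ ]]; no formula is involved
  vari <- data1[[i]]

  # A formula string must contain "~"; paste() builds e.g. "independent.var1 ~ edu"
  f <- as.formula(paste(i, "~ edu"))

  # reformulate() builds the same formula without pasting strings
  f2 <- reformulate("edu", response = i)

  print(f)
  stopifnot(identical(deparse(f), deparse(f2)))
}
```

Note that the loop above overwrites f, f2 and vari on every pass; to keep one result per variable, store them in a list indexed by i.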

Answer:

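There is no need for as.formula here: it parses a string describing a formula (such as "y ~ x"), whereas data1[[i]] simply selects the column named i, as in the loop below. The main problem, however, is that the data have value labels that cannot be converted to factors with either factor or as.factor. Columns of class "haven_labelled", such as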

class(data1[["T1"]])
#> [1] "haven_labelled" "vctrs_vctr"     "double"

must be converted to factors with haven::as_factor instead. From the documentation:

Description
The base function as.factor() is not a generic, but this variant is. Methods are provided for factors, character vectors, labelled vectors, and data frames. By default, when applied to a data frame, it only affects labelled columns.
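A small sketch of what this looks like, assuming a hypothetical labelled vector built with haven::labelled() in place of a column read from the .sav file (the labels Primary, Secondary and Tertiary are made up):

```r
library(haven)

# Hypothetical labelled vector, similar to a column read from a .sav file
edu <- labelled(c(1, 2, 2, 3),
                labels = c(Primary = 1, Secondary = 2, Tertiary = 3))
class(edu)

# as_factor() turns the value labels into factor levels
edu_f <- as_factor(edu)
levels(edu_f)

# Applied to a data frame, only the labelled columns are converted
df <- data.frame(score = c(10, 12, 9, 14))
df$edu <- edu
df_f <- as_factor(df)
sapply(df_f, class)
```

The numeric score column is left alone, which matches the documented default of only affecting labelled columns.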

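library(onewaytests)
library(haven)

# data1 is the data frame read from the .sav file, e.g. with haven::read_sav()
# Names of the T* columns (T1, T2, ...), sorted numerically rather than alphabetically
indep.var <- grep("^T\\d+$", names(data1), value = TRUE)
indep.var <- stringr::str_sort(indep.var, numeric = TRUE)
dep.var <- "K1_1"

# Pre-allocate a named list to hold one test result per T* column
bf <- vector("list", length = length(indep.var))
names(bf) <- indep.var

for(i in indep.var){
  vari <- data1[[i]]    # select the column by name; no formula needed
  data2 <- data.frame(variable = vari, edu = as_factor(data1[[dep.var]]))
  bf[[i]] <- bf.test(variable ~ edu, data = data2)
}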
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.7080925 
#>   num df     : 3 
#>   denom df   : 4.304888 
#>   p.value    : 0.5927074 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 1.770985 
#>   num df     : 3 
#>   denom df   : 11.04268 
#>   p.value    : 0.2104647 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.7456779 
#>   num df     : 3 
#>   denom df   : 13.0548 
#>   p.value    : 0.5437829 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.04974693 
#>   num df     : 3 
#>   denom df   : 1.544013 
#>   p.value    : 0.9808962 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 1.022504 
#>   num df     : 3 
#>   denom df   : 2.121031 
#>   p.value    : 0.5234018 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.9646639 
#>   num df     : 3 
#>   denom df   : 1.045435 
#>   p.value    : 0.611455 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.1568329 
#>   num df     : 3 
#>   denom df   : 3.648005 
#>   p.value    : 0.9196509 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.7988569 
#>   num df     : 3 
#>   denom df   : 1.352439 
#>   p.value    : 0.6283363 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.1770073 
#>   num df     : 3 
#>   denom df   : 3.624132 
#>   p.value    : 0.9063619 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 0.9526167 
#>   num df     : 3 
#>   denom df   : 1.000817 
#>   p.value    : 0.6189415 
#> 
#>   Result     : Difference is not statistically significant. 
#> ------------------------------------------------------------- 
#> 
#> 
#>   Brown-Forsythe Test (alpha = 0.05) 
#> ------------------------------------------------------------- 
#>   data : variable and edu 
#> 
#>   statistic  : 1.009219 
#>   num df     : 3 
#>   denom df   : 1.000472 
#>   p.value    : 0.6070452 
#> 
#>   Result     : Difference is not statistically significant. 
#> -------------------------------------------------------------

bf[["T1"]]$statistic
#> [1] 0.7080925
bf[[1]]$statistic      # same as previous
#> [1] 0.7080925
bf[["T1"]]$p.value
#> [1] 0.5927074

Created on 2022-03-17 by the reprex package (v2.0.1)
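Since each element of bf is a list with statistic and p.value components (as the bf[["T1"]] access above shows), the results can be collected into a single data frame. A sketch, with a mocked bf that holds only those two components, filled with the first three printed results, which presumably correspond to T1, T2 and T3 given the str_sort() ordering:

```r
# Mocked stand-in for the bf list built by the loop above: only the two
# components accessed in the answer (statistic, p.value) are included,
# copied from the first three printed results
bf <- list(
  T1 = list(statistic = 0.7080925, p.value = 0.5927074),
  T2 = list(statistic = 1.770985,  p.value = 0.2104647),
  T3 = list(statistic = 0.7456779, p.value = 0.5437829)
)

# One row per test; vapply() checks that each extracted value is a single number
bf_summary <- data.frame(
  indep.var = names(bf),
  statistic = vapply(bf, function(res) res$statistic, numeric(1)),
  p.value   = vapply(bf, function(res) res$p.value, numeric(1)),
  row.names = NULL
)
bf_summary
```

With the real bf from the loop, the same two vapply() calls should work unchanged.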

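To run the Brown-Forsythe test for every combination of the K* and T* variables, the code below first gets their names with grep, then creates a data.frame with all pairwise combinations using expand.grid. Each test is run by the function bf_test_fun, which wraps bf.test in tryCatch so that a failing pair returns its error object instead of stopping the whole run. All results are in bf_list, the successful tests are in bf_final, and the logical vector err flags which pairs raised errors.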
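The expand.grid/apply/tryCatch pattern can be tried out without the real data. A base-R sketch, where run_one() is a hypothetical stand-in for bf_test_fun() that fails on purpose for T2, and the variable names are made up:

```r
# Hypothetical variable names standing in for the K* and T* columns
dep.var   <- c("K1_1", "K1_2")
indep.var <- c("T1", "T2", "T3")

# All pairwise combinations, one row per test
regr <- expand.grid(dep.var = dep.var, indep.var = indep.var,
                    stringsAsFactors = FALSE)

# Hypothetical stand-in for bf_test_fun(): returns a list on success and
# the condition object on failure, so one bad pair does not stop the run
run_one <- function(x) {
  tryCatch({
    if (x[["indep.var"]] == "T2") stop("simulated failure")
    list(dep = x[["dep.var"]], indep = x[["indep.var"]])
  }, error = function(e) e)
}

# apply() turns the data frame into a character matrix, so each x is a
# named character vector such as c(dep.var = "K1_1", indep.var = "T1")
res <- apply(regr, 1, run_one)
names(res) <- apply(regr, 1, paste, collapse = ".")

# Flag the failed pairs and keep the rest
err <- vapply(res, inherits, logical(1), what = "error")
ok  <- res[!err]

sum(err)
names(ok)
```

The character-matrix conversion is also why bf_test_fun() receives column names as strings and can index data[[ x[1] ]] directly.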

library(onewaytests)
library(haven)

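# Run one Brown-Forsythe test for a pair of column names:
# x[1] is the grouping variable (K*), x[2] the response (T*).
# Errors are caught and returned as condition objects instead of stopping apply().
bf_test_fun <- function(x, data, verbose = TRUE){
  data2 <- data.frame(variable = data[[ x[2] ]], 
                      edu = as_factor(data[[ x[1] ]]))
  tryCatch(bf.test(variable ~ edu, data = data2, verbose = verbose),
           error = function(e) e)
}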

indep.var <- grep("^T\\d+$", names(data1), value = TRUE)
indep.var <- stringr::str_sort(indep.var, numeric = TRUE)
dep.var <- grep("^K\\d+_", names(data1), value = TRUE)
dep.var <- stringr::str_sort(dep.var, numeric = TRUE)

regr <- expand.grid(dep.var, indep.var)
names(regr) <- c("dep.var", "indep.var")
head(regr)
#>   dep.var indep.var
#> 1    K1_1        T1
#> 2    K1_2        T1
#> 3    K1_3        T1
#> 4    K1_4        T1
#> 5    K1_5        T1
#> 6    K1_6        T1

bf_list <- apply(regr, 1, bf_test_fun, data = data1, verbose = FALSE)
names(bf_list) <- apply(regr, 1, paste, collapse = ".")
err <- sapply(bf_list, inherits, "error")

sum(err)
#> [1] 0

bf_final <- bf_list[!err]
length(bf_final)
#> [1] 968

bf_final[["K1_1.T1"]]$statistic
#> [1] 0.7080925
bf_final$K1_1.T1$statistic       # same as above
#> [1] 0.7080925
bf_final[[1]]$statistic          # same as above
#> [1] 0.7080925

Created on 2022-03-17 by the reprex package (v2.0.1)