I am trying to run the Brown–Forsythe test from the onewaytests package in a loop. The data come from a .sav file, and I build a new data frame from two of its variables to use in the test. On the as.formula(paste0()) line I get the error

Error in class(ff) <- "formula" : attempt to set an attribute on NULL

Any ideas?
j = c("independent.var1", "independent.var2")
for (i in j) {
  vari = as.formula(paste0("data1$",i))
  data2 <- data.frame(variable = vari, edu = factor(data1$dep.var))
  bf <- bf.test(variable ~ edu, data = data2)
}
Answer:
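The main problem is that the data has value labels, and labelled columns cannot be converted to factors with base factor() or as.factor(). Columns of class "haven_labelled", such as

class(data1[["T1"]])
#> [1] "haven_labelled" "vctrs_vctr" "double"

must be converted to factor with haven::as_factor(). From the Description section of its documentation:

"The base function as.factor() is not a generic, but this variant is. Methods are provided for factors, character vectors, labelled vectors, and data frames. By default, when applied to a data frame, it only affects labelled columns."

Note also that a column name pasted into a string is not a formula. The error in the question is raised while as.formula() evaluates the string data1$<name> and tries to set the class of the result; that fails when the result is NULL, for example when no column has that name. To select a column by a name stored in a variable, use data1[[i]], as the loop below does.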
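To see why the conversion matters, here is a base-R sketch of the idea behind haven::as_factor(). It is only an illustration under stated assumptions: labels_to_factor() is a hypothetical helper (not part of haven), it only roughly mimics as_factor() with its default levels, and the toy vector carries just a "labels" attribute rather than the full haven_labelled class.

```r
# Hypothetical toy coded variable: codes 1/2/3, value labels for 1 and 2 only
edu_raw <- structure(c(1, 2, 2, 3), labels = c(Primary = 1, Secondary = 2))

# Base factor() builds the levels from the stored codes and ignores the labels
levels(factor(edu_raw))  # "1" "2" "3"

# Hypothetical helper: label text where a label exists, otherwise the code,
# with levels sorted by code (roughly what haven::as_factor() does by default)
labels_to_factor <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  codes_x <- as.vector(unclass(x))  # strip attributes, keep the numeric codes
  if (is.null(labs)) return(factor(codes_x))
  codes <- sort(unique(c(unname(labs), codes_x[!is.na(codes_x)])))
  lev <- ifelse(codes %in% labs, names(labs)[match(codes, labs)],
                as.character(codes))
  factor(codes_x, levels = codes, labels = lev)
}

edu <- labels_to_factor(edu_raw)
levels(edu)  # "Primary" "Secondary" "3"
```

With real .sav data, prefer haven::as_factor() itself; the sketch only shows what is lost when factor() sees the codes instead of the labels.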
library(onewaytests)
library(haven)
indep.var <- grep("^T\\d+$", names(data1), value = TRUE)   # T1, T2, ...
indep.var <- stringr::str_sort(indep.var, numeric = TRUE)  # T2 before T10
dep.var <- "K1_1"                                          # grouping variable
bf <- vector("list", length = length(indep.var))          # one slot per test
names(bf) <- indep.var
for(i in indep.var){
  vari <- data1[[i]]
  # as_factor() turns the value labels into factor levels
  data2 <- data.frame(variable = vari, edu = as_factor(data1[[dep.var]]))
  bf[[i]] <- bf.test(variable ~ edu, data = data2)
}
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.7080925
#> num df : 3
#> denom df : 4.304888
#> p.value : 0.5927074
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 1.770985
#> num df : 3
#> denom df : 11.04268
#> p.value : 0.2104647
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.7456779
#> num df : 3
#> denom df : 13.0548
#> p.value : 0.5437829
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.04974693
#> num df : 3
#> denom df : 1.544013
#> p.value : 0.9808962
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 1.022504
#> num df : 3
#> denom df : 2.121031
#> p.value : 0.5234018
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.9646639
#> num df : 3
#> denom df : 1.045435
#> p.value : 0.611455
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.1568329
#> num df : 3
#> denom df : 3.648005
#> p.value : 0.9196509
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.7988569
#> num df : 3
#> denom df : 1.352439
#> p.value : 0.6283363
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.1770073
#> num df : 3
#> denom df : 3.624132
#> p.value : 0.9063619
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 0.9526167
#> num df : 3
#> denom df : 1.000817
#> p.value : 0.6189415
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
#>
#>
#> Brown-Forsythe Test (alpha = 0.05)
#> -------------------------------------------------------------
#> data : variable and edu
#>
#> statistic : 1.009219
#> num df : 3
#> denom df : 1.000472
#> p.value : 0.6070452
#>
#> Result : Difference is not statistically significant.
#> -------------------------------------------------------------
bf[["T1"]]$statistic
#> [1] 0.7080925
bf[[1]]$statistic # same as previous
#> [1] 0.7080925
bf[["T1"]]$p.value
#> [1] 0.5927074
Created on 2022-03-17 by the reprex package (v2.0.1)
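Each bf[[i]] holds a Brown–Forsythe statistic for equality of group means under unequal variances, its numerator and denominator degrees of freedom, and a p-value. As a way to understand those numbers, here is a minimal base-R sketch of the statistic, assuming the textbook Brown–Forsythe formula. bf_means_sketch() is a hypothetical helper, and onewaytests::bf.test() may differ in details such as NA handling, so this is not a replacement for the package.

```r
# Minimal base-R sketch of the Brown-Forsythe statistic for equality of means
# (textbook formula; onewaytests::bf.test() may differ in details).
bf_means_sketch <- function(y, g) {
  keep <- !is.na(y) & !is.na(g)      # drop incomplete cases
  y <- y[keep]
  g <- factor(g[keep])               # also drops empty groups
  n  <- tapply(y, g, length)         # group sizes n_i
  m  <- tapply(y, g, mean)           # group means
  s2 <- tapply(y, g, var)            # group variances s_i^2
  N  <- sum(n)
  den <- (1 - n / N) * s2            # terms (1 - n_i/N) * s_i^2
  stat <- sum(n * (m - mean(y))^2) / sum(den)
  df1 <- nlevels(g) - 1
  cc <- den / sum(den)
  df2 <- 1 / sum(cc^2 / (n - 1))     # Satterthwaite-type denominator df
  list(statistic = stat, num.df = df1, denom.df = df2,
       p.value = pf(stat, df1, df2, lower.tail = FALSE))
}

res <- bf_means_sketch(c(1, 2, 3, 4, 5, 6), rep(c("a", "b"), each = 3))
res$statistic  # 13.5
res$denom.df   # 4
```

With only two groups the statistic and denominator df coincide with Welch's test, which gives a quick sanity check against oneway.test(var.equal = FALSE); the fractional denom df values in the output above are consistent with this kind of Satterthwaite-type correction.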
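To run the test for every combination of a K* variable (the grouping factor, edu) and a T* variable (the response), the code below first gets their names with grep(), then creates a data.frame with all pairwise combinations. The tests are run by the function bf_test_fun(), which wraps bf.test() in tryCatch() so that a failing combination returns the error object instead of stopping the whole run. All results are in bf_list, the successful ones in bf_final, and the logical vector err flags the combinations that failed.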
library(onewaytests)
library(haven)
bf_test_fun <- function(x, data, verbose = TRUE){
  # x is one row of regr: x[1] is the K* (grouping) name, x[2] the T* (response) name
  data2 <- data.frame(variable = data[[ x[2] ]],
                      edu = as_factor(data[[ x[1] ]]))
  # return the error object instead of stopping when a test fails
  tryCatch(bf.test(variable ~ edu, data = data2, verbose = verbose),
           error = function(e) e)
}
indep.var <- grep("^T\\d+$", names(data1), value = TRUE)
indep.var <- stringr::str_sort(indep.var, numeric = TRUE)
dep.var <- grep("^K\\d+_", names(data1), value = TRUE)
dep.var <- stringr::str_sort(dep.var, numeric = TRUE)
regr <- expand.grid(dep.var, indep.var)
names(regr) <- c("dep.var", "indep.var")
head(regr)
#> dep.var indep.var
#> 1 K1_1 T1
#> 2 K1_2 T1
#> 3 K1_3 T1
#> 4 K1_4 T1
#> 5 K1_5 T1
#> 6 K1_6 T1
bf_list <- apply(regr, 1, bf_test_fun, data = data1, verbose = FALSE)
names(bf_list) <- apply(regr, 1, paste, collapse = ".")
err <- sapply(bf_list, inherits, "error")
sum(err)
#> [1] 0
bf_final <- bf_list[!err]
length(bf_final)
#> [1] 968
bf_final[["K1_1.T1"]]$statistic
#> [1] 0.7080925
bf_final$K1_1.T1$statistic # same as above
#> [1] 0.7080925
bf_final[[1]]$statistic # same as above
#> [1] 0.7080925
Created on 2022-03-17 by the reprex package (v2.0.1)
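A design note on the code above: apply() first converts regr to a character matrix, so each row reaches bf_test_fun() as a character vector. That works here because both columns hold names, but it would silently coerce mixed column types. Below is a minimal sketch of the same pattern with mapply(), which walks the two columns in parallel, followed by a compact summary table. So that it runs with base R only, it uses hypothetical toy data and swaps in Welch's test (oneway.test(var.equal = FALSE)) in place of bf.test(); these are different tests. The names toy, welch_test_fun() and summary_df are illustrative, not part of the original code.

```r
# Hypothetical toy data standing in for data1:
# T* columns are responses, K* columns are grouping codes
set.seed(1)
toy <- data.frame(
  T1 = rnorm(12), T2 = rnorm(12),
  K1_1 = rep(1:3, each = 4),  # three groups of four
  K1_2 = rep(1, 12)           # a single group, so the test fails
)

# Stand-in for bf_test_fun(): Welch's test from base R, NOT Brown-Forsythe.
# For labelled data, replace factor() with haven::as_factor().
welch_test_fun <- function(dep, indep, data) {
  d <- data.frame(variable = data[[indep]], edu = factor(data[[dep]]))
  tryCatch(oneway.test(variable ~ edu, data = d, var.equal = FALSE),
           error = function(e) e)
}

regr <- expand.grid(dep.var = c("K1_1", "K1_2"), indep.var = c("T1", "T2"),
                    stringsAsFactors = FALSE)

# mapply() walks both columns in parallel; each call gets two plain strings
res_list <- mapply(welch_test_fun, regr$dep.var, regr$indep.var,
                   MoreArgs = list(data = toy), SIMPLIFY = FALSE)
names(res_list) <- paste(regr$dep.var, regr$indep.var, sep = ".")

# Flag failed combinations, keep the rest
err <- vapply(res_list, inherits, logical(1), what = "error")
res_ok <- res_list[!err]

# One row per successful test
summary_df <- data.frame(
  test = names(res_ok),
  statistic = vapply(res_ok, function(r) unname(r$statistic), numeric(1)),
  p.value = vapply(res_ok, function(r) r$p.value, numeric(1)),
  row.names = NULL
)
summary_df
```

The same two vapply() calls should also work on bf_final, since the output above shows that its elements have $statistic and $p.value.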