How to run a linear regression in R with custom significance level-CodePudding

I'm trying to run a 12 linear regression and want to correct for multiple testing problems.

The significance level in my field is usually p = 0.05 With the Bonferroni correction it would be p = 0.05 / 12 = 0.0041

If I run the regression as

fit <- lm(y ~ x1   x2, data = mydata)

I get the significance levels 0.0001, 0.001, 0.05, and 0.1 and the related signifcance codes ***, **, *, and ..

Is there a way to define a new significance level (p_sig = 0.05/12) and then assign a new code (e.g. *.*) to that? So that when I re-run the regression this will be used as a new significance level?

Thanks in advance for any help on this!

I know how to do this after running the regression.

fit <- lm(y ~ x1   x2, data = mydata)

results <- summary(fit)

significant <- results$p.value < 0.004

But I'd want to have the "nice" table output.

CodePudding user response：

I'd suggest to run individual models, then apply Bonferroni correction and create the table:

set.seed(123)

pvalues <- vector()

for(i in 1:12){
  x <- 1:20
  y <- c(1:20)   rnorm(20,sd = 10)
  
  fit <- lm(y ~x)
  pvalues[i] <- anova(fit)$'Pr(>F)'[1]
  
  # If you wanna check regressions:
  # plot(x,y)
  # abline(fit)
}

# without Bonferroni correction
pvalues
# [1] 4.304367e-02 2.657028e-05 5.380535e-03 2.170672e-02 3.724384e-03 2.027253e-03
# [7] 3.031334e-02 1.524716e-03 7.847783e-02 4.582751e-02 5.615904e-01 4.654844e-02
pvalues < 0.05
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

# with Bonferroni correction
p.adjust(pvalues, method = 'bonferroni', n = 12)
# [1] 0.5165240276 0.0003188434 0.0645664243 0.2604806671 0.0446926024 0.0243270335
# [7] 0.3637601361 0.0182965929 0.9417339387 0.5499300671 1.0000000000 0.5585812930
p.adjust(pvalues, method = 'bonferroni', n = 12) < 0.05
# [1] FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE

bon = p.adjust(pvalues, method = 'bonferroni', n = 12)
# Add significance levels
levelvalues <- rep('', length = 12)
levelvalues[bon < 0.1] <- "."
levelvalues[bon < 0.05] <- "*"
levelvalues[bon < 0.01] <- "**"
levelvalues[bon < 0.001] <- "***"
levelvalues[is.na(bon)] <- "n.s."

# create table
table <- data.frame(Model = 1:12, `p value` = paste0(scales::scientific(bon, digits = 3),levelvalues))

print(table, row.names = F)
# Model p.value
# 1      5.17e-01
# 2   3.19e-04***
# 3     6.46e-02.
# 4      2.60e-01
# 5     4.47e-02*
# 6     2.43e-02*
# 7      3.64e-01
# 8     1.83e-02*
# 9      9.42e-01
# 10     5.50e-01
# 11     1.00e 00
# 12     5.59e-01

You can also apply other kinds of corrections such as Holm and Benjamini-Hochberg

CodePudding user response：

This is produced by symnum so modify that using trace. Change the trace line as needed.

set.seed(123)
x <- 1:10
y <- 1   2*x   rnorm(10)

trace(symnum, quote({ cutpoints=c(0, 0.05/12, 0.1, 1); symbols=c("*", ".", "") }),
  print = FALSE)
summary(lm(y ~ x))
## ...
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.5255     0.6673   2.286   0.0516 .
## x             1.9180     0.1075  17.835    1e-07 *
## ---
## Signif. codes:  0 ‘*’ 0.004166667 ‘.’ 0.1 ‘’ 1
## ...

untrace(symnum)