Home > other >  Plotting statistical power vs replicates and calculating mean of coefficients
Plotting statistical power vs replicates and calculating mean of coefficients

Time:10-21

I need to plot the statistical power vs. the number of replicates and in this case the number of replicates (n) is 3, but I can't figure out how to plot it.

This is what I have:

library(car)

n <- 3
nsims <- 1000
p = coef = vector()

  for (i in 1:nsims) {
    treat <- rnorm(n, mean = 460, sd = 110)
    cont <- rnorm(n, mean = 415, sd = 110)

    df <- data.frame(
      y = c(treat, cont), 
      x = rep(c("treat", "cont"), each = n)
    )

    model <- glm(y ~ x, data = df)
    
    p[i] = Anova(model)$P
    coef[i] = coef(model)[2]
  }

hist(p, col = 'skyblue')
sum(p < 0.05)/nsims

Can someone help me plot this?

Also, I need to calculate the mean of the coefficients using only models where p < 0.05. This is simulating the following process: if you perform the experiment, and p > 0.05, you report 'no effect’, but if p < 0.05 you report ‘significant effect’. But I'm not sure how to set that up from what I have.

Would I just do this?

mean(coef)

But I don't know how to include only those with p < 0.05.

Thank you!

CodePudding user response:

Disclaimer: I spend a decent amount of time simulating experiments for work so I have strong opinions on this.

If that's everything because it's for a study assignment then fine, if you are planning to go further with this I recommend

  • adding the tidyverse to your arsenal.
  • Encapsulating functionality

First allows me to put a single iteration into a function to decouple its logic from the result subsetting (the encapsulation).

sim <- function(n) {
  treat <- rnorm(n, 460, 110)
  cont <- rnorm(n, 415, 110)
  data <- data.frame(y = c(treat, cont), x = rep(c("treat", "cont"), each = n))
  model <- glm(y ~ x, data = data)
  p <- car::Anova(model)$P
  coef <- coef(model)[2]
  data.frame(n, p, coef)
}

Now we can simulate

nsims <- 1000
sims <- do.call(
  rbind, 
  # We are now using the parameter as opposed to the previous post.
  lapply(
    rep(c(3, 5, 10, 20, 50, 100), each = nsims), 
    sim
  )
)

# Aggregations
power_smry <- aggregate(p ~ n, sims, function(x) {mean(x < 0.05)})
coef_smry <- aggregate(coef ~ n, sims[sims$p < 0.05, ], mean)

# Plots
plot(p ~ n, data = power_smry

If you do this in the tidyverse this is one possible approach

crossing(
  n = rep(c(3, 5, 10, 20, 50, 100))
  # Add any number of other inputs here that you want to explore (like lift).
) %>%
  rowwise() %>%
  # This looks complicated but will be less so if you have multiple 
  # varying hyperparameters defined in crossing.
  mutate(results = list(bind_rows(rerun(nsims, sim(n))))) %>% 
  pull(results) %>%
  bind_rows() %>%
  group_by(n) %>%

  # The more metrics you want to summarize in different ways the easier compared to base.
  summarize(
    power = mean(p < 0.05),
    coef = mean(coef[p < 0.05])
  )
  •  Tags:  
  • r
  • Related