Home > Back-end >  Loop glm for every column in R dataset
Loop glm for every column in R dataset

Time:01-11

I have a dataset of 100 patients (7 are shown here), 2 covariates and 50 phenotypes(5 are shown here). I want to perform a multivariable logistic regression for each phenotype using Covariate1 and Covariate2 as covariates to predict the Outcome, I would like to get a table like this, where I have the p-value, OR and confidence interval(CI)per each of the covariates. enter image description here

I tried:

for (i in df) {
  print(i)
  model <-glm(Outcome~ x[i]  Covariate1  Covariate2, family = binomial(link = "logit"), data=df) 

I also tried the solution for this question. But x and y a reversed in my question, so it did not work: R: automate table for results of several multivariable logistic regressions

Thanks very much for your help!

This is an example dataset

df<-structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), Outcome = c(0, 0, 
1, 1, 0, 1, 0), Covariate1 = c(1, 2, 3, 4, 5, 6, 7), Covariate2 = c(0, 
0, 0, 1, 1, 1, 1), P1 = c(1, 0, 0, 1, 1, 1, 2), P2 = c(0, 2, 
0, 1, 1, 1, 1), P3 = c(0, 0, 0, 1, 1, 1, 1), P4 = c(0, 0, 0, 
1, 2, 1, 1), P5 = c(0, 0, 0, 1, 1, 1, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L))

CodePudding user response:

if I understood correctly

df <- structure(
  list(
    ID = c(1, 2, 3, 4, 5, 6, 7),
    Outcome = c(0, 0, 1, 1, 0, 1, 0),
    Covariate1 = c(1, 2, 3, 4, 5, 6, 7),
    Covariate2 = c(0, 0, 0, 1, 1, 1, 1),
    P1 = c(1, 0, 0, 1, 1, 1, 2),
    P2 = c(0, 2, 0, 1, 1, 1, 1),
    P3 = c(0, 0, 0, 1, 1, 1, 1),
    P4 = c(0, 0, 0, 1, 2, 1, 1),
    P5 = c(0, 0, 0, 1, 1, 1, 2)
  ),
  class = c("tbl_df",
            "tbl", "data.frame"),
  row.names = c(NA,-7L)
)

library(tidyverse)

first_tables <- map(
  .x = select(df, starts_with("P")),
  .f = ~ glm(
    Outcome ~ .x   Covariate1   Covariate2,
    family = binomial(link = "logit"),
    data = df
  )
) %>%
  map(broom::tidy)

map_df(
  .x = first_tables,
  .f = ~ .x %>% mutate(
    p = p.value,
    OR  = exp(estimate),
    CI5 = exp(estimate - 1.96 * std.error),
    CI95 = exp(estimate   1.96 * std.error),
    .keep = "unused"
  ) %>%
    select(-statistic),
  .id = "phenotype"
)
#> # A tibble: 20 x 6
#>    phenotype term             p        OR        CI5   CI95
#>    <chr>     <chr>        <dbl>     <dbl>      <dbl>  <dbl>
#>  1 P1        (Intercept)  0.551  9.65e- 2  0.0000443 210.  
#>  2 P1        .x           0.997  5.84e-10  0         Inf   
#>  3 P1        Covariate1   0.521  2.55e  0  0.147      44.3 
#>  4 P1        Covariate2   0.998  3.64e  8  0         Inf   
#>  5 P2        (Intercept)  1.00   1.00e  0  0.0173     57.8 
#>  6 P2        .x           0.996  1.53e- 4  0         Inf   
#>  7 P2        Covariate1   1      1   e  0  0.227       4.40
#>  8 P2        Covariate2   0.996  6.52e  3  0         Inf   
#>  9 P3        (Intercept)  0.724  5.00e- 1  0.0106     23.5 
#> 10 P3        .x           0.824  2.00e  0  0.00442   904.  
#> 11 P3        Covariate1   1      1   e  0  0.222       4.51
#> 12 P3        Covariate2  NA     NA        NA          NA   
#> 13 P4        (Intercept)  0.907  7.88e- 1  0.0141     44.1 
#> 14 P4        .x           0.998  3.66e- 9  0         Inf   
#> 15 P4        Covariate1   0.785  7.94e- 1  0.152       4.16
#> 16 P4        Covariate2   0.997  2.59e  9  0         Inf   
#> 17 P5        (Intercept)  0.291  3.15e- 2  0.0000515  19.3 
#> 18 P5        .x           0.997  2.72e-10  0         Inf   
#> 19 P5        Covariate1   0.324  3.64e  0  0.279      47.4 
#> 20 P5        Covariate2   0.998  4.38e  8  0         Inf

Created on 2023-01-11 with reprex v2.0.2

  •  Tags:  
  • r
  • Related