Home > Software design >  Equatiomatic not building my equation from logistic model
Equatiomatic not building my equation from logistic model

Time:06-13

I have the following data, the response variable which is categorical is gleason_score, while the predictor, a continuous variable is age. I have fitted the data using glm function to have a logistic regression model.

structure(list(age = c(69L, 60L, 78L, 73L, 80L, 78L, 89L, 75L, 
66L, 74L, 72L, 80L, 63L, 100L, 67L, 73L, 75L, 83L, 72L, 73L, 
50L, 75L, 70L, 56L, 75L, 70L, 90L, 65L, 70L, 80L, 73L, 70L, 70L, 
75L, 71L, 65L, 65L, 72L, 67L, 65L, 70L, 75L, 85L, 75L, 70L, 86L, 
74L, 78L, 64L, 70L, 65L, 65L, 70L, 74L, 77L, 75L, 65L, 80L, 70L, 
70L, 58L, 58L, 65L, 78L, 76L, 80L, 66L, 71L, 70L, 55L, 70L, 90L, 
78L, 67L, 65L, 60L, 69L, 80L, 72L, 76L, 68L, 77L, 88L, 69L, 79L, 
77L, 78L, 66L, 80L, 72L, 81L, 80L, 70L, 86L, 87L, 70L, 80L, 66L, 
60L, 50L, 69L, 63L, 75L, 68L, 68L, 75L, 63L, 74L, 54L, 81L, 72L, 
70L, 68L, 55L, 75L, 75L, 65L, 72L, 77L, 64L, 64L, 76L, 83L, 95L, 
85L, 70L, 75L, 75L, 61L, 95L, 72L, 81L, 87L, 70L, 77L, 70L, 65L, 
77L, 70L, 65L, 70L, 75L, 68L, 93L, 65L, 65L, 75L, 78L, 86L), 
    gleason_score = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
    2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 
    2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
    2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor")), class = "data.frame", row.names = c(NA, 
-149L))

Using the following code:

attach(data2)
glm.fit=glm(gleason_score ~ age, family=binomial(link = "logit"))
plot(x=age, y=gleason_score)
lines(age, glm.fit$fitted.values)
summary(glm.fit)

I have these results:

Call:
glm(formula = gleason_score ~ age, family = binomial(link = "logit"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1515   0.4451   0.5806   0.6530   1.0105  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -2.17146    1.87588  -1.158   0.2470  
age          0.05155    0.02651   1.944   0.0518 .

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 141.02  on 148  degrees of freedom
Residual deviance: 137.00  on 147  degrees of freedom
AIC: 141

Number of Fisher Scoring iterations: 4

I use the following code to build the equation:

equatiomatic::extract_eq(glm.fit, wrap = FALSE,use_coefs = TRUE)

but I have this error message:

Error in model$data[which(model$y == 1)[1], outcome_nm] : object of type 'environment' is not subsettable

Kindly assist in tracing my error

CodePudding user response:

Do not use attach(data2).. Instead, pass data2 to the data argument of the glm() call.

glm.fit=glm(gleason_score ~ age, data=data2, family=binomial(link = "logit"))
equatiomatic::extract_eq(glm.fit, wrap = FALSE,use_coefs = TRUE)

Output:

$$
\log\left[ \frac { \widehat{P( \operatorname{gleason\_score} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{gleason\_score} = \operatorname{1} )} } \right] = -2.17   0.05(\operatorname{age})
$$

To see why, compare glm.fit$data when glm.fit is created

  1. with attach(data2) vs.
  2. without using attach and instead passing data2 to data arg

The 2nd approach is correct.

Under the first approach (yours), glm.fit$data returns this:

<environment: R_GlobalEnv>

Under the second approach (correct), glm.fit$data returns the actual data (note only first six rows shown here)

  age gleason_score
1  69             1
2  60             1
3  78             0
4  73             1
5  80             1
6  78             1
  • Related