Error "contrasts can be applied only to factors with 2 or more levels" when running a (mix

Time:12-16

I'm trying to run a simple linear regression model including an outcome (continuous_outcome) and two dummy variables for smoking (current_vs_neversmoking and former_vs_neversmoking). I previously had these two variables combined as just one factor with three levels, but that compares a level to the 2 other levels (i.e. current vs non-current), where I want to specifically compare current vs never and former vs never.

When I try to run the model, I get the following error: Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels.

My data and code look as follows:

mydata <- structure(list(pat_id = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 
3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 10, 
10, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 
16, 16, 17, 17, 17, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 
22, 22, 22, 22, 23, 23, 24, 24, 24, 24, 24, 25, 25, 26, 26, 26, 
26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 31, 31, 31, 32, 32, 
33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 
36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44, 
44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 
48, 49, 49, 50, 50), continuous_outcome = c(0.270481901933073, 
0.306562240871999, 0.586489601087521, 0.663162791491994, 0.696568742621393, 
0.573238528525012, -1.50517834064486, -1.14239124190004, 0.602167001833233, 
0.942169278018825, 0.957507525424839, 0.942401042208738, 1.10173901173947, 
-1.23467796994225, -0.0205580225283486, -0.231308201295527, -0.244470432048288, 
-0.256490437743765, 0.493465625373049, 0.406426360030117, 0.439098160535839, 
0.466158747996811, 0.637429149477194, 0.0219441253328183, 0.102660112718747, 
0.264537705164256, 0.110814584186878, 0.49920541931488, 1.81235625865717, 
1.82870935879674, 0.652695891088804, 0.69291517381055, -0.414081564221917, 
-0.147536404237028, 1.21903849053896, 1.06257819295167, 1.10222362013134, 
1.13246743635661, -0.670943276171988, -0.29653504137582, 0.0590836540990421, 
0.282795470829998, -3.03315551333956, -1.88568994249489, -1.65312212848836, 
-1.13355891646777, -2.20351671143641, -1.45344735861464, -1.25516950174665, 
-0.743390964862038, -0.4629610158192, 0.606862844948187, 0.639058684113426, 
0.609702655264534, 0.633960970096869, 0.548906526787276, 0.108205702176247, 
0.124050755621246, -0.881940114877928, -1.12908469428316, -1.48617053617301, 
-1.45848671123536, 0.0944288383151997, 0.279125369127663, 0.489885538084724, 
0.486578831616853, 0.394325240405338, 0.460090367906543, 0.937968466599025, 
-1.20642488217955, -0.981185479943044, 0.570576924035185, 0.532219882463515, 
0.620627645616656, 0.631553233135331, 0.874526189757774, -0.194145530051932, 
-0.0979606735363465, 0.565800797611727, 0.509862625778819, 0.5741604159953, 
0.519945775026426, 0.387595824059598, 0.395925960524675, -1.74473193173614, 
-0.848779543387106, 1.41774732048115, 1.51159850388708, 0.462882007460068, 
0.483950525664105, -0.366500414469296, -0.0920163339687414, -0.166351980885457, 
-0.0860682256869157, -0.219608109715091, 0.195934077939654, 0.356018784590499, 
0.484056029455595, 0.57498034210306, 0.572359796530477, 0.599809068756398, 
0.542583937381158, 0.698337291640914, 0.740921504459827, 0.45772616988788, 
0.405098691997856, 0.485871287409578, 0.442621726153633, 0.29123670436699, 
0.0303617893266618, 0.00448603635822562, -0.0619887479801569, 
0.003984369355659, -0.140521412371098, -0.971697227999586, -1.20190205773194, 
-1.53965813080136, -1.30849790890586, 1.58558160520627, 1.61870389553583, 
-5.84164915563387, -5.84164915563387, 0.777919475931911, 0.972720285314287, 
0.477725719575478, 0.461105062597019, 0.616300922435037, 0.528825235299615, 
0.752152176797313, 0.915416601798041, 0.906483121528581, 0.868345778494055, 
-2.885534489146, -1.64736196365156, -0.768874512446897, -0.66979572486731, 
0.73917509257953, 0.883831498985817, 0.884240158759821, 0.916187794016791, 
1.38773159469184, -0.00127946509641595, 0.302272238178157, 0.340088450861561, 
0.295163832020064, 0.94639364965826, 0.839369926698037, 0.913777832307086, 
0.767222595331384, 0.898887351534535), current_vs_neversmoking = structure(c(NA, 
NA, NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
2L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 2L, NA, NA, 1L, 1L, NA, 
NA, NA, NA, 1L, 1L, NA, NA, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, NA, NA, NA, NA), .Label = c("Never smoker", "Current smoker"
), class = "factor"), former_vs_neversmoking = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 1L, 1L, 1L, 
1L, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
NA, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
NA, NA, NA, NA, NA, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("Never smoker", "Former smoker"
), class = "factor")), row.names = c(NA, 150L), class = "data.frame")

summary(mydata)
     pat_id      continuous_outcome   current_vs_neversmoking   former_vs_neversmoking
 Min.   : 1.00   Min.   :-5.8416    Never smoker  :25         Never smoker :25        
 1st Qu.:11.25   1st Qu.:-0.2132    Current smoker:28         Former smoker:97        
 Median :24.00   Median : 0.4409    NA's          :97         NA's         :28        
 Mean   :24.60   Mean   : 0.0737                                                      
 3rd Qu.:36.00   3rd Qu.: 0.6493                                                      
 Max.   :50.00   Max.   : 1.8287                                                      

model_1 <- lm(formula=continuous_outcome ~ current_vs_neversmoking + former_vs_neversmoking, 
              data=mydata, 
              na.action="na.omit")

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Why do I get this error? Both categorical variables are coded as factors and have 2 levels...

Answer:

You have coded your factors wrong.

> table(mydata$current_vs_neversmoking, mydata$former_vs_neversmoking)

                 Never smoker Former smoker
  Never smoker             25             0
  Current smoker            0             0

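This shows that the only rows with non-NA values in both variables are the ones where current_vs_neversmoking == 'Never smoker' and former_vs_neversmoking == 'Never smoker'. Note that lm() throws away every row with an NA in any model variable when estimating your model. After that, each factor has only one observed level left ("Never smoker"), and that is what triggers the error.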
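To see the mechanism on a smaller scale, here is a minimal sketch with made-up toy data (the data frame `toy` and its column names are hypothetical, not the data from the question). It keeps only the complete rows, as lm() would with na.omit, and counts how many factor levels are still observed:

```r
# Toy data mimicking the structure of the question (values are made up).
toy <- data.frame(
  y = c(0.1, 0.4, -0.2, 0.8, 1.1, 0.3),
  current_vs_never = factor(
    c("Never smoker", "Never smoker", "Current smoker", "Current smoker", NA, NA),
    levels = c("Never smoker", "Current smoker")
  ),
  former_vs_never = factor(
    c("Never smoker", "Never smoker", NA, NA, "Former smoker", "Former smoker"),
    levels = c("Never smoker", "Former smoker")
  )
)

# Rows with no NA in any column: the only rows lm() would keep.
kept <- toy[complete.cases(toy), ]

# Number of levels still observed in each factor after dropping NA rows.
observed_levels <- sapply(
  kept[c("current_vs_never", "former_vs_never")],
  function(f) nlevels(droplevels(f))
)
print(observed_levels)

# Fitting the model on the toy data; the error is caught and its message kept.
msg <- tryCatch(
  {
    lm(y ~ current_vs_never + former_vs_never, data = toy, na.action = "na.omit")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running the same complete.cases() check on mydata before fitting should make this kind of problem visible early.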

I believe you want to include your smoking variables in a single factor but recode it so that the never smoker is the baseline.

mydata$smoker <- ifelse(is.na(mydata$current_vs_neversmoking), as.character(mydata$former_vs_neversmoking), as.character(mydata$current_vs_neversmoking))
mydata$smoker <- factor(mydata$smoker, levels=c("Never smoker", "Current smoker",  "Former smoker"))

Now:

summary(model_1 <- lm(formula=continuous_outcome ~ smoker, 
              data=mydata, 
              na.action="na.omit"))
Call:
lm(formula = continuous_outcome ~ smoker, data = mydata, na.action = "na.omit")

Residuals:
    Min      1Q  Median      3Q     Max
-5.8724 -0.2720  0.2926  0.5526  1.7979

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.4049     0.2161  -1.874 0.062948 .
smokerCurrent smoker   1.0545     0.2973   3.547 0.000523 ***
smokerFormer smoker    0.4356     0.2423   1.798 0.074257 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.08 on 147 degrees of freedom
Multiple R-squared:  0.08135,   Adjusted R-squared:  0.06885
F-statistic: 6.509 on 2 and 147 DF,  p-value: 0.001957

Now current and former smokers are compared to the baseline of never-smokers.
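As a rough illustration of why this works: assuming R's default contrast settings (treatment contrasts for unordered factors), the first level of a factor is the reference, and every other level gets its own 0/1 indicator. The standalone `smoker` vector below is a hypothetical three-value example, not the column from the data:

```r
# A small factor with the same level order as in the answer above.
smoker <- factor(
  c("Never smoker", "Current smoker", "Former smoker"),
  levels = c("Never smoker", "Current smoker", "Former smoker")
)

# With default treatment contrasts, the reference level ("Never smoker")
# is the row of zeros; each other level has its own indicator column.
print(contrasts(smoker))

# relevel() picks a different baseline without retyping all the levels.
smoker_former_ref <- relevel(smoker, ref = "Former smoker")
print(levels(smoker_former_ref))
```

So a single three-level factor already yields the two comparisons asked for (current vs never, former vs never), as long as "Never smoker" is the first level.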
