I'm trying to run a simple linear regression model with an outcome (continuous_outcome) and two dummy variables for smoking (current_vs_neversmoking and former_vs_neversmoking). I previously had these two variables combined as a single factor with three levels, but that compares a level to the two other levels (i.e. current vs non-current), whereas I want to specifically compare current vs never and former vs never.
When I try to run the model, I get the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
My data and code look as follows:
mydata <- structure(list(pat_id = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3,
3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 10,
10, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15,
16, 16, 17, 17, 17, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22,
22, 22, 22, 22, 23, 23, 24, 24, 24, 24, 24, 25, 25, 26, 26, 26,
26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 31, 31, 31, 32, 32,
33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36,
36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44,
44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48,
48, 49, 49, 50, 50), continuous_outcome = c(0.270481901933073,
0.306562240871999, 0.586489601087521, 0.663162791491994, 0.696568742621393,
0.573238528525012, -1.50517834064486, -1.14239124190004, 0.602167001833233,
0.942169278018825, 0.957507525424839, 0.942401042208738, 1.10173901173947,
-1.23467796994225, -0.0205580225283486, -0.231308201295527, -0.244470432048288,
-0.256490437743765, 0.493465625373049, 0.406426360030117, 0.439098160535839,
0.466158747996811, 0.637429149477194, 0.0219441253328183, 0.102660112718747,
0.264537705164256, 0.110814584186878, 0.49920541931488, 1.81235625865717,
1.82870935879674, 0.652695891088804, 0.69291517381055, -0.414081564221917,
-0.147536404237028, 1.21903849053896, 1.06257819295167, 1.10222362013134,
1.13246743635661, -0.670943276171988, -0.29653504137582, 0.0590836540990421,
0.282795470829998, -3.03315551333956, -1.88568994249489, -1.65312212848836,
-1.13355891646777, -2.20351671143641, -1.45344735861464, -1.25516950174665,
-0.743390964862038, -0.4629610158192, 0.606862844948187, 0.639058684113426,
0.609702655264534, 0.633960970096869, 0.548906526787276, 0.108205702176247,
0.124050755621246, -0.881940114877928, -1.12908469428316, -1.48617053617301,
-1.45848671123536, 0.0944288383151997, 0.279125369127663, 0.489885538084724,
0.486578831616853, 0.394325240405338, 0.460090367906543, 0.937968466599025,
-1.20642488217955, -0.981185479943044, 0.570576924035185, 0.532219882463515,
0.620627645616656, 0.631553233135331, 0.874526189757774, -0.194145530051932,
-0.0979606735363465, 0.565800797611727, 0.509862625778819, 0.5741604159953,
0.519945775026426, 0.387595824059598, 0.395925960524675, -1.74473193173614,
-0.848779543387106, 1.41774732048115, 1.51159850388708, 0.462882007460068,
0.483950525664105, -0.366500414469296, -0.0920163339687414, -0.166351980885457,
-0.0860682256869157, -0.219608109715091, 0.195934077939654, 0.356018784590499,
0.484056029455595, 0.57498034210306, 0.572359796530477, 0.599809068756398,
0.542583937381158, 0.698337291640914, 0.740921504459827, 0.45772616988788,
0.405098691997856, 0.485871287409578, 0.442621726153633, 0.29123670436699,
0.0303617893266618, 0.00448603635822562, -0.0619887479801569,
0.003984369355659, -0.140521412371098, -0.971697227999586, -1.20190205773194,
-1.53965813080136, -1.30849790890586, 1.58558160520627, 1.61870389553583,
-5.84164915563387, -5.84164915563387, 0.777919475931911, 0.972720285314287,
0.477725719575478, 0.461105062597019, 0.616300922435037, 0.528825235299615,
0.752152176797313, 0.915416601798041, 0.906483121528581, 0.868345778494055,
-2.885534489146, -1.64736196365156, -0.768874512446897, -0.66979572486731,
0.73917509257953, 0.883831498985817, 0.884240158759821, 0.916187794016791,
1.38773159469184, -0.00127946509641595, 0.302272238178157, 0.340088450861561,
0.295163832020064, 0.94639364965826, 0.839369926698037, 0.913777832307086,
0.767222595331384, 0.898887351534535), current_vs_neversmoking = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
2L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
2L, 2L, 2L, 2L, 2L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 2L, NA, NA, 1L, 1L, NA,
NA, NA, NA, 1L, 1L, NA, NA, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, NA, NA, NA, NA), .Label = c("Never smoker", "Current smoker"
), class = "factor"), former_vs_neversmoking = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 1L, 1L, 1L,
1L, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
NA, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
NA, NA, NA, NA, NA, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("Never smoker", "Former smoker"
), class = "factor")), row.names = c(NA, 150L), class = "data.frame")
summary(mydata)
     pat_id      continuous_outcome   current_vs_neversmoking   former_vs_neversmoking
 Min.   : 1.00   Min.   :-5.8416      Never smoker  :25         Never smoker :25
 1st Qu.:11.25   1st Qu.:-0.2132      Current smoker:28         Former smoker:97
 Median :24.00   Median : 0.4409      NA's          :97         NA's         :28
 Mean   :24.60   Mean   : 0.0737
 3rd Qu.:36.00   3rd Qu.: 0.6493
 Max.   :50.00   Max.   : 1.8287
model_1 <- lm(formula = continuous_outcome ~ current_vs_neversmoking + former_vs_neversmoking,
              data = mydata,
              na.action = "na.omit")
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
Why do I get this error? Both categorical variables are coded as factors and have 2 levels...
Answer:
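The problem is the way the two factors are coded: apart from the never smokers, each variable is NA wherever the other one has a value.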
> table(mydata$current_vs_neversmoking, mydata$former_vs_neversmoking)

                 Never smoker Former smoker
  Never smoker             25             0
  Current smoker            0             0
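This shows that the only rows with non-NA values in both variables are those where current_vs_neversmoking == 'Never smoker' and former_vs_neversmoking == 'Never smoker'. Note that na.omit throws away every row containing an NA when the model is estimated, so only these 25 rows are used and each factor is left with a single level, hence the error.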
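To see the mechanism on a small scale, here is a minimal sketch with a made-up data frame. The names `toy`, `y`, `a`, and `b` are illustrative and not from the question. It mimics the coding above: each dummy is NA wherever the other one is informative, except for the shared baseline rows.

```r
# Toy data: 'a' and 'b' are hypothetical stand-ins for the two dummy factors.
toy <- data.frame(
  y = c(1.2, 0.8, 2.1, 2.4, 1.5, 1.7),
  a = factor(c("Never", "Never", "Current", "Current", NA, NA),
             levels = c("Never", "Current")),
  b = factor(c("Never", "Never", NA, NA, "Former", "Former"),
             levels = c("Never", "Former"))
)

# Rows that survive na.omit: only those with no NA in any column.
kept <- na.omit(toy)
print(nrow(kept))

# Count the levels that are actually still present in each factor.
levels_left <- sapply(droplevels(kept)[c("a", "b")], nlevels)
print(levels_left)

# Fitting the two-dummy model on the toy data fails in the same way;
# tryCatch() captures the error message instead of stopping the script.
fit_error <- tryCatch(
  lm(y ~ a + b, data = toy),
  error = function(e) conditionMessage(e)
)
print(fit_error)
```

Only the two "Never"/"Never" rows are complete, so after the NA rows are dropped neither factor has anything left to contrast against.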
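I believe you want to combine your smoking variables into a single factor, recoded so that 'Never smoker' is the baseline (first) level. With R's default treatment contrasts, every other level is then compared with that baseline.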
mydata$smoker <- ifelse(is.na(mydata$current_vs_neversmoking), as.character(mydata$former_vs_neversmoking), as.character(mydata$current_vs_neversmoking))
mydata$smoker <- factor(mydata$smoker, levels=c("Never smoker", "Current smoker", "Former smoker"))
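As a sanity check on this kind of recoding, the sketch below applies the same `ifelse()` pattern to two short invented vectors (`cur` and `frm` are hypothetical stand-ins for the question's columns) and tabulates the result including NAs. A row that is NA in both inputs stays NA in the merged factor, which is worth confirming before fitting.

```r
# Hypothetical stand-ins for the two dummy factors in the question.
cur <- factor(c("Never smoker", "Current smoker", NA, NA),
              levels = c("Never smoker", "Current smoker"))
frm <- factor(c("Never smoker", NA, "Former smoker", NA),
              levels = c("Never smoker", "Former smoker"))

# Take the 'current' label where present, otherwise fall back to 'former'.
merged <- ifelse(is.na(cur), as.character(frm), as.character(cur))

# Listing "Never smoker" first makes it the reference level.
merged <- factor(merged,
                 levels = c("Never smoker", "Current smoker", "Former smoker"))

# useNA = "ifany" surfaces rows that were NA in both inputs.
counts <- table(merged, useNA = "ifany")
print(counts)
```

In the question's data every row has a value in at least one of the two columns, so no NAs should remain after merging.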
Now:
summary(model_1 <- lm(formula=continuous_outcome ~ smoker,
data=mydata,
na.action="na.omit"))
Call:
lm(formula = continuous_outcome ~ smoker, data = mydata, na.action = "na.omit")

Residuals:
    Min      1Q  Median      3Q     Max
-5.8724 -0.2720  0.2926  0.5526  1.7979

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.4049     0.2161  -1.874 0.062948 .
smokerCurrent smoker   1.0545     0.2973   3.547 0.000523 ***
smokerFormer smoker    0.4356     0.2423   1.798 0.074257 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.08 on 147 degrees of freedom
Multiple R-squared:  0.08135,  Adjusted R-squared:  0.06885
F-statistic: 6.509 on 2 and 147 DF,  p-value: 0.001957
Now current and former smokers are compared to the baseline of never-smokers.
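To check what the coding of a three-level factor actually compares, the sketch below prints the contrast matrix for an invented factor (`status` is illustrative, not a column from the question). It assumes R's default `contr.treatment` for unordered factors has not been changed via `options(contrasts = ...)`. It also shows `relevel()` as an alternative way to set the reference level when the factor already exists.

```r
# A three-level factor with "Never smoker" listed first, so it is the reference.
status <- factor(c("Never smoker", "Current smoker", "Former smoker"),
                 levels = c("Never smoker", "Current smoker", "Former smoker"))

# Each column is an indicator for one non-reference level; the reference
# row is all zeros, so each coefficient is a difference from "Never smoker".
cm <- contrasts(status)
print(cm)

# Without explicit levels, factor() sorts them alphabetically,
# which would make "Current smoker" the reference instead.
status_alpha <- factor(c("Never smoker", "Current smoker", "Former smoker"))
print(levels(status_alpha))

# relevel() moves the chosen level to the front without touching the data.
status_fixed <- relevel(status_alpha, ref = "Never smoker")
print(levels(status_fixed))
```

Under that default, the two coefficients in the summary above are the current-vs-never and former-vs-never differences, not a comparison of one level against the other two pooled.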