I was trying to run a linear model using lm()
in R with 12 explanatory variables and 33 observations), but the coefficients for the last three variables are not defined because of singularities. When I switched the order of the variables, the same thing happens again, even though those variables (TotalPrec_11, TotalPrec_12, TotalPrec_10) were significant before. The coefficients were also different between two models.
ab <- lm(value ~ TotalPrec_12 TotalPrec_11 TotalPrec_10 TotalPrec_9 TotalPrec_8 TotalPrec_7 TotalPrec_6 TotalPrec_5 TotalPrec_4 TotalPrec_3 TotalPrec_2 TotalPrec_1, data = aa)
summary(ab)
#Coefficients: (3 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 64.34 30.80 2.089 0.0480 *
#TotalPrec_12 19811.97 11080.14 1.788 0.0869 .
#TotalPrec_11 -16159.45 7099.89 -2.276 0.0325 *
#TotalPrec_10 -16500.62 18813.96 -0.877 0.3895
#TotalPrec_9 62662.08 51143.37 1.225 0.2329
#TotalPrec_8 665.39 36411.95 0.018 0.9856
#TotalPrec_7 -77203.59 51555.71 -1.497 0.1479
#TotalPrec_6 4830.11 19503.52 0.248 0.8066
#TotalPrec_5 6403.94 14902.77 0.430 0.6714
#TotalPrec_4 -735.73 5023.83 -0.146 0.8848
#TotalPrec_3 NA NA NA NA
#TotalPrec_2 NA NA NA NA
#TotalPrec_1 NA NA NA NA
The same data here with a different order of variables:
ab1 <- lm(value ~ TotalPrec_1 TotalPrec_2 TotalPrec_3 TotalPrec_9 TotalPrec_8 TotalPrec_7 TotalPrec_6 TotalPrec_5 TotalPrec_4 TotalPrec_11 TotalPrec_12 TotalPrec_10, data = aa)
summary(ab1)
#Coefficients: (3 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 63.72 54.44 1.171 0.2538
#TotalPrec_1 19611.54 19366.33 1.013 0.3218
#TotalPrec_2 -14791.44 7847.87 -1.885 0.0722 .
#TotalPrec_3 6766.60 3144.68 2.152 0.0422 *
#TotalPrec_9 28677.62 53530.82 0.536 0.5973
#TotalPrec_8 -23207.34 65965.12 -0.352 0.7282
#TotalPrec_7 -26628.10 55839.25 -0.477 0.6380
#TotalPrec_6 -28694.23 13796.80 -2.080 0.0489 *
#TotalPrec_5 46982.35 17941.89 2.619 0.0154 *
#TotalPrec_4 -26393.70 17656.70 -1.495 0.1486
#TotalPrec_11 NA NA NA NA
#TotalPrec_12 NA NA NA NA
#TotalPrec_10 NA NA NA NA
Several posts online suggest that it might be a multicollinearity problems. I ran the cor()
function to check for collinearity, and nothing came out to be perfectly correlated.
I used the same set of these 12 variables with other response variables, and there was no problem with singularities. So I'm not sure what happens here and what I need to do differently to figure this out.
edit
here is my data
> dput(aa)
structure(list(value = c(93, 95, 88, 90, 90, 80, 100, 80, 96,
100, 100, 100, 80, 94, 88, 76, 90, 0, 93, 100, 88, 90, 95, 71,
92, 93, 92, 100, 85, 90, 100, 100, 100), TotalPrec_1 = c(0.00319885835051536,
0.00319885835051536, 0.00319885835051536, 0.00717973057180643,
0.00717973057180643, 0.00717973057180643, 0.00464357063174247,
0.00464357063174247, 0.00464357063174247, 0.00598198547959327,
0.00598198547959327, 0.00598198547959327, 0.00380058260634541,
0.00380058260634541, 0.00380058260634541, 0.00380058260634541,
0.00364887388423085, 0.00364887388423085, 0.00364887388423085,
0.00475014140829443, 0.00475014140829443, 0.00475014140829443,
0.00475014140829443, 0.00499139120802283, 0.00499139120802283,
0.00499139120802283, 0.00499139120802283, 0.00490436097607016,
0.00490436097607016, 0.00490436097607016, 0.00623255362734198,
0.00623255362734198, 0.00623255362734198), TotalPrec_2 = c(0.00387580785900354,
0.00387580785900354, 0.00387580785900354, 0.00625309534370899,
0.00625309534370899, 0.00625309534370899, 0.00298969540745019,
0.00298969540745019, 0.00298969540745019, 0.00558579061180353,
0.00558579061180353, 0.00558579061180353, 0.00370361795648932,
0.00370361795648932, 0.00370361795648932, 0.00370361795648932,
0.00335893919691443, 0.00335893919691443, 0.00335893919691443,
0.00621500937268137, 0.00621500937268137, 0.00621500937268137,
0.00621500937268137, 0.00234323320910334, 0.00234323320910334,
0.00234323320910334, 0.00234323320910334, 0.00644989637658, 0.00644989637658,
0.00644989637658, 0.00476496992632746, 0.00476496992632746, 0.00476496992632746
), TotalPrec_3 = c(0.00418250449001789, 0.00418250449001789,
0.00418250449001789, 0.00702223135158419, 0.00702223135158419,
0.00702223135158419, 0.00427648611366748, 0.00427648611366748,
0.00427648611366748, 0.00562589056789875, 0.00562589056789875,
0.00562589056789875, 0.0037367227487266, 0.0037367227487266,
0.0037367227487266, 0.0037367227487266, 0.00477339653298258,
0.00477339653298258, 0.00477339653298258, 0.0124167986214161,
0.0124167986214161, 0.0124167986214161, 0.0124167986214161, 0.010678518563509,
0.010678518563509, 0.010678518563509, 0.010678518563509, 0.0139585845172405,
0.0139585845172405, 0.0139585845172405, 0.00741709442809224,
0.00741709442809224, 0.00741709442809224), TotalPrec_4 = c(0.00659881485626101,
0.00659881485626101, 0.00659881485626101, 0.00347008113749325,
0.00347008113749325, 0.00347008113749325, 0.00720167113468051,
0.00720167113468051, 0.00720167113468051, 0.00704727275297045,
0.00704727275297045, 0.00704727275297045, 0.00856677815318107,
0.00856677815318107, 0.00856677815318107, 0.00856677815318107,
0.00867980346083641, 0.00867980346083641, 0.00867980346083641,
0.00614343490451574, 0.00614343490451574, 0.00614343490451574,
0.00614343490451574, 0.00704662408679723, 0.00704662408679723,
0.00704662408679723, 0.00704662408679723, 0.00495137926191091,
0.00495137926191091, 0.00495137926191091, 0.00796654727309942,
0.00796654727309942, 0.00796654727309942), TotalPrec_5 = c(0.00515584181994199,
0.00515584181994199, 0.00515584181994199, 0.000977653078734875,
0.000977653078734875, 0.000977653078734875, 0.00485571753233671,
0.00485571753233671, 0.00485571753233671, 0.00477610062807798,
0.00477610062807798, 0.00477610062807798, 0.00664602871984243,
0.00664602871984243, 0.00664602871984243, 0.00664602871984243,
0.00533714797347784, 0.00533714797347784, 0.00533714797347784,
0.00265633105300366, 0.00265633105300366, 0.00265633105300366,
0.00265633105300366, 0.00200922577641904, 0.00200922577641904,
0.00200922577641904, 0.00200922577641904, 0.00172789173666387,
0.00172789173666387, 0.00172789173666387, 0.00347296684049069,
0.00347296684049069, 0.00347296684049069), TotalPrec_6 = c(0.00170362275093793,
0.00170362275093793, 0.00170362275093793, 0.000670029199682176,
0.000670029199682176, 0.000670029199682176, 0.0018315939232707,
0.0018315939232707, 0.0018315939232707, 0.00138648133724927,
0.00138648133724927, 0.00138648133724927, 0.00329410820268094,
0.00329410820268094, 0.00329410820268094, 0.00329410820268094,
0.00210500298999249, 0.00210500298999249, 0.00210500298999249,
0.000628655252512544, 0.000628655252512544, 0.000628655252512544,
0.000628655252512544, 0.000631613133009523, 0.000631613133009523,
0.000631613133009523, 0.000631613133009523, 0.000616533157881349,
0.000616533157881349, 0.000616533157881349, 0.000599739549215883,
0.000599739549215883, 0.000599739549215883), TotalPrec_7 = c(0.00124496815260499,
0.00124496815260499, 0.00124496815260499, 0.000289129035081714,
0.000289129035081714, 0.000289129035081714, 0.00089572963770479,
0.00089572963770479, 0.00089572963770479, 0.00187503395136445,
0.00187503395136445, 0.00187503395136445, 0.00070394336944446,
0.00070394336944446, 0.00070394336944446, 0.00070394336944446,
0.000733022985514253, 0.000733022985514253, 0.000733022985514253,
4.50894685855019e-06, 4.50894685855019e-06, 4.50894685855019e-06,
4.50894685855019e-06, 3.02730550174601e-05, 3.02730550174601e-05,
3.02730550174601e-05, 3.02730550174601e-05, 3.71173496205301e-06,
3.71173496205301e-06, 3.71173496205301e-06, 4.58224167232402e-05,
4.58224167232402e-05, 4.58224167232402e-05), TotalPrec_8 = c(0.000394100265111774,
0.000394100265111774, 0.000394100265111774, 0.000930351321585476,
0.000930351321585476, 0.000930351321585476, 0.000679628865327686,
0.000679628865327686, 0.000679628865327686, 0.000997507828287781,
0.000997507828287781, 0.000997507828287781, 1.77486290340312e-05,
1.77486290340312e-05, 1.77486290340312e-05, 1.77486290340312e-05,
1.63553704624064e-05, 1.63553704624064e-05, 1.63553704624064e-05,
4.31556363764685e-05, 4.31556363764685e-05, 4.31556363764685e-05,
4.31556363764685e-05, 8.14739760244265e-05, 8.14739760244265e-05,
8.14739760244265e-05, 8.14739760244265e-05, 4.07490988436621e-05,
4.07490988436621e-05, 4.07490988436621e-05, 0.000140139847644605,
0.000140139847644605, 0.000140139847644605), TotalPrec_9 = c(0.000616681878454983,
0.000616681878454983, 0.000616681878454983, 0.000742240983527154,
0.000742240983527154, 0.000742240983527154, 0.000230846126214601,
0.000230846126214601, 0.000230846126214601, 0.00132466584909707,
0.00132466584909707, 0.00132466584909707, 0.000114383190521039,
0.000114383190521039, 0.000114383190521039, 0.000114383190521039,
6.07241054240149e-05, 6.07241054240149e-05, 6.07241054240149e-05,
2.74324702331796e-05, 2.74324702331796e-05, 2.74324702331796e-05,
2.74324702331796e-05, 6.96572624292457e-06, 6.96572624292457e-06,
6.96572624292457e-06, 6.96572624292457e-06, 3.32364725181833e-05,
3.32364725181833e-05, 3.32364725181833e-05, 0.000108777909190394,
0.000108777909190394, 0.000108777909190394), TotalPrec_10 = c(0.00040393992094323,
0.00040393992094323, 0.00040393992094323, 0.00166831514798104,
0.00166831514798104, 0.00166831514798104, 0.000324568885844201,
0.000324568885844201, 0.000324568885844201, 0.000868275004904717,
0.000868275004904717, 0.000868275004904717, 1.25834640130051e-05,
1.25834640130051e-05, 1.25834640130051e-05, 1.25834640130051e-05,
7.2861012085923e-06, 7.2861012085923e-06, 7.2861012085923e-06,
0.000946254527661949, 0.000946254527661949, 0.000946254527661949,
0.000946254527661949, 0.000476793473353609, 0.000476793473353609,
0.000476793473353609, 0.000476793473353609, 0.00102826312649995,
0.00102826312649995, 0.00102826312649995, 0.00266417209059, 0.00266417209059,
0.00266417209059), TotalPrec_11 = c(0.00124716362915933, 0.00124716362915933,
0.00124716362915933, 0.00470362277701497, 0.00470362277701497,
0.00470362277701497, 0.0017967780586332, 0.0017967780586332,
0.0017967780586332, 0.000694554066285491, 0.000694554066285491,
0.000694554066285491, 0.000485763972392306, 0.000485763972392306,
0.000485763972392306, 0.000485763972392306, 0.00074231723556295,
0.00074231723556295, 0.00074231723556295, 0.000763822405133396,
0.000763822405133396, 0.000763822405133396, 0.000763822405133396,
0.00114128366112709, 0.00114128366112709, 0.00114128366112709,
0.00114128366112709, 0.000856105296406895, 0.000856105296406895,
0.000856105296406895, 0.00255026295781135, 0.00255026295781135,
0.00255026295781135), TotalPrec_12 = c(0.00380058260634541, 0.00380058260634541,
0.00380058260634541, 0.00475014140829443, 0.00475014140829443,
0.00475014140829443, 0.00412079365924, 0.00412079365924, 0.00412079365924,
0.00455283792689442, 0.00455283792689442, 0.00455283792689442,
0.00117174908518791, 0.00117174908518791, 0.00117174908518791,
0.00117174908518791, 0.00119069591164588, 0.00119069591164588,
0.00119069591164588, 0.00201585865579545, 0.00201585865579545,
0.00201585865579545, 0.00201585865579545, 0.00202310062013566,
0.00202310062013566, 0.00202310062013566, 0.00202310062013566,
0.00231692171655595, 0.00231692171655595, 0.00231692171655595,
0.00495567917823791, 0.00495567917823791, 0.00495567917823791
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))
CodePudding user response:
When you have multiple predictors, singularity doesn’t necessarily mean that two variables are perfectly correlated. It means that at least one of your variables can be perfectly predicted by some combination of the other variables, even if none of those variables is a perfect predictor on its own. When you have many predictors relative to few observations, as you do, the odds of this happening increase. So you will probably need to simplify your model.
CodePudding user response:
You are trying to estimate a linear model y = X %*% beta epsilon
given X
and y
. The model matrix X
has 33 rows and 13 columns, one for the intercept and one for each numeric variable:
X <- model.matrix(ab)
dim(X)
## [1] 33 13
colnames(X)
## [1] "(Intercept)" "TotalPrec_12" "TotalPrec_11" "TotalPrec_10" "TotalPrec_9"
## [6] "TotalPrec_8" "TotalPrec_7" "TotalPrec_6" "TotalPrec_5" "TotalPrec_4"
## [11] "TotalPrec_3" "TotalPrec_2" "TotalPrec_1"
But X
has rank 10, not 13:
qr(X)$rank
## [1] 10
So there is no unique least squares solution beta
. lm
copes by fitting a reduced model to the first set of 10 linearly independent columns of X
, as indicated in your summary
output. (Whether it copes or throws an error depends on its argument singular.ok
. The default value is TRUE
.)
I find it curious that changing the response makes the problem go away, given that the rank of X
does not depend on y
. Perhaps you changed more than just the response without realizing?