order/number of variables in lm causing singularities?

Time:03-18

I was trying to run a linear model using lm() in R with 12 explanatory variables and 33 observations, but the coefficients for the last three variables are not defined because of singularities. When I switched the order of the variables, the same thing happened again, even though those variables (TotalPrec_11, TotalPrec_12, TotalPrec_10) were significant before. The coefficients were also different between the two models.

ab <- lm(value ~ TotalPrec_12 + TotalPrec_11 + TotalPrec_10 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_3 + TotalPrec_2 + TotalPrec_1, data = aa)

summary(ab)

#Coefficients: (3 not defined because of singularities)
#              Estimate Std. Error t value Pr(>|t|)  
#(Intercept)      64.34      30.80   2.089   0.0480 *
#TotalPrec_12  19811.97   11080.14   1.788   0.0869 .
#TotalPrec_11 -16159.45    7099.89  -2.276   0.0325 *
#TotalPrec_10 -16500.62   18813.96  -0.877   0.3895  
#TotalPrec_9   62662.08   51143.37   1.225   0.2329  
#TotalPrec_8     665.39   36411.95   0.018   0.9856  
#TotalPrec_7  -77203.59   51555.71  -1.497   0.1479  
#TotalPrec_6    4830.11   19503.52   0.248   0.8066  
#TotalPrec_5    6403.94   14902.77   0.430   0.6714  
#TotalPrec_4    -735.73    5023.83  -0.146   0.8848  
#TotalPrec_3         NA         NA      NA       NA  
#TotalPrec_2         NA         NA      NA       NA  
#TotalPrec_1         NA         NA      NA       NA  

The same data here with a different order of variables:

ab1 <- lm(value ~ TotalPrec_1 + TotalPrec_2 + TotalPrec_3 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_11 + TotalPrec_12 + TotalPrec_10, data = aa)

summary(ab1)

#Coefficients: (3 not defined because of singularities)
#              Estimate Std. Error t value Pr(>|t|)  
#(Intercept)      63.72      54.44   1.171   0.2538  
#TotalPrec_1   19611.54   19366.33   1.013   0.3218  
#TotalPrec_2  -14791.44    7847.87  -1.885   0.0722 .
#TotalPrec_3    6766.60    3144.68   2.152   0.0422 *
#TotalPrec_9   28677.62   53530.82   0.536   0.5973  
#TotalPrec_8  -23207.34   65965.12  -0.352   0.7282  
#TotalPrec_7  -26628.10   55839.25  -0.477   0.6380  
#TotalPrec_6  -28694.23   13796.80  -2.080   0.0489 *
#TotalPrec_5   46982.35   17941.89   2.619   0.0154 *
#TotalPrec_4  -26393.70   17656.70  -1.495   0.1486  
#TotalPrec_11        NA         NA      NA       NA  
#TotalPrec_12        NA         NA      NA       NA  
#TotalPrec_10        NA         NA      NA       NA  

Several posts online suggest that it might be a multicollinearity problem. I ran the cor() function to check for collinearity, and no pair of variables came out as perfectly correlated.

I used the same set of 12 variables with other response variables, and there was no problem with singularities. So I'm not sure what is happening here or what I need to do differently to figure this out.

edit

here is my data

> dput(aa)
structure(list(value = c(93, 95, 88, 90, 90, 80, 100, 80, 96, 
100, 100, 100, 80, 94, 88, 76, 90, 0, 93, 100, 88, 90, 95, 71, 
92, 93, 92, 100, 85, 90, 100, 100, 100), TotalPrec_1 = c(0.00319885835051536, 
0.00319885835051536, 0.00319885835051536, 0.00717973057180643, 
0.00717973057180643, 0.00717973057180643, 0.00464357063174247, 
0.00464357063174247, 0.00464357063174247, 0.00598198547959327, 
0.00598198547959327, 0.00598198547959327, 0.00380058260634541, 
0.00380058260634541, 0.00380058260634541, 0.00380058260634541, 
0.00364887388423085, 0.00364887388423085, 0.00364887388423085, 
0.00475014140829443, 0.00475014140829443, 0.00475014140829443, 
0.00475014140829443, 0.00499139120802283, 0.00499139120802283, 
0.00499139120802283, 0.00499139120802283, 0.00490436097607016, 
0.00490436097607016, 0.00490436097607016, 0.00623255362734198, 
0.00623255362734198, 0.00623255362734198), TotalPrec_2 = c(0.00387580785900354, 
0.00387580785900354, 0.00387580785900354, 0.00625309534370899, 
0.00625309534370899, 0.00625309534370899, 0.00298969540745019, 
0.00298969540745019, 0.00298969540745019, 0.00558579061180353, 
0.00558579061180353, 0.00558579061180353, 0.00370361795648932, 
0.00370361795648932, 0.00370361795648932, 0.00370361795648932, 
0.00335893919691443, 0.00335893919691443, 0.00335893919691443, 
0.00621500937268137, 0.00621500937268137, 0.00621500937268137, 
0.00621500937268137, 0.00234323320910334, 0.00234323320910334, 
0.00234323320910334, 0.00234323320910334, 0.00644989637658, 0.00644989637658, 
0.00644989637658, 0.00476496992632746, 0.00476496992632746, 0.00476496992632746
), TotalPrec_3 = c(0.00418250449001789, 0.00418250449001789, 
0.00418250449001789, 0.00702223135158419, 0.00702223135158419, 
0.00702223135158419, 0.00427648611366748, 0.00427648611366748, 
0.00427648611366748, 0.00562589056789875, 0.00562589056789875, 
0.00562589056789875, 0.0037367227487266, 0.0037367227487266, 
0.0037367227487266, 0.0037367227487266, 0.00477339653298258, 
0.00477339653298258, 0.00477339653298258, 0.0124167986214161, 
0.0124167986214161, 0.0124167986214161, 0.0124167986214161, 0.010678518563509, 
0.010678518563509, 0.010678518563509, 0.010678518563509, 0.0139585845172405, 
0.0139585845172405, 0.0139585845172405, 0.00741709442809224, 
0.00741709442809224, 0.00741709442809224), TotalPrec_4 = c(0.00659881485626101, 
0.00659881485626101, 0.00659881485626101, 0.00347008113749325, 
0.00347008113749325, 0.00347008113749325, 0.00720167113468051, 
0.00720167113468051, 0.00720167113468051, 0.00704727275297045, 
0.00704727275297045, 0.00704727275297045, 0.00856677815318107, 
0.00856677815318107, 0.00856677815318107, 0.00856677815318107, 
0.00867980346083641, 0.00867980346083641, 0.00867980346083641, 
0.00614343490451574, 0.00614343490451574, 0.00614343490451574, 
0.00614343490451574, 0.00704662408679723, 0.00704662408679723, 
0.00704662408679723, 0.00704662408679723, 0.00495137926191091, 
0.00495137926191091, 0.00495137926191091, 0.00796654727309942, 
0.00796654727309942, 0.00796654727309942), TotalPrec_5 = c(0.00515584181994199, 
0.00515584181994199, 0.00515584181994199, 0.000977653078734875, 
0.000977653078734875, 0.000977653078734875, 0.00485571753233671, 
0.00485571753233671, 0.00485571753233671, 0.00477610062807798, 
0.00477610062807798, 0.00477610062807798, 0.00664602871984243, 
0.00664602871984243, 0.00664602871984243, 0.00664602871984243, 
0.00533714797347784, 0.00533714797347784, 0.00533714797347784, 
0.00265633105300366, 0.00265633105300366, 0.00265633105300366, 
0.00265633105300366, 0.00200922577641904, 0.00200922577641904, 
0.00200922577641904, 0.00200922577641904, 0.00172789173666387, 
0.00172789173666387, 0.00172789173666387, 0.00347296684049069, 
0.00347296684049069, 0.00347296684049069), TotalPrec_6 = c(0.00170362275093793, 
0.00170362275093793, 0.00170362275093793, 0.000670029199682176, 
0.000670029199682176, 0.000670029199682176, 0.0018315939232707, 
0.0018315939232707, 0.0018315939232707, 0.00138648133724927, 
0.00138648133724927, 0.00138648133724927, 0.00329410820268094, 
0.00329410820268094, 0.00329410820268094, 0.00329410820268094, 
0.00210500298999249, 0.00210500298999249, 0.00210500298999249, 
0.000628655252512544, 0.000628655252512544, 0.000628655252512544, 
0.000628655252512544, 0.000631613133009523, 0.000631613133009523, 
0.000631613133009523, 0.000631613133009523, 0.000616533157881349, 
0.000616533157881349, 0.000616533157881349, 0.000599739549215883, 
0.000599739549215883, 0.000599739549215883), TotalPrec_7 = c(0.00124496815260499, 
0.00124496815260499, 0.00124496815260499, 0.000289129035081714, 
0.000289129035081714, 0.000289129035081714, 0.00089572963770479, 
0.00089572963770479, 0.00089572963770479, 0.00187503395136445, 
0.00187503395136445, 0.00187503395136445, 0.00070394336944446, 
0.00070394336944446, 0.00070394336944446, 0.00070394336944446, 
0.000733022985514253, 0.000733022985514253, 0.000733022985514253, 
4.50894685855019e-06, 4.50894685855019e-06, 4.50894685855019e-06, 
4.50894685855019e-06, 3.02730550174601e-05, 3.02730550174601e-05, 
3.02730550174601e-05, 3.02730550174601e-05, 3.71173496205301e-06, 
3.71173496205301e-06, 3.71173496205301e-06, 4.58224167232402e-05, 
4.58224167232402e-05, 4.58224167232402e-05), TotalPrec_8 = c(0.000394100265111774, 
0.000394100265111774, 0.000394100265111774, 0.000930351321585476, 
0.000930351321585476, 0.000930351321585476, 0.000679628865327686, 
0.000679628865327686, 0.000679628865327686, 0.000997507828287781, 
0.000997507828287781, 0.000997507828287781, 1.77486290340312e-05, 
1.77486290340312e-05, 1.77486290340312e-05, 1.77486290340312e-05, 
1.63553704624064e-05, 1.63553704624064e-05, 1.63553704624064e-05, 
4.31556363764685e-05, 4.31556363764685e-05, 4.31556363764685e-05, 
4.31556363764685e-05, 8.14739760244265e-05, 8.14739760244265e-05, 
8.14739760244265e-05, 8.14739760244265e-05, 4.07490988436621e-05, 
4.07490988436621e-05, 4.07490988436621e-05, 0.000140139847644605, 
0.000140139847644605, 0.000140139847644605), TotalPrec_9 = c(0.000616681878454983, 
0.000616681878454983, 0.000616681878454983, 0.000742240983527154, 
0.000742240983527154, 0.000742240983527154, 0.000230846126214601, 
0.000230846126214601, 0.000230846126214601, 0.00132466584909707, 
0.00132466584909707, 0.00132466584909707, 0.000114383190521039, 
0.000114383190521039, 0.000114383190521039, 0.000114383190521039, 
6.07241054240149e-05, 6.07241054240149e-05, 6.07241054240149e-05, 
2.74324702331796e-05, 2.74324702331796e-05, 2.74324702331796e-05, 
2.74324702331796e-05, 6.96572624292457e-06, 6.96572624292457e-06, 
6.96572624292457e-06, 6.96572624292457e-06, 3.32364725181833e-05, 
3.32364725181833e-05, 3.32364725181833e-05, 0.000108777909190394, 
0.000108777909190394, 0.000108777909190394), TotalPrec_10 = c(0.00040393992094323, 
0.00040393992094323, 0.00040393992094323, 0.00166831514798104, 
0.00166831514798104, 0.00166831514798104, 0.000324568885844201, 
0.000324568885844201, 0.000324568885844201, 0.000868275004904717, 
0.000868275004904717, 0.000868275004904717, 1.25834640130051e-05, 
1.25834640130051e-05, 1.25834640130051e-05, 1.25834640130051e-05, 
7.2861012085923e-06, 7.2861012085923e-06, 7.2861012085923e-06, 
0.000946254527661949, 0.000946254527661949, 0.000946254527661949, 
0.000946254527661949, 0.000476793473353609, 0.000476793473353609, 
0.000476793473353609, 0.000476793473353609, 0.00102826312649995, 
0.00102826312649995, 0.00102826312649995, 0.00266417209059, 0.00266417209059, 
0.00266417209059), TotalPrec_11 = c(0.00124716362915933, 0.00124716362915933, 
0.00124716362915933, 0.00470362277701497, 0.00470362277701497, 
0.00470362277701497, 0.0017967780586332, 0.0017967780586332, 
0.0017967780586332, 0.000694554066285491, 0.000694554066285491, 
0.000694554066285491, 0.000485763972392306, 0.000485763972392306, 
0.000485763972392306, 0.000485763972392306, 0.00074231723556295, 
0.00074231723556295, 0.00074231723556295, 0.000763822405133396, 
0.000763822405133396, 0.000763822405133396, 0.000763822405133396, 
0.00114128366112709, 0.00114128366112709, 0.00114128366112709, 
0.00114128366112709, 0.000856105296406895, 0.000856105296406895, 
0.000856105296406895, 0.00255026295781135, 0.00255026295781135, 
0.00255026295781135), TotalPrec_12 = c(0.00380058260634541, 0.00380058260634541, 
0.00380058260634541, 0.00475014140829443, 0.00475014140829443, 
0.00475014140829443, 0.00412079365924, 0.00412079365924, 0.00412079365924, 
0.00455283792689442, 0.00455283792689442, 0.00455283792689442, 
0.00117174908518791, 0.00117174908518791, 0.00117174908518791, 
0.00117174908518791, 0.00119069591164588, 0.00119069591164588, 
0.00119069591164588, 0.00201585865579545, 0.00201585865579545, 
0.00201585865579545, 0.00201585865579545, 0.00202310062013566, 
0.00202310062013566, 0.00202310062013566, 0.00202310062013566, 
0.00231692171655595, 0.00231692171655595, 0.00231692171655595, 
0.00495567917823791, 0.00495567917823791, 0.00495567917823791
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))

CodePudding user response:

When you have multiple predictors, singularity doesn’t necessarily mean that two variables are perfectly correlated. It means that at least one of your variables can be perfectly predicted by some combination of the other variables, even if none of those variables is a perfect predictor on its own. When you have many predictors relative to few observations, as you do, the odds of this happening increase. So you will probably need to simplify your model.
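To illustrate the point above, here is a minimal sketch on made-up data (the variables x1, x2, x3 and y are hypothetical and not taken from the question). x3 is built as an exact sum of x1 and x2, so no pairwise correlation reaches 1, yet lm() still cannot estimate all three slopes:

```r
set.seed(1)
n  <- 33
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                      # exact linear combination of x1 and x2
y  <- 1 + 2 * x1 - x2 + rnorm(n)   # arbitrary response for illustration

# Largest absolute pairwise correlation among the predictors
cors <- cor(cbind(x1, x2, x3))
max_offdiag <- max(abs(cors[upper.tri(cors)]))

# The design matrix has 4 columns (intercept + 3 predictors) but rank 3,
# so lm() reports NA for the last redundant column it encounters (x3).
fit <- lm(y ~ x1 + x2 + x3)
coef(fit)
fit$rank
```

This is also why cor() alone is not a sufficient check: it only looks at pairs of variables, while a singularity can involve any number of columns at once.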

CodePudding user response:

You are trying to estimate a linear model y = X %*% beta + epsilon given X and y. The model matrix X has 33 rows and 13 columns, one for the intercept and one for each numeric variable:

X <- model.matrix(ab)
dim(X)
## [1] 33 13
colnames(X)
##  [1] "(Intercept)"  "TotalPrec_12" "TotalPrec_11" "TotalPrec_10" "TotalPrec_9" 
##  [6] "TotalPrec_8"  "TotalPrec_7"  "TotalPrec_6"  "TotalPrec_5"  "TotalPrec_4" 
## [11] "TotalPrec_3"  "TotalPrec_2"  "TotalPrec_1" 

But X has rank 10, not 13:

qr(X)$rank
## [1] 10

So there is no unique least squares solution beta. lm copes by fitting a reduced model to the first set of 10 linearly independent columns of X, as indicated in your summary output. (Whether it copes or throws an error depends on its argument singular.ok. The default value is TRUE.)
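One likely reason for the rank of 10: the posted dput() output appears to contain only 10 distinct combinations of predictor values, each repeated 3 or 4 times, and the rank of X cannot exceed its number of distinct rows. The following is a sketch on simulated data, not the asker's data; the names distinct, reps, d and x1 to x12 are made up for illustration. It reproduces that structure and shows both behaviours of singular.ok:

```r
set.seed(42)
# Hypothetical data: 10 distinct predictor rows, each repeated 3 or 4 times
# (33 rows in total), with 12 numeric predictors.
distinct <- matrix(runif(10 * 12), nrow = 10,
                   dimnames = list(NULL, paste0("x", 1:12)))
reps <- c(3, 3, 3, 3, 4, 3, 4, 4, 3, 3)
d <- as.data.frame(distinct[rep(1:10, times = reps), ])
d$y <- rnorm(nrow(d))

fit <- lm(y ~ ., data = d)
X <- model.matrix(fit)

n_distinct <- nrow(unique(X))   # number of distinct rows of X
rank_X <- qr(X)$rank            # bounded above by n_distinct

# Names of the coefficients that lm() could not estimate
dropped <- names(coef(fit))[is.na(coef(fit))]

# With singular.ok = FALSE, lm() stops with an error instead of dropping columns
strict <- try(lm(y ~ ., data = d, singular.ok = FALSE), silent = TRUE)
```

On the real data, the analogous checks would be nrow(unique(model.matrix(ab))) and alias(ab), the latter listing which columns are linear combinations of the others.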

I find it curious that changing the response makes the problem go away, given that the rank of X does not depend on y. Perhaps you changed more than just the response without realizing?
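As a quick sanity check of that claim, here is a small sketch on made-up data (x1, x2, x3, y_a and y_b are hypothetical). The same rank-deficient predictors are fitted against two unrelated responses, and the rank and the set of undefined coefficients come out the same both times:

```r
set.seed(7)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 - 2 * x2        # redundant column, independent of any response

y_a <- rnorm(n)          # first hypothetical response
y_b <- runif(n)          # second, unrelated hypothetical response

fit_a <- lm(y_a ~ x1 + x2 + x3)
fit_b <- lm(y_b ~ x1 + x2 + x3)

# Which coefficients are undefined depends only on X, not on the response
na_a <- is.na(coef(fit_a))
na_b <- is.na(coef(fit_b))
```

So if another response really gives no singularities, it may be worth comparing dim() and qr()$rank of the two model matrices to see what else differs between the fits.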
