I am trying to fit a linear regression mode to examine the impact of COVID on traffic volume. Here is my dataset that have 31 observations which I have normalised using min-max scaling.
structure(list(Date = c("Jul-19", "Aug-19", "Sep-19", "Oct-19",
"Nov-19", "Dec-19", "Jan-20", "Feb-20", "Mar-20", "Apr-20", "May-20",
"Jun-20", "Jul-20", "Aug-20", "Sep-20", "Oct-20", "Nov-20", "Dec-20",
"Jan-21", "Feb-21", "Mar-21", "Apr-21", "May-21", "Jun-21", "Jul-21",
"Aug-21", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22"),
Average.Weekday.Daily.Traffic..Veh.day. = c(0.888327194565519,
0.957155270369628, 0.895880165623604, 0.942066917846049,
1, 0.877361842545003, 0.703731456172012, 0.898082382615854,
0.607783746512863, 0, 0.317277553217312, 0.560830088053501,
0.706464878860476, 0.747561202987416, 0.737478848514911,
0.800118905645907, 0.857496156674019, 0.809655982748128,
0.708357519023144, 0.926517717996616, 0.895669090512526,
0.856018630896471, 0.925733222167108, 0.825782121234508,
0.798338838875814, 0.533903939716948, 0.819263418220707,
0.84537340946109, 0.792752384269276, 0.64987564158039, 0.365423786054267
), Case = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.00543532109442455,
0.00036093929142663, 5.66179280669224e-05, 0, 0.000134467579158941,
0.000339707568401534, 0.000127390338150575, 0.000113235856133845,
0, 0.000169853784200767, 0.000106158615125479, 0.00106158615125479,
0.0015286840578069, 0, 0, 0.00110404959730499, 0.00312814052569746,
0, 0.000268935158317881, 0, 0.00153576129881527, 1), Regulation = c(0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0.666666666666667, 0.333333333333333,
0.333333333333333, 0.166666666666667, 0.166666666666667,
0.166666666666667, 0.166666666666667, 0.333333333333333,
0.166666666666667, 0.333333333333333, 0.166666666666667,
0.166666666666667, 0.333333333333333, 0.166666666666667,
0.166666666666667, 0.166666666666667, 0.166666666666667,
0, 0, 0)), row.names = c(NA, -31L), class = "data.frame")
My model and output:
m1 <- lm(Average.Weekday.Daily.Traffic..Veh.day. ~ Case Regulation, data = data)
summary(m1)
Call:
lm(formula = Average.Weekday.Daily.Traffic..Veh.day. ~ Case
Regulation, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.35325 -0.00991 0.00712 0.05537 0.25166
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.89096 0.03061 29.104 < 2e-16 ***
Case -0.52838 0.13368 -3.952 0.000477 ***
Regulation -0.53484 0.08077 -6.622 3.49e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1301 on 28 degrees of freedom
Multiple R-squared: 0.6548, Adjusted R-squared: 0.6301
F-statistic: 26.55 on 2 and 28 DF, p-value: 3.413e-07
Although the independent variables are significant but I found that errors are not normally distributed using Jarque-Bera test. I tried to fix it by doing Bootstrapping with the following codes:
set.seed(4321)
m2 <- Boot(m1, R = 100, method = "residual")
summary(m2)
But I received an error message even though I have included Case variable in my linear model: Error in eval(predvars, data, env) : object 'Case' not found
Could anyone please help explain for me why is it giving this error and how to fix it? Thank you so much for your help.
CodePudding user response:
Your problem is due to a name clash between the name of your data frame and the function data
, due to the way Boot
and boot
use non-standard evaluation. This is why it is best to avoid such names for your own objects. All you need to do is change data
to mydata
and the problem disappears.
library(car)
#> Loading required package: carData
mydata <- data
m1 <- lm(Average.Weekday.Daily.Traffic..Veh.day. ~ Case Regulation, data = mydata)
set.seed(4321)
m2 <- Boot(m1, R = 100, method = "residual")
#> Loading required namespace: boot
summary(m2)
#>
#> Number of bootstrap replications R = 100
#> original bootBias bootSE bootMed
#> (Intercept) 0.89096 0.00248002 0.038248 0.89308
#> Case -0.52838 0.00555811 0.187270 -0.51458
#> Regulation -0.53484 -0.00088829 0.099065 -0.53900
Created on 2022-04-03 by the reprex package (v2.0.1)