I'm trying to view a collection of diagnostics from the influence.measures() function in the viewer above the console because, to me, it looks "cleaner". There's nothing special about the regression; it is a basic multiple linear regression.
total_labour_hrs_lm = lm(data = Grocery_Retailer, formula = Total_Labour_hrs ~ Cases_Shipped + Labour_Hrs_Cost + Holiday)
> dput(head(Grocery_Retailer[, 1:4], 10))
structure(list(Total_Labour_hrs = c(4264, 4496, 4317, 4292, 4945,
4325, 4110, 4111, 4161, 4560), Cases_Shipped = c(305657, 328476,
317164, 366745, 265518, 301995, 269334, 267631, 296350, 277223
), Labour_Hrs_Cost = c(7.17, 6.2, 4.61, 7.02, 8.61, 6.88, 7.23,
6.27, 6.49, 6.37), Holiday = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)), row.names = c("data.data.data.data.1",
"data.data.data.data.2", "data.data.data.data.3", "data.data.data.data.4",
"data.data.data.data.5", "data.data.data.data.6", "data.data.data.data.7",
"data.data.data.data.8", "data.data.data.data.9", "data.data.data.data.10"
), class = "data.frame")
After doing all of that, I make the call to influence.measures():
View(as.data.frame(influence.measures(model = total_labour_hrs_lm)))
Error in View : cannot coerce class ‘"infl"’ to a data.frame
This is the error I get. I did a little reading to get an idea of what is happening, and my guess is that it has to do with not being able to coerce the last column of the influence.measures() table into a structure that View() can use. Surely there must be a way around this, because all the last column does is use * to mark the influential cases.
I'm still relatively new to programming, so I don't yet have a formal understanding of classes, structures, etc.; what I know comes from the work and practice I've done in R.
CodePudding user response:
You can manipulate the model results and the infl object to get the data into the same format you see when you run influence.measures(model = total_labour_hrs_lm). Just turn the first item in the list (infmat) into a data frame, then find the influential rows and mutate a new column that carries that information (i.e., *).
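To see why the first list item is the one to convert, it can help to inspect the object itself. The sketch below uses a stand-in model fitted to the built-in mtcars data (an assumption for illustration, not the Grocery_Retailer data); the names fit and infl_obj are hypothetical. It shows that an "infl" object is a list holding a numeric matrix of diagnostics and a logical matrix of flags, and that the * column only appears when the object is printed.

```r
# Stand-in model on built-in data (assumption: mtcars replaces Grocery_Retailer)
fit <- lm(mpg ~ wt + hp, data = mtcars)
infl_obj <- influence.measures(fit)

class(infl_obj)         # the class that as.data.frame() could not coerce
names(infl_obj)         # components stored in the list
dim(infl_obj$infmat)    # numeric matrix: one row per case, one column per diagnostic
dim(infl_obj$is.inf)    # logical matrix of the same shape, TRUE where a value is flagged
```

Since there is no stored * column, the marker has to be rebuilt from is.inf, which is what the code that follows does.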
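library(tidyverse)
library(tibble)

# Compute the diagnostics; the result is a list of class "infl"
inflm <- influence.measures(total_labour_hrs_lm)

# Convert the numeric matrix to a data frame and flag the influential rows
inflm.df <-
  as.data.frame(inflm[["infmat"]]) %>%
  tibble::remove_rownames() %>%
  # A row is marked "*" if any of its diagnostics is flagged in is.inf
  dplyr::mutate(inf = ifelse(row_number() %in% unname(which(
    apply(inflm$is.inf, 1, any)
  )), "*", ""))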
Output
dfb.1_ dfb.Cs_S dfb.L_H_ dfb.Hldy dffit cov.r cook.d hat inf
1 0.002676016 -0.0005907955 -0.004099162 0.00348584 -0.006017538 2.61954809 1.086293e-05 0.2085001
2 -0.132020690 0.3163760133 -0.118513731 0.08050579 0.518084429 1.19096264 6.629460e-02 0.2000735
3 -1.011752656 -0.2260721643 1.803437294 -1.06556827 -1.978217429 4.97708055 9.797699e-01 0.7978985 *
4 0.858588355 -0.8998589831 -0.357487343 0.06500172 -1.035351317 4.73921419 2.939023e-01 0.6947583 *
5 0.000000000 0.0000000000 0.000000000 0.00000000 NaN NaN NaN 1.0000000
6 -0.027980403 -0.0014046128 0.059268845 -0.06382214 0.124813877 2.25009310 4.588414e-03 0.1437767
7 -0.085471708 0.3210337973 -0.278119353 0.34105089 -0.529878642 2.16749365 7.637825e-02 0.3532302
8 -0.454518871 0.4661922675 0.131276469 0.13744951 -0.611770744 1.44932953 9.444163e-02 0.2838286
9 -0.062332350 0.0610196593 -0.002635099 0.07859391 -0.266734659 1.56401553 1.928013e-02 0.1173206
10 0.974167713 -1.0201812141 -0.218184444 -0.41667266 1.544679713 0.03659811 2.467009e-01 0.2006134 *
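If you would rather not load the tidyverse, the same table can probably be built in base R. This is a minimal sketch under the same idea, again using a stand-in model on the built-in mtcars data (an assumption; swap in total_labour_hrs_lm for the real case). The names fit, infl_obj, flagged and infl_df are hypothetical. Unlike the version above, it keeps the original row names.

```r
# Stand-in model on built-in data (assumption: mtcars replaces Grocery_Retailer)
fit <- lm(mpg ~ wt + hp, data = mtcars)
infl_obj <- influence.measures(fit)

# The diagnostics matrix converts to a data frame without trouble
infl_df <- as.data.frame(infl_obj$infmat)

# A case counts as influential if any of its diagnostics is flagged;
# na.rm = TRUE treats missing flags as "not flagged"
flagged <- apply(infl_obj$is.inf, 1, any, na.rm = TRUE)
infl_df$inf <- ifelse(flagged, "*", "")

# Open the result in the viewer only when running interactively
if (interactive()) View(infl_df)
```

The interactive() guard is only there so the snippet also runs in a script or a non-interactive session, where View() may not be available.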