For the function: fmodel(model_object, ~ x_var color_var facet_var) Is it possible to visualize a model with additional variables?
With 3 variables there is no problem creating a graph:
install.packages('statisticalModeling')
library(statisticalModeling)
mod1 <- lm(wage ~ age sex sector, data = mosaicData::CPS85)
fmodel(mod1, ~ age sex sector)
With 5 variables, only the first 4 variables are included in the graph, is this the maximum allowable for this function? Is there a way I can graph all 5 variables?
mod1 <- lm(wage ~ age sex sector married educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age sex sector married educ)
I have tried "forcing" the last variable to be displayed by defining specific values for the 5th explanatory variable and it results in a strange looking output.
mod1 <- lm(wage ~ age sex sector married educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age sex sector married educ,
educ = c(10, 12, 16))
CodePudding user response:
There are three main problems here. The first is that the statisticalModeling
package is no longer on CRAN, and the archived version appears to have last been updated over 6 years ago. It may therefore have some compatability issues with ggplot, which has been updated extensively since then.
The second is that, like many ggplot wrappers, fmodel
aims to make it easier to create a ggplot, but what you gain in ease-of-use you lose in the ability to customize the plot or produce plots that were not part of the design spec. In these situations it is often best to wrangle the data yourself and plot with ggplot
directly.
The third (and most important) issue is that your plot already contains a lot of information, and adding yet another aesthetic mapping makes this worse.
All that said, if you really want to produce a plot that demonstrates the effects of 5 different predictor variables across all their possible values, you could do:
library(ggplot2)
mod1 <- lm(wage ~ age sex sector married educ, data = mosaicData::CPS85)
pred_df <- with(mosaicData::CPS85,
expand.grid(age = unique(age),
sector = unique(sector),
married = unique(married),
sex = unique(sex),
educ = unique(educ)))
pred_df$wage <- predict(mod1, pred_df)
ggplot(pred_df, aes(age, wage, linetype = sex, color = educ,
group = interaction(educ, sex)))
geom_line()
facet_grid(sector ~ married)
scale_color_viridis_c()
theme_bw(base_size = 16)
You can see why the authors of the statisticalModrling package may have held back from adding this facility. This is a pretty awful plot from a data visualization perspective.
CodePudding user response:
@allan-cameron in the comments requested an example with the marginaleffects
package. Unfortunately, the plotting functions in that package only support up to 3 predictors.
That said, it is pretty easy to replicate the plot in the other answer using the predictions()
function and ggplot2
. The main benefit relative to the predict()
solution is that we get confidence intervals, although this is admittedly not super useful in a plot that already has so much stuff going on.
library(ggplot2)
library(marginaleffects)
mod1 <- lm(
wage ~ age sex sector married educ,
data = mosaicData::CPS85)
pred_df <- predictions(mod1,
newdata = datagrid(age = unique,
sector = unique,
married = unique,
sex = unique,
educ = unique))
Look at predictions for reference:
pred_df
#
# Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % age sector married sex educ
# 9.3327387 0.9813 9.5105757 < 2.22e-16 7.409424 11.25605 43 const Married M 10
# 10.5702104 0.9822 10.7616499 < 2.22e-16 8.645112 12.49531 43 const Married M 12
# 13.0451539 1.0874 11.9967092 < 2.22e-16 10.913900 15.17641 43 const Married M 16
# 11.8076822 1.0188 11.5901116 < 2.22e-16 9.810926 13.80444 43 const Married M 14
# 8.0952670 1.0161 7.9666973 1.6297e-15 6.103672 10.08686 43 const Married M 8
# --- 25558 rows omitted. See ?avg_predictions and ?print.marginaleffects ---
# 4.5463896 1.3603 3.3422762 0.00083094 1.880314 7.21246 50 manag Single F 2
# 7.0213331 1.0641 6.5980786 4.1652e-11 4.935641 9.10703 50 manag Single F 6
# 5.1651255 1.2822 4.0282727 5.6188e-05 2.652024 7.67823 50 manag Single F 3
# 6.4025972 1.1336 5.6478524 1.6246e-08 4.180716 8.62448 50 manag Single F 5
# 5.7838614 1.2065 4.7938494 1.6361e-06 3.419131 8.14859 50 manag Single F 4
#
# Prediction type: response
# Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, wage, age, sector, married, sex, educ
Plot using ggplot2
:
ggplot(pred_df, aes(age, estimate, color = educ, group = educ))
geom_line()
facet_grid(sector ~ married sex)
scale_color_viridis_c()
theme_bw(base_size = 16)