Home > other >  R fmodel() number of variables that can be used
R fmodel() number of variables that can be used

Time:02-01

For the function: fmodel(model_object, ~ x_var color_var facet_var) Is it possible to visualize a model with additional variables?

With 3 variables there is no problem creating a graph:

install.packages('statisticalModeling')
library(statisticalModeling)

mod1 <- lm(wage ~ age   sex   sector, data = mosaicData::CPS85)
fmodel(mod1, ~ age   sex   sector)

enter image description here

With 5 variables, only the first 4 variables are included in the graph, is this the maximum allowable for this function? Is there a way I can graph all 5 variables?

mod1 <- lm(wage ~ age   sex   sector   married   educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age   sex   sector   married   educ)

enter image description here

I have tried "forcing" the last variable to be displayed by defining specific values for the 5th explanatory variable and it results in a strange looking output.

mod1 <- lm(wage ~ age   sex   sector   married   educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age   sex   sector   married   educ,
       educ = c(10, 12, 16))

enter image description here

CodePudding user response:

There are three main problems here. The first is that the statisticalModeling package is no longer on CRAN, and the archived version appears to have last been updated over 6 years ago. It may therefore have some compatability issues with ggplot, which has been updated extensively since then.

The second is that, like many ggplot wrappers, fmodel aims to make it easier to create a ggplot, but what you gain in ease-of-use you lose in the ability to customize the plot or produce plots that were not part of the design spec. In these situations it is often best to wrangle the data yourself and plot with ggplot directly.

The third (and most important) issue is that your plot already contains a lot of information, and adding yet another aesthetic mapping makes this worse.

All that said, if you really want to produce a plot that demonstrates the effects of 5 different predictor variables across all their possible values, you could do:

library(ggplot2)

mod1 <- lm(wage ~ age   sex   sector   married   educ, data = mosaicData::CPS85)

pred_df <- with(mosaicData::CPS85, 
                expand.grid(age     = unique(age),
                            sector  = unique(sector),
                            married = unique(married),
                            sex     = unique(sex),
                            educ    = unique(educ)))

pred_df$wage <- predict(mod1, pred_df)

ggplot(pred_df, aes(age, wage, linetype = sex, color = educ, 
                    group = interaction(educ, sex)))  
  geom_line()  
  facet_grid(sector ~ married)  
  scale_color_viridis_c()  
  theme_bw(base_size = 16)

enter image description here

You can see why the authors of the statisticalModrling package may have held back from adding this facility. This is a pretty awful plot from a data visualization perspective.

CodePudding user response:

@allan-cameron in the comments requested an example with the marginaleffects package. Unfortunately, the plotting functions in that package only support up to 3 predictors.

That said, it is pretty easy to replicate the plot in the other answer using the predictions() function and ggplot2. The main benefit relative to the predict() solution is that we get confidence intervals, although this is admittedly not super useful in a plot that already has so much stuff going on.

library(ggplot2)
library(marginaleffects)

mod1 <- lm(
    wage ~ age   sex   sector   married   educ,
    data = mosaicData::CPS85)

pred_df <- predictions(mod1,
    newdata = datagrid(age = unique,
                       sector = unique,
                       married = unique,
                       sex = unique,
                       educ = unique))

Look at predictions for reference:

pred_df
# 
#    Estimate Std. Error          z   Pr(>|z|)     2.5 %   97.5 % age sector married sex educ
#   9.3327387     0.9813  9.5105757 < 2.22e-16  7.409424 11.25605  43  const Married   M   10
#  10.5702104     0.9822 10.7616499 < 2.22e-16  8.645112 12.49531  43  const Married   M   12
#  13.0451539     1.0874 11.9967092 < 2.22e-16 10.913900 15.17641  43  const Married   M   16
#  11.8076822     1.0188 11.5901116 < 2.22e-16  9.810926 13.80444  43  const Married   M   14
#   8.0952670     1.0161  7.9666973 1.6297e-15  6.103672 10.08686  43  const Married   M    8
# --- 25558 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#   4.5463896     1.3603  3.3422762 0.00083094  1.880314  7.21246  50  manag  Single   F    2
#   7.0213331     1.0641  6.5980786 4.1652e-11  4.935641  9.10703  50  manag  Single   F    6
#   5.1651255     1.2822  4.0282727 5.6188e-05  2.652024  7.67823  50  manag  Single   F    3
#   6.4025972     1.1336  5.6478524 1.6246e-08  4.180716  8.62448  50  manag  Single   F    5
#   5.7838614     1.2065  4.7938494 1.6361e-06  3.419131  8.14859  50  manag  Single   F    4 
# 
# Prediction type:  response 
# Columns: rowid, type, estimate, std.error, statistic, p.value, conf.low, conf.high, wage, age, sector, married, sex, educ

Plot using ggplot2:

ggplot(pred_df, aes(age, estimate, color = educ, group = educ))  
  geom_line()  
  facet_grid(sector ~ married   sex)  
  scale_color_viridis_c()  
  theme_bw(base_size = 16)

  • Related