Predict multiple dependent variables in R (linear model as example)

Time:12-22

Let's use mtcars as the example dataset.

data <- mtcars

Now I want to fit a linear model with two dependent variables (mpg and disp), using all the other columns as independent variables. I fit the model like this:

fit <- lm(mpg + disp ~ ., data=data)

I would like to predict both dependent variables (mpg and disp), so I run predict().

predict(fit, data)

However, the result only returns one value per row, instead of two values per row (the two dependent variables). This is the output:

          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
          165.60538           179.57702           159.75746           247.73607           344.79637           251.66598           348.80082           160.58774 
           Merc 230            Merc 280           Merc 280C          Merc 450SE          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
          148.81596           207.46454           200.85338           360.66297           331.60317           331.14515           429.61466           452.19838 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
          462.42356           126.53160            59.90496            93.95177           149.87657           332.59491           325.64415           380.86739 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
          375.64822           111.88026           159.05993           101.86053           369.53347           169.01770           309.00674           177.31316 
  • How can I get both dependent variable values using predict() or any other function?
  • How can I interpret the previous results?

Please be aware that this model might not be realistic or significant. It is purely a technical programming task, so don't judge the utility of the model.

Answer:

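Don't add the responses; cbind them. On the left-hand side of a formula, mpg + disp is ordinary arithmetic, so lm fits a single response equal to mpg + disp. That is why predict() returned one value per row: each value is the predicted sum.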

fit <- lm(cbind(mpg, disp) ~ ., data=mtcars)
y <- predict(fit)
head(y)
#>                        mpg     disp
#> Mazda RX4         22.37587 143.2295
#> Mazda RX4 Wag     22.07853 157.4985
#> Datsun 710        26.58631 133.1712
#> Hornet 4 Drive    20.82285 226.9132
#> Hornet Sportabout 17.26052 327.5359
#> Valiant           20.46572 231.2003

Created on 2022-12-21 with reprex v2.0.2
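As a quick check on how the question's output relates to the one above, the sketch below fits both models and compares them. The object names (fit_sum, fit_mv, pred_sum, pred_mv) are made up for illustration. The idea is that least-squares predictions are linear in the response, so the prediction of mpg + disp should equal the row sums of the two-column prediction. This is consistent with the printed values, for example 22.37587 + 143.2295 is about 165.605 for the Mazda RX4.

```r
# Model from the question: `+` on the left-hand side is arithmetic,
# so the response is the single variable mpg + disp.
fit_sum <- lm(mpg + disp ~ ., data = mtcars)

# Model from the answer: cbind() builds a two-column response matrix.
fit_mv <- lm(cbind(mpg, disp) ~ ., data = mtcars)

pred_sum <- predict(fit_sum)  # named numeric vector, one value per car
pred_mv  <- predict(fit_mv)   # matrix with columns mpg and disp

# Compare the predicted sum with the sum of the separate predictions.
all.equal(pred_sum, rowSums(pred_mv))
```

If the comparison holds, the question's numbers are not wrong, just a prediction of a different quantity than intended.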

Although not asked, the same works for the model's residuals.

e <- resid(fit)
head(e)
#>                          mpg       disp
#> Mazda RX4         -1.3758673  16.770486
#> Mazda RX4 Wag     -1.0785279   2.501505
#> Datsun 710        -3.7863074 -25.171152
#> Hornet 4 Drive     0.5771451  31.086782
#> Hornet Sportabout  1.4394832  32.464148
#> Valiant           -2.3657210  -6.200261
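The residual matrix can be cross-checked by hand. The sketch below assumes only base R and mtcars; observed and manual_resid are illustrative names. It rebuilds the residuals as observed minus fitted for each response column, which matches the rows above (for the Mazda RX4, 21.0 - 22.37587 is about -1.3759).

```r
fit <- lm(cbind(mpg, disp) ~ ., data = mtcars)

# Observed responses as a matrix, in the same column order as the fit.
observed <- as.matrix(mtcars[, c("mpg", "disp")])

# Residuals are observed minus fitted, computed column by column.
manual_resid <- observed - fitted(fit)

# Compare values only, ignoring any difference in attributes.
all.equal(manual_resid, resid(fit), check.attributes = FALSE)
```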


This use of cbind is general purpose. For a logistic regression example with glm, see the menarche example here.
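A minimal sketch of the glm case, assuming the menarche data shipped with the MASS package (columns Age, Total, Menarche); whether the linked example uses exactly this call is an assumption. One caveat: in a binomial glm, cbind(successes, failures) describes a single response as counts, so predict() still returns one value per row rather than one column per variable.

```r
library(MASS)  # provides the menarche data frame

# Two-column response for a binomial glm: the number of girls who had
# reached menarche and the number who had not, at each age.
fit_glm <- glm(cbind(Menarche, Total - Menarche) ~ Age,
               family = binomial, data = menarche)

# One fitted probability per row of the data.
p <- predict(fit_glm, type = "response")
head(p)
```

So cbind on the left-hand side works in both lm and glm, but with different meanings: several responses in lm, and success and failure counts of one response in a binomial glm.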
