Let's use mtcars as the example dataset.
data <- mtcars
I am creating a linear model with two dependent variables (mpg and disp); all the other variables/columns are independent variables. I build the linear model:
fit <- lm(mpg + disp ~ ., data=data)
I would like to predict both dependent variables (mpg and disp), so I run predict().
predict(fit, data)
However, the result contains only one value per row instead of two (one for each dependent variable). This is the output:
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant Duster 360 Merc 240D
165.60538 179.57702 159.75746 247.73607 344.79637 251.66598 348.80082 160.58774
Merc 230 Merc 280 Merc 280C Merc 450SE Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
148.81596 207.46454 200.85338 360.66297 331.60317 331.14515 429.61466 452.19838
Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
462.42356 126.53160 59.90496 93.95177 149.87657 332.59491 325.64415 380.86739
Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
375.64822 111.88026 159.05993 101.86053 369.53347 169.01770 309.00674 177.31316
- How can I get both dependent variable values using predict() or any other function?
- How can I interpret the previous results?
Please be aware that this model might not be realistic or statistically significant. This is purely a technical programming question, so please don't judge the usefulness of the model.
Answer:
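Don't add the responses; cbind them. On the left-hand side of a formula, mpg + disp is evaluated as ordinary arithmetic, so lm() fitted a single response (the sum of the two columns) and predict() returned the predicted sum for each car. With cbind(), lm() fits a multivariate model (class "mlm") and predict() returns a matrix with one column per response.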
fit <- lm(cbind(mpg, disp) ~ ., data=mtcars)
y <- predict(fit)
head(y)
#> mpg disp
#> Mazda RX4 22.37587 143.2295
#> Mazda RX4 Wag 22.07853 157.4985
#> Datsun 710 26.58631 133.1712
#> Hornet 4 Drive 20.82285 226.9132
#> Hornet Sportabout 17.26052 327.5359
#> Valiant 20.46572 231.2003
Created on 2022-12-21 with reprex v2.0.2
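To interpret the question's original output, here is a small sketch using only base R and the built-in mtcars (the names fit_sum, fit_mlm, p_sum and p_mlm are just illustrative). Because least squares is linear in the response, regressing mpg + disp on the predictors should give exactly the row sums of the two-column predictions. For Mazda RX4, 22.37587 + 143.2295 = 165.60537, which matches the question's 165.60538 up to printing precision.

```r
# The question's model: a single response, the element-wise sum mpg + disp
fit_sum <- lm(mpg + disp ~ ., data = mtcars)
# The answer's model: two responses, one column each
fit_mlm <- lm(cbind(mpg, disp) ~ ., data = mtcars)

p_sum <- predict(fit_sum, mtcars)  # named vector, one value per car
p_mlm <- predict(fit_mlm, mtcars)  # matrix with columns "mpg" and "disp"

# Least squares is linear in the response, so the summed model's
# predictions equal the row sums of the two-column predictions
stopifnot(isTRUE(all.equal(p_sum, rowSums(p_mlm))))
head(cbind(p_mlm, sum = p_sum))
```

Passing mtcars as newdata mirrors the question's predict(fit, data) call; for an "mlm" fit the response columns in newdata are simply ignored.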
You didn't ask, but it also works with the model's residuals: resid() returns one column per response.
e <- resid(fit)
head(e)
#> mpg disp
#> Mazda RX4 -1.3758673 16.770486
#> Mazda RX4 Wag -1.0785279 2.501505
#> Datsun 710 -3.7863074 -25.171152
#> Hornet 4 Drive 0.5771451 31.086782
#> Hornet Sportabout 1.4394832 32.464148
#> Valiant -2.3657210 -6.200261
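A similar sketch for the residuals (base R only; the names Y and fit_mpg are illustrative): the residual matrix should equal the observed responses minus the fitted values, and each column should match what a separate single-response fit gives. The separate fit uses mpg ~ . - disp so that disp is not used as a predictor, keeping the same right-hand side as the cbind() model.

```r
fit <- lm(cbind(mpg, disp) ~ ., data = mtcars)
e <- resid(fit)

# Residuals are observed minus fitted, column by column
Y <- as.matrix(mtcars[, c("mpg", "disp")])
stopifnot(isTRUE(all.equal(e, Y - fitted(fit))))

# The mpg column matches a separate lm() for mpg with the same predictors
fit_mpg <- lm(mpg ~ . - disp, data = mtcars)
stopifnot(isTRUE(all.equal(e[, "mpg"], resid(fit_mpg))))
```

The point estimates, fitted values and residuals are the same as fitting each response on its own; cbind() mainly saves you from fitting and aligning the models separately.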
This use of cbind is general purpose; for a logistic regression example with glm, see the menarche example here.
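For a quick runnable comparison, here is a minimal binomial glm sketch that swaps in the built-in esoph dataset instead of the menarche data (the substitution is only so the example needs nothing beyond base R). Note that with family = binomial the two cbind() columns are read as counts of successes and failures, so predict() returns one fitted probability per row rather than two columns.

```r
# Logistic regression on grouped counts: cbind(successes, failures)
fit_esoph <- glm(cbind(ncases, ncontrols) ~ agegp,
                 data = esoph, family = binomial)

# type = "response" gives the fitted probability of being a case, one per row
p <- predict(fit_esoph, type = "response")
head(p)
```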