I'm stuck trying to run a multinomial logit regression in R. I'm given a dataset of individual choices in 50 different markets. In the dataset, there are three products (j = 1, 2, 3) and one outside option (j = 0). Each x_j has three components: x(1) is the price, and (x(2), x(3)) are other product characteristics. A preview of the data is attached for reference. How should I run it? I'm new to R and need to do this for applied econometrics using R. Can you help me with reshaping the data and running the multinomial regression?
> head(data)
marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3 x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice
1 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 3
2 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 2
3 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 3
4 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 2
5 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 2
6 1 7.459917 1 7.267866 6.67054 1 7.633743 8.444682 0 11.30016 0 0 0 2
CodePudding user response:
A multinomial logit model can be fitted in R with several packages, including nnet (its multinom function) and mlogit. The tutorial on the UCLA website recommended by mhmtsrmn prefers multinom to mlogit because it does not require the data to be reshaped (as the mlogit package does).
However, the data you provided are already in a shape compatible with the format required by the mlogit package, so if you want to use mlogit, you don't need to reshape them. You do, however, need to change the coding in the choice column as follows:
- Choice 2 must be changed to prod2
- Choice 3 must be changed to prod3, and so on.
This is necessary because the other columns use the suffixes prod2, prod3, etc.
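The recoding described above can be sketched as follows. This is only an illustration on a made-up four-row data frame (`toy` is a hypothetical name, not the question's data); it assumes the original column is called `choice` and holds the numeric codes 0 to 3, and it writes the result to `choice1`, the column name used further down.

```r
# Toy stand-in for the question's data; only the `choice` column matters here.
toy <- data.frame(marketindex = c(1, 1, 1, 1), choice = c(3, 2, 0, 1))

# Prefix each numeric code with "prod" so the labels match the
# column suffixes (x1_prod0, x1_prod1, ...).
toy$choice1 <- paste0("prod", toy$choice)

toy$choice1
```

The same `paste0()` call should work on the full data frame as long as `choice` contains only the codes 0, 1, 2 and 3.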
I tried to run the mlogit function on your data sample, but it failed, most probably because the sample doesn't have enough variation in its values. So I replaced the values with random ones and assigned the data frame to the name choice_dat, like this:
choice_dat
marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3
1 1 5 7 6 5 2 8 7
2 1 8 3 5 6 3 9 8
3 1 7 10 3 7 6 9 9
4 1 8 8 2 5 8 9 7
5 1 9 9 10 8 4 6 8
6 1 7 4 8 7 10 10 8
x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice1
1 10 13 0 0 0 prod3
2 3 10 0 0 0 prod2
3 4 10 0 0 0 prod3
4 1 11 0 0 0 prod2
5 8 10 0 0 0 prod2
6 5 12 0 0 0 prod2
Then I ran mlogit on the data:
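library(mlogit)  # provides mlogit(); dfidx() comes from its dfidx dependency
# Columns 2:13 hold the alternative-specific variables, named <variable>_<alternative>
prod_dat <- dfidx(choice_dat, choice = "choice1", varying = 2:13, sep = "_")
# `| 0` drops the alternative-specific intercepts
mod1 <- mlogit(choice1 ~ x1 + x2 + x3 | 0, data = prod_dat)
summary(mod1)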
Call:
mlogit(formula = choice1 ~ x1 + x2 + x3 | 0, data = prod_dat,
method = "nr")
Frequencies of alternatives:choice
prod0 prod1 prod2 prod3
0.00000 0.00000 0.66667 0.33333
nr method
5 iterations, 0h:0m:0s
g'(-H)^-1g = 9.53E-08
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
x1 -0.11412 0.38947 -0.2930 0.7695
x2 0.16461 0.17790 0.9253 0.3548
x3 0.26768 0.22651 1.1818 0.2373
Log-Likelihood: -5.8257
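For reference, the formula used in that call (regressors x1, x2, x3 that vary across alternatives and share common coefficients, with `| 0` dropping the alternative-specific intercepts) corresponds to the standard conditional logit choice probability. The notation below is my own shorthand, not something from the question: i indexes individuals, m their market, and β is the vector of the three coefficients.

```latex
P(\text{choice}_i = j) = \frac{\exp(x_{jm}'\beta)}{\sum_{k=0}^{3} \exp(x_{km}'\beta)}, \qquad j = 0, 1, 2, 3
```

Because all characteristics of prod0 are zero in the data, the outside option contributes exp(0) = 1 to the denominator.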
CodePudding user response:
Here is a link to a multinomial logistic regression example in R by UCLA, using multinom from the nnet package. The formula syntax looks the same as that of base R's lm function.
CodePudding user response:
Here's a multinom(...) example, using your data.
library(data.table)
library(nnet)
setDT(data)
##
# first method
#
data[
, c('x1', 'x2', 'x3'):=mget(sapply(1:3, function(x) sprintf('x%d_prod%d', x, choice)))
, by=.(1:nrow(data))]
fit.1 <- multinom(choice ~ x1 + x2 + x3, data)
fit.1
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = data)
##
## Coefficients:
## (Intercept) x1 x2 x3
## -3.420470 -6.949344 -12.363971 6.679612
##
## Residual Deviance: 0.0001212278
## AIC: 4.000121
##
# alternate method
#
data.melt <- melt(data, measure.vars = patterns('_prod'))
data.melt[, prod.id := gsub('^.+_prod(\\d+)$', '\\1', variable)]  # product number from the column name
data.melt[, variable := gsub('^(.+)_.+$', '\\1', variable)]       # keep only x1, x2 or x3
data.melt <- data.melt[choice==prod.id]
data.melt[, id:=seq(.N), by=.(variable, choice)]
mf <- dcast(data.melt, marketindex + choice + id ~ variable, value.var = 'value')
fit.2 <- multinom(choice ~ x1 + x2 + x3, mf)
fit.2
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = mf)
##
## Coefficients:
## (Intercept) x1 x2 x3
## -3.420470 -6.949344 -12.363971 6.679612
##
## Residual Deviance: 0.0001212278
## AIC: 4.000121