Multinomial logit

I'm stuck trying to run a multinomial logit regression in R. I'm given a dataset of individual choices in 50 different markets. There are three products (j = 1, 2, 3) and one outside option (j = 0). Each xj has three components: x(1) is the price, and (x(2), x(3)) are other product characteristics. A preview of the data is shown below for reference. How should I run it? I'm new to R and need to do this for applied econometrics using R. Can you help me with reshaping the data and running the multinomial regression?

> head(data)
  marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3 x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice
1           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
2           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
3           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
4           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
5           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
6           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2

CodePudding user response：

A multinomial logit model can be fitted in R with several packages, including nnet (through its multinom function) and mlogit. The tutorial on the UCLA website recommended by mhmtsrmn prefers multinom to mlogit

because it does not require the data to be reshaped (as the mlogit package does)

However, the data you provided are already in a wide layout that the mlogit workflow accepts, so if you want to use mlogit you do not need to reshape them by hand. You do, however, need to recode the choice column as follows:

Choice 2 must be changed to prod2
Choice 3 must be changed to prod3, and so on.

This is necessary because the other column names use the suffixes prod2, prod3, etc., and the values in the choice column have to match them.
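
A minimal sketch of that recoding, assuming the data frame is called data as in the question and that the new column is named choice1 (the name used in the example further down). The three-row data frame here is a made-up stand-in for the real data:

```r
# toy stand-in for the real data: only the choice column matters here
data <- data.frame(marketindex = c(1, 1, 1), choice = c(3, 2, 0))

# prefix each code with "prod" so it matches the column suffixes (x1_prod2, ...)
data$choice1 <- factor(paste0("prod", data$choice),
                       levels = paste0("prod", 0:3))

table(data$choice1)
```

Setting the levels explicitly keeps all four alternatives as factor levels even when some of them never appear in the sample.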

I tried to run the mlogit function on your data sample, but it failed, most probably because the sample does not have enough variation in its values. So I replaced the values with random ones and assigned the data frame to the name choice_dat, like this:

choice_dat
 marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3
1           1        5        7        6        5        2        8        7
2           1        8        3        5        6        3        9        8
3           1        7       10        3        7        6        9        9
4           1        8        8        2        5        8        9        7
5           1        9        9       10        8        4        6        8
6           1        7        4        8        7       10       10        8
  x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice1
1       10       13        0        0        0   prod3
2        3       10        0        0        0   prod2
3        4       10        0        0        0   prod3
4        1       11        0        0        0   prod2
5        8       10        0        0        0   prod2
6        5       12        0        0        0   prod2

Then I ran mlogit on the data:

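library(mlogit)  # also attaches dfidx
# columns 2 to 13 vary by alternative; the suffix after "_" names the alternative
prod_dat <- dfidx(choice_dat, choice = "choice1", varying = 2:13, sep = "_")
# "| 0" drops the alternative-specific intercepts
mod1 <- mlogit(choice1 ~ x1 + x2 + x3 | 0, data = prod_dat)
summary(mod1)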

Call:
mlogit(formula = choice1 ~ x1 + x2 + x3 | 0, data = prod_dat, 
    method = "nr")

Frequencies of alternatives:choice
  prod0   prod1   prod2   prod3 
0.00000 0.00000 0.66667 0.33333 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 9.53E-08 
gradient close to zero 

Coefficients :
   Estimate Std. Error z-value Pr(>|z|)
x1 -0.11412    0.38947 -0.2930   0.7695
x2  0.16461    0.17790  0.9253   0.3548
x3  0.26768    0.22651  1.1818   0.2373

Log-Likelihood: -5.8257
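
To see what these coefficients imply: with alternative-specific regressors and no intercepts, the model is a conditional logit, in which the probability of choosing alternative j is exp(xj'b) / sum over k of exp(xk'b). A rough base-R sketch of that formula, plugging in the rounded estimates above and the first row of choice_dat (the names beta, X, v and p are mine, not part of mlogit):

```r
# rounded estimates copied from the summary above
beta <- c(x1 = -0.11412, x2 = 0.16461, x3 = 0.26768)

# attributes of each alternative in row 1 of choice_dat (outside option is all zeros)
X <- rbind(prod0 = c(0, 0, 0),
           prod1 = c(5, 7, 6),
           prod2 = c(5, 2, 8),
           prod3 = c(7, 10, 13))
colnames(X) <- names(beta)

v <- drop(X %*% beta)        # systematic utility of each alternative
p <- exp(v) / sum(exp(v))    # conditional logit choice probabilities
round(p, 3)
```

On the fitted model itself, fitted() and predict() should give the corresponding quantities without the hand calculation.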

CodePudding user response：

Here is a link to a multinomial logistic regression example in R by UCLA, using multinom from the nnet package. The formula syntax looks the same as in base R's lm function.
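
For illustration only, a minimal sketch of that formula interface on R's built-in iris data (not the data from the question). The choice of regressors is arbitrary:

```r
library(nnet)

# response on the left, regressors on the right, exactly as with lm()
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

# one row of coefficients per non-reference level of the response
coef(fit)
```

The first level of the response (here setosa) is used as the reference category.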

CodePudding user response：

Here's a multinom(...) example, using your data.

library(data.table)
library(nnet)
setDT(data)
##
#   first method
#
# for each row, copy the attributes of the chosen product into x1, x2, x3
data[
  , c('x1', 'x2', 'x3'):=mget(sapply(1:3, function(x) sprintf('x%d_prod%d', x, choice)))
  , by=.(1:nrow(data))]
fit.1 <- multinom(choice ~ x1 + x2 + x3, data)
fit.1
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = data)
## 
## Coefficients:
## (Intercept)          x1          x2          x3 
##   -3.420470   -6.949344  -12.363971    6.679612 
##
## Residual Deviance: 0.0001212278 
## AIC: 4.000121 
##
#   alternate method
#
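# stack all x*_prod* columns into long format
data.melt <- melt(data, measure.vars = patterns('_prod'))
# split names such as x1_prod2 into the product id (2) and the variable name (x1)
data.melt[, prod.id:=gsub('^.+_prod(\\d+)$', '\\1', variable)]
data.melt[, variable:=gsub('^(.+)_.+$', '\\1', variable)]
# keep only the rows that belong to the chosen product
data.melt <- data.melt[choice==prod.id]
data.melt[, id:=seq(.N), by=.(variable, choice)]
mf <- dcast(data.melt, marketindex + choice + id ~ variable, value.var = 'value')
fit.2 <- multinom(choice ~ x1 + x2 + x3, mf)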
fit.2
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = mf)
##
## Coefficients:
## (Intercept)          x1          x2          x3 
##   -3.420470   -6.949344  -12.363971    6.679612 
##
## Residual Deviance: 0.0001212278 
## AIC: 4.000121