This has been doing my head in all afternoon. I simply cannot figure out why I cannot run mlogit on this simple data set.
Here is a small snippet:
race,horseno,place,win,of,ppf,orf,df,jf,tf,wf,af
1,1,4,0,1,0.7,1,0.33,0.13,0.09,0.72,1
1,2,2,0,0.45,0.78,0.99,0.5,0.22,0.2,0.73,0.98
1,3,1,1,0.42,1,0.99,1,0.18,0.1,0.73,0.76
1,4,3,0,0.19,0.27,0.99,0.17,0.22,0.12,0.73,0.47
2,1,2,0,1,1,1,1,0.31,0.16,0.61,0.81
2,2,4,0,0.24,0.88,1,1,0.09,0.07,0.61,0.92
2,3,1,1,0.16,0.03,1,1,0.57,0.29,0.61,0.98
2,4,5,0,0.21,0.47,1,1,0.25,0.05,0.61,0.92
2,5,8,0,0.01,0.3,1,1,0.19,0,0.64,0.92
2,6,7,0,0.01,0.21,1,1,0.2,0,0.61,1
2,7,3,0,0.1,0.34,1,1,0.16,0.04,0.58,0.79
2,8,11,0,0.06,0.03,1,1,0.21,0.16,0.61,0.92
2,9,10,0,0.03,0.03,1,1,0.19,0.16,0.61,0.92
2,10,9,0,0.01,0.29,1,1,0.09,0.05,0.61,0.77
2,11,6,0,0.01,0.25,1,1,0.09,0.05,0.61,0.77
The full file can be found here: https://pastebin.com/dhiA4XXn. This exact CSV fails when I run the following call, and this is the error:
> x <- mlogit.data(data, choice = "win", shape = "long", id.var = "race", alt.var = "horseno")
Error in `$<-.data.frame`(`*tmp*`, "id1", value = c(1L, 1L, 1L, 1L, 1L, :
replacement has 22 rows, data has 26
Honestly if anyone can save me I'd appreciate it
CodePudding user response:
I don't fully understand the choice situations in your data. Nevertheless, I tried comparing your data with the TravelMode data in this tutorial. It seems that race in your data plays the same role as individual in the TravelMode data, which is the name of the variable that contains the information about the choice situations. So I assume that race can be assigned to chid.var. Here is my attempt:
dat = read.table(text = "race,horseno,place,win,of,ppf,orf,df,jf,tf,wf,af
1,1,4,0,1,0.7,1,0.33,0.13,0.09,0.72,1
1,2,2,0,0.45,0.78,0.99,0.5,0.22,0.2,0.73,0.98
1,3,1,1,0.42,1,0.99,1,0.18,0.1,0.73,0.76
1,4,3,0,0.19,0.27,0.99,0.17,0.22,0.12,0.73,0.47
2,1,2,0,1,1,1,1,0.31,0.16,0.61,0.81
2,2,4,0,0.24,0.88,1,1,0.09,0.07,0.61,0.92
2,3,1,1,0.16,0.03,1,1,0.57,0.29,0.61,0.98
2,4,5,0,0.21,0.47,1,1,0.25,0.05,0.61,0.92
2,5,8,0,0.01,0.3,1,1,0.19,0,0.64,0.92
2,6,7,0,0.01,0.21,1,1,0.2,0,0.61,1
2,7,3,0,0.1,0.34,1,1,0.16,0.04,0.58,0.79
2,8,11,0,0.06,0.03,1,1,0.21,0.16,0.61,0.92
2,9,10,0,0.03,0.03,1,1,0.19,0.16,0.61,0.92
2,10,9,0,0.01,0.29,1,1,0.09,0.05,0.61,0.77
2,11,6,0,0.01,0.25,1,1,0.09,0.05,0.61,0.77", header = TRUE, sep = ",")
library(mlogit)  # provides mlogit.data()
x <- mlogit.data(dat, choice = "win", shape = "long",
                 chid.var = "race", alt.var = "horseno")
x
# ~~~~~~~
# first 10 observations out of 15
# ~~~~~~~
# race horseno place win of ppf orf df jf tf wf af idx
# 1 1 1 4 FALSE 1.00 0.70 1.00 0.33 0.13 0.09 0.72 1.00 1:1
# 2 1 2 2 FALSE 0.45 0.78 0.99 0.50 0.22 0.20 0.73 0.98 1:2
# 3 1 3 1 TRUE 0.42 1.00 0.99 1.00 0.18 0.10 0.73 0.76 1:3
# 4 1 4 3 FALSE 0.19 0.27 0.99 0.17 0.22 0.12 0.73 0.47 1:4
# 5 2 1 2 FALSE 1.00 1.00 1.00 1.00 0.31 0.16 0.61 0.81 2:1
# 6 2 2 4 FALSE 0.24 0.88 1.00 1.00 0.09 0.07 0.61 0.92 2:2
# 7 2 3 1 TRUE 0.16 0.03 1.00 1.00 0.57 0.29 0.61 0.98 2:3
# 8 2 4 5 FALSE 0.21 0.47 1.00 1.00 0.25 0.05 0.61 0.92 2:4
# 9 2 5 8 FALSE 0.01 0.30 1.00 1.00 0.19 0.00 0.64 0.92 2:5
# 10 2 6 7 FALSE 0.01 0.21 1.00 1.00 0.20 0.00 0.61 1.00 2:6
#
# ~~~ indexes ~~~~
# chid alt
# 1 1 1
# 2 1 2
# 3 1 3
# 4 1 4
# 5 2 1
# 6 2 2
# 7 2 3
# 8 2 4
# 9 2 5
# 10 2 6
# indexes: 1, 2
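Before treating race as the choice situation, it may be worth checking that the data actually behave that way: each race should have exactly one winner, and each (race, horseno) pair should appear only once. The following is a minimal base R sketch of that check, using only the race, horseno and win columns copied from the snippet in the question. The names winners_per_race and n_duplicate_pairs are made up for illustration, and the full file may behave differently from the snippet.

```r
# Columns race, horseno and win copied from the 15-row snippet in the question.
dat <- data.frame(
  race    = c(rep(1, 4), rep(2, 11)),
  horseno = c(1:4, 1:11),
  win     = c(0, 0, 1, 0,  0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Number of winners in each race; a choice situation should have exactly one.
winners_per_race <- tapply(dat$win, dat$race, sum)

# Number of repeated (race, horseno) pairs; each pair should appear only once.
n_duplicate_pairs <- sum(duplicated(dat[, c("race", "horseno")]))

print(winners_per_race)
print(n_duplicate_pairs)
```

If either check fails on the full file, that would be worth fixing before building the index.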
CodePudding user response:
It seems to be due to not all horses being represented in the first race (there are only 4 horses in the first race). If you remove the first four entries, it works:
mlogit.data(data[-(1:4), ], choice = "win", shape = "long",
            id.var = "race", alt.var = "horseno")
~~~~~~~
first 10 observations out of 22
~~~~~~~
race horseno place win of ppf orf df jf tf wf af id1 idx
1 2 1 2 FALSE 1.00 1.00 1 1 0.31 0.16 0.61 0.81 1 1:1
2 2 2 4 FALSE 0.24 0.88 1 1 0.09 0.07 0.61 0.92 1 1:2
3 2 3 1 TRUE 0.16 0.03 1 1 0.57 0.29 0.61 0.98 1 1:3
4 2 4 5 FALSE 0.21 0.47 1 1 0.25 0.05 0.61 0.92 1 1:4
5 2 5 8 FALSE 0.01 0.30 1 1 0.19 0.00 0.64 0.92 1 1:5
6 2 6 7 FALSE 0.01 0.21 1 1 0.20 0.00 0.61 1.00 1 1:6
7 2 7 3 FALSE 0.10 0.34 1 1 0.16 0.04 0.58 0.79 1 1:7
8 2 8 11 FALSE 0.06 0.03 1 1 0.21 0.16 0.61 0.92 1 1:8
9 2 9 10 FALSE 0.03 0.03 1 1 0.19 0.16 0.61 0.92 1 1:9
10 2 10 9 FALSE 0.01 0.29 1 1 0.09 0.05 0.61 0.77 1 1:10
~~~ indexes ~~~~
chid alt
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7
8 1 8
9 1 9
10 1 10
indexes: 1, 2
Note that mlogit.data is now deprecated. You should use dfidx::dfidx instead.
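To see the unequal field sizes mentioned above, you can count the runners in each race. This is a small base R sketch that uses only the race and horseno columns from the snippet in the question. The names runners_per_race and balanced are made up for illustration.

```r
# race and horseno columns copied from the 15-row snippet in the question.
race    <- c(rep(1, 4), rep(2, 11))
horseno <- c(1:4, 1:11)

# Number of rows (runners) recorded for each race.
runners_per_race <- table(race)

# TRUE only if every race has the same number of runners.
balanced <- length(unique(as.vector(runners_per_race))) == 1

print(runners_per_race)
print(balanced)
```

On the snippet, race 1 has 4 runners and race 2 has 11, so the data are not balanced across races.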