Why will this csv file not load in R mlogit?

This has been driving me up the wall all afternoon. I simply cannot figure out why mlogit will not run on this simple data set.

A small snippet:

race,horseno,place,win,of,ppf,orf,df,jf,tf,wf,af
1,1,4,0,1,0.7,1,0.33,0.13,0.09,0.72,1
1,2,2,0,0.45,0.78,0.99,0.5,0.22,0.2,0.73,0.98
1,3,1,1,0.42,1,0.99,1,0.18,0.1,0.73,0.76
1,4,3,0,0.19,0.27,0.99,0.17,0.22,0.12,0.73,0.47
2,1,2,0,1,1,1,1,0.31,0.16,0.61,0.81
2,2,4,0,0.24,0.88,1,1,0.09,0.07,0.61,0.92
2,3,1,1,0.16,0.03,1,1,0.57,0.29,0.61,0.98
2,4,5,0,0.21,0.47,1,1,0.25,0.05,0.61,0.92
2,5,8,0,0.01,0.3,1,1,0.19,0,0.64,0.92
2,6,7,0,0.01,0.21,1,1,0.2,0,0.61,1
2,7,3,0,0.1,0.34,1,1,0.16,0.04,0.58,0.79
2,8,11,0,0.06,0.03,1,1,0.21,0.16,0.61,0.92
2,9,10,0,0.03,0.03,1,1,0.19,0.16,0.61,0.92
2,10,9,0,0.01,0.29,1,1,0.09,0.05,0.61,0.77
2,11,6,0,0.01,0.25,1,1,0.09,0.05,0.61,0.77

The full file can be found here: https://pastebin.com/dhiA4XXn. This exact CSV fails when I run the call below, and this is the error:

> x <- mlogit.data(data, choice = "win", shape = "long", id.var = "race", alt.var = "horseno")
Error in `$<-.data.frame`(`*tmp*`, "id1", value = c(1L, 1L, 1L, 1L, 1L,  : 
  replacement has 22 rows, data has 26

Honestly, if anyone can save me, I'd appreciate it.

CodePudding user response：

I don't fully understand the choice situations in your data. Nevertheless, I compared your data with the TravelMode data in this tutorial. It seems that race in your data plays the same role as individual in the TravelMode data: it is the variable that identifies the choice situations. So I assume that race should be passed as chid.var rather than id.var. Here is my attempt:

library(mlogit)

dat <- read.table(text = "race,horseno,place,win,of,ppf,orf,df,jf,tf,wf,af
1,1,4,0,1,0.7,1,0.33,0.13,0.09,0.72,1
1,2,2,0,0.45,0.78,0.99,0.5,0.22,0.2,0.73,0.98
1,3,1,1,0.42,1,0.99,1,0.18,0.1,0.73,0.76
1,4,3,0,0.19,0.27,0.99,0.17,0.22,0.12,0.73,0.47
2,1,2,0,1,1,1,1,0.31,0.16,0.61,0.81
2,2,4,0,0.24,0.88,1,1,0.09,0.07,0.61,0.92
2,3,1,1,0.16,0.03,1,1,0.57,0.29,0.61,0.98
2,4,5,0,0.21,0.47,1,1,0.25,0.05,0.61,0.92
2,5,8,0,0.01,0.3,1,1,0.19,0,0.64,0.92
2,6,7,0,0.01,0.21,1,1,0.2,0,0.61,1
2,7,3,0,0.1,0.34,1,1,0.16,0.04,0.58,0.79
2,8,11,0,0.06,0.03,1,1,0.21,0.16,0.61,0.92
2,9,10,0,0.03,0.03,1,1,0.19,0.16,0.61,0.92
2,10,9,0,0.01,0.29,1,1,0.09,0.05,0.61,0.77
2,11,6,0,0.01,0.25,1,1,0.09,0.05,0.61,0.77", header = TRUE, sep = ",")

x <- mlogit.data(dat, choice = "win", shape = "long", 
                 chid.var = "race", alt.var = "horseno")
x
# ~~~~~~~
#   first 10 observations out of 15 
# ~~~~~~~
#   race horseno place   win   of  ppf  orf   df   jf   tf   wf   af idx
# 1     1       1     4 FALSE 1.00 0.70 1.00 0.33 0.13 0.09 0.72 1.00 1:1
# 2     1       2     2 FALSE 0.45 0.78 0.99 0.50 0.22 0.20 0.73 0.98 1:2
# 3     1       3     1  TRUE 0.42 1.00 0.99 1.00 0.18 0.10 0.73 0.76 1:3
# 4     1       4     3 FALSE 0.19 0.27 0.99 0.17 0.22 0.12 0.73 0.47 1:4
# 5     2       1     2 FALSE 1.00 1.00 1.00 1.00 0.31 0.16 0.61 0.81 2:1
# 6     2       2     4 FALSE 0.24 0.88 1.00 1.00 0.09 0.07 0.61 0.92 2:2
# 7     2       3     1  TRUE 0.16 0.03 1.00 1.00 0.57 0.29 0.61 0.98 2:3
# 8     2       4     5 FALSE 0.21 0.47 1.00 1.00 0.25 0.05 0.61 0.92 2:4
# 9     2       5     8 FALSE 0.01 0.30 1.00 1.00 0.19 0.00 0.64 0.92 2:5
# 10    2       6     7 FALSE 0.01 0.21 1.00 1.00 0.20 0.00 0.61 1.00 2:6
# 
# ~~~ indexes ~~~~
#   chid alt
# 1     1   1
# 2     1   2
# 3     1   3
# 4     1   4
# 5     2   1
# 6     2   2
# 7     2   3
# 8     2   4
# 9     2   5
# 10    2   6
# indexes:  1, 2

CodePudding user response：

The error seems to arise because not every horse number appears in the first race, which has only 4 runners. If you remove the first four rows, it works:

mlogit.data(data[-(1:4),], choice = "win", shape = "long", 
            id.var = "race", alt.var = "horseno")
~~~~~~~
 first 10 observations out of 22 
~~~~~~~
   race horseno place   win   of  ppf orf df   jf   tf   wf   af id1  idx
1     2       1     2 FALSE 1.00 1.00   1  1 0.31 0.16 0.61 0.81   1  1:1
2     2       2     4 FALSE 0.24 0.88   1  1 0.09 0.07 0.61 0.92   1  1:2
3     2       3     1  TRUE 0.16 0.03   1  1 0.57 0.29 0.61 0.98   1  1:3
4     2       4     5 FALSE 0.21 0.47   1  1 0.25 0.05 0.61 0.92   1  1:4
5     2       5     8 FALSE 0.01 0.30   1  1 0.19 0.00 0.64 0.92   1  1:5
6     2       6     7 FALSE 0.01 0.21   1  1 0.20 0.00 0.61 1.00   1  1:6
7     2       7     3 FALSE 0.10 0.34   1  1 0.16 0.04 0.58 0.79   1  1:7
8     2       8    11 FALSE 0.06 0.03   1  1 0.21 0.16 0.61 0.92   1  1:8
9     2       9    10 FALSE 0.03 0.03   1  1 0.19 0.16 0.61 0.92   1  1:9
10    2      10     9 FALSE 0.01 0.29   1  1 0.09 0.05 0.61 0.77   1 1:10

~~~ indexes ~~~~
   chid alt
1     1   1
2     1   2
3     1   3
4     1   4
5     1   5
6     1   6
7     1   7
8     1   8
9     1   9
10    1  10
indexes:  1, 2
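
To see whether the races really have different numbers of runners, a quick base R check can help before calling mlogit.data. This is only a sketch: the data frame below is rebuilt by hand from the 15 sample rows in the question (race, horseno and win only), and the names runners, unbalanced and winners are made up for illustration.

```r
# Rebuild the relevant columns of the 15 sample rows from the question.
dat <- data.frame(
  race    = rep(c(1, 2), times = c(4, 11)),
  horseno = c(1:4, 1:11),
  win     = c(0, 0, 1, 0,  0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Number of rows (horses) in each race.
runners <- table(dat$race)
print(runners)

# TRUE when the races do not all have the same number of runners.
unbalanced <- length(unique(as.vector(runners))) > 1

# Number of winners per race; each choice situation should have exactly one.
winners <- tapply(dat$win, dat$race, sum)
print(winners)
```

On the sample rows, race 1 has 4 runners and race 2 has 11, so the choice sets are unbalanced. That is consistent with the explanation above, although it does not by itself prove the cause of the error.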

Note that mlogit.data is now deprecated; you should use dfidx::dfidx instead.
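
A minimal sketch of the dfidx equivalent, assuming the dfidx package is installed and that race identifies the choice situation and horseno the alternative (the same roles as chid.var and alt.var in the other answer). The data frame is rebuilt by hand from four columns of the sample rows in the question, and the exact printed output may differ between dfidx versions.

```r
library(dfidx)  # assumption: the dfidx package is installed

# Four columns of the 15 sample rows from the question.
dat <- data.frame(
  race    = rep(c(1, 2), times = c(4, 11)),
  horseno = c(1:4, 1:11),
  win     = c(0, 0, 1, 0,  0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  of      = c(1, 0.45, 0.42, 0.19,
              1, 0.24, 0.16, 0.21, 0.01, 0.01, 0.1, 0.06, 0.03, 0.01, 0.01)
)

# idx gives the two index columns: choice situation first, alternative second.
# choice names the column that flags the chosen alternative.
x <- dfidx(dat, idx = c("race", "horseno"), choice = "win")
print(x)
```

Because the choice situation is given explicitly in idx, the races do not need to have the same number of horses.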