We have data on the mode choice of 453 commuters. Four modes are available: (1) bus, (2) car alone, (3) carpool, and (4) rail. We have data for each commuter on the cost and time on each mode and the chosen mode.
mlogit
estimates the multinomial probit model if the
probit
argument is TRUE
using the GHK
procedure. This program estimates the full covariance matrix subject to
normalization.
More precisely, utility differences are computed respective to the reference level of the response (by default the bus alternative) and the 3 × 3 matrix of covariance is estimated. As the scale of utility is unobserved, the first element of the matrix is further set to 1. The Choleski factor of the covariance is :
$$ L = \left( \begin{array}{ccc} 1 & 0 & 0 \\ \theta_{32} & \theta_{33} & 0 \\ \theta_{42} & \theta_{43} & \theta_{44} \end{array} \right) $$
that is, five covariance parameters are estimated. The covariance matrix of the utility differences is then Ω = LL⊤. By working in a Choleski factor that has this form, the normalization constraints are automatically imposed and the covariance matrix is guaranteed to be positive semi-definite (with the covariance matrix for any error differences being positive-definite).
library("mlogit")
data("Mode", package="mlogit")
Mo <- mlogit.data(Mode, choice = 'choice', shape = 'wide',
varying = c(2:9))
##
## Call:
## mlogit(formula = choice ~ cost + time, data = Mo, start = strt,
## probit = TRUE, R = 100, seed = 20)
##
## Frequencies of alternatives:
## bus car carpool rail
## 0.17881 0.48124 0.07064 0.26932
##
## bfgs method
## 1 iterations, 0h:0m:2s
## g'(-H)^-1g = 3.14E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## car:(intercept) 1.8308557 0.2506437 7.3046 2.780e-13 ***
## carpool:(intercept) -1.2816822 0.5677907 -2.2573 0.0239884 *
## rail:(intercept) 0.3093559 0.1151693 2.6861 0.0072292 **
## cost -0.4134374 0.0731593 -5.6512 1.593e-08 ***
## time -0.0466545 0.0068263 -6.8346 8.225e-12 ***
## car.carpool 0.2599571 0.3850390 0.6751 0.4995837
## car.rail 0.7364831 0.2145726 3.4323 0.0005984 ***
## carpool.carpool 1.3078997 0.3916806 3.3392 0.0008402 ***
## carpool.rail -0.7981622 0.3463812 -2.3043 0.0212065 *
## rail.rail 0.4301445 0.4874502 0.8824 0.3775401
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -347.92
## McFadden R^2: 0.36012
## Likelihood ratio test : chisq = 391.62 (p.value = < 2.22e-16)
The estimated Choleski factor L1 is :
Multiplying L1 by its transpose gives Ω1 :
## [,1] [,2] [,3]
## [1,] 1.0000000 0.2599571 0.7364831
## [2,] 0.2599571 1.7781792 -0.8524620
## [3,] 0.7364831 -0.8524620 1.3644946
With iid errors, Ω1 would be :
$$ \left( \begin{array}{ccc} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \\ \end{array} \right) $$
I find it hard to tell anything from the estimated covariance terms.
I agree: it is hard – if not impossible – to meaningfully interpret the covariance parameters when all free parameters are estimated. However, the substitutiuon patterns that the estimates imply can be observed by forecasting with the model; we do this in exercise 4 below. Also, when structure is placed on the covariance matrix, the estimates are usually easier to interpret; this is explored in exercise 6.]
## car:(intercept) carpool:(intercept) rail:(intercept)
## 1.87148465 -1.28886230 0.31455718
## cost time car.carpool
## -0.43068483 -0.04752316 0.22887121
## car.rail carpool.carpool carpool.rail
## 0.69779915 1.33065508 -0.56797982
## rail.rail
## 0.71064723
The estimates seem to change more for parameters with larger standard error, though this is not uniformly the case by any means. One would expect larger samplign variance (which arises from a flatter ln L near the max) to translate into greater simulation variance (which raises when the location of the max changes with different draws).
The actual shares are :
The predicted shares are :
## bus car carpool rail
## 0.17596137 0.48239527 0.06923828 0.27209814
## [1] 0.9996931
The correspondence is very close but not exact.
Note: Simulated GHK probabilities do not necessarily sum to one over alternatives. This summing-up error at the individual level tends to cancel out when the probabilities are averaged over the sample. The forecasted shares (ie, average probabilities) sum to 0.9991102, which is only slightly different from 1.
Mo2 <- Mo
Mo2[Mo2$alt == 'car', 'cost'] <- Mo2[Mo2$alt == 'car', 'cost'] * 2
newShares <- apply(predict(p1, newdata = Mo2), 2, mean)
cbind(original = actShares, new = newShares,
change = round((newShares - actShares) / actShares * 100))
## original new change
## bus 0.17880795 0.2517884 41
## car 0.48123620 0.1689329 -65
## carpool 0.07064018 0.1432553 103
## rail 0.26931567 0.4356112 62
Substitution is not proportional. Carpool gets the largest percent increase.