The multinomial probit is obtained with the same modeling that we used while presenting the random utility model. The utility of an alternative is still the sum of two components : Uj = Vj + ϵj.
but the joint distribution of the error terms is now a multivariate normal with mean 0 and with a matrix of covariance denoted Ω1.
Alternative l is chosen if : $$ \left\{ \begin{array}{rcl} U_1-U_l&=&(V_1-V_l)+(\epsilon_1-\epsilon_l)<0\\ U_2-U_l&=&(V_2-V_l)+(\epsilon_2-\epsilon_l)<0\\ & \vdots & \\ U_J-U_l&=&(V_J-V_l)+(\epsilon_J-\epsilon_l)<0\\ \end{array} \right. $$
wich implies, denoting Vjl = Vj − Vl :
$$ \left\{ \begin{array}{rclrcl} \epsilon^l_1 &=& (\epsilon_1-\epsilon_l) &<& - V^l_1\\ \epsilon^l_2 &=& (\epsilon_2-\epsilon_l) &<& - V^l_2\\ &\vdots & & \vdots & \\ \epsilon^l_J &=& (\epsilon_J-\epsilon_l) &<& - V^l_J\\ \end{array} \right. $$
The initial vector of errors ϵ are transformed using the following transformation :
ϵl = Mlϵ
where the transformation matrix Ml is a (J − 1) × J matrix obtained by inserting in an identity matrix a lth column of −1. For example, if J = 4 and l = 3 :
$$ M^3 = \left( \begin{array}{cccc} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -1 & 1 \\ \end{array} \right) $$
The covariance matrix of the error differences is obtained using the following matrix :
V(ϵl) = V(Mlϵ) = MlV(ϵ)Ml⊤ = MlΩMl⊤
The probability of choosing l is then :
with the hypothesis of distribution, this writes :
with :
Two problems arise with this model :
The meaning-full parameters are those of the covariance matrix of the error Ω. For example, with J = 3 :
$$ \Omega = \left( \begin{array}{ccc} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \\ \end{array} \right) $$
$$ \Omega^1 = M^1 \Omega {M^1}^{\top}= \left( \begin{array}{cc} \sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11} + \sigma_{23} - \sigma_{12} -\sigma_{13} \\ \sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13} & \sigma_{11} + \sigma_{33} - 2 \sigma_{13} \\ \end{array} \right) $$
The overall scale of utility being unidentified, one has to impose the value of one of the variance, for example the first one is fixed to 1. We then have :
$$ \Omega^1 = \left( \begin{array}{cc} 1 & \frac{\sigma_{11}+ \sigma_{23} - \sigma_{12} -\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \frac{\sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} & \frac{\sigma_{11} + \sigma_{33} - 2 \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \end{array} \right) $$
Therefore, out the 6 structural parameters of the covariance matrix, only 3 can be identified. Moreover, it’s almost impossible to interpret these parameters.
More generally, with J alternatives, the number of the parameters of the covariance matrix is (J + 1) × J/2 and the number of identified parameters is J × (J − 1)/2 − 1.
Let Ll be the Choleski decomposition of the covariance matrix of the error differences :
Ωl = LlLl⊤
This matrix is a lower triangular matrix of dimension (J − 1) :
$$ L^l= \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) $$
Let η be a vector of standard normal deviates :
η ∼ N(0, I)
Therefore, we have :
V(Llη) = LlV(η)Ll⊤ = LlILl⊤ = Ωl
Therefore, if we draw a vector of standard normal deviates η and apply to it this transformation, we get a realization of ϵl.
This joint probability can be written as a product of conditional and marginal probabilities :
$$ \begin{array}{rcl} P_l &=& \mbox{P}(\epsilon^l_1<- V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \;\&\; \epsilon^l_J<-V_J^l))\\ &=& \mbox{P}(\epsilon^l_1<- V_1^l))\\ &\times&\mbox{P}(\epsilon^l_2<-V_2^l \mid \epsilon^l_1<-V_1^l) \\ &\times&\mbox{P}(\epsilon^l_3<-V_3^l \mid \epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l) \\ & \vdots & \\ &\times&\mbox{P}(\epsilon^l_J<-V_J^l \mid \epsilon^l_1<-V_1^l \;\&\; ... \;\&\; \epsilon^l_{J-1}<-V_{J-1}^l)) \\ \end{array} $$
The vector of error differences deviates is :
$$ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) \times \left( \begin{array}{c} \eta_1 \\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_J \end{array} \right) $$
$$ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{l} l_{11}\eta_1 \\ l_{21}\eta_1+l_{22}\eta_2 \\ l_{31}\eta_1+l_{32}\eta_2 + l_{33}\eta_3\\ \vdots \\ l_{(J-1)1}\eta_1+l_{(J-1)2}\eta_2+...+l_{(J-1)(J-1)}\eta_{J-1} \end{array} \right) $$
Let’s now investigate the marginal and conditional probabilities :
This probabilities can easily be simulated by drawing numbers from a truncated normal distribution.
This so called GHK algorithm2 (for Geweke, Hajivassiliou and Keane who developed this algorithm) can be described as follow :
Several points should be noted concerning this algorithm :
We use again the Fishing
data frame, with only a subset
of three alternatives used. The multinomial probit model is estimated
using mlogit
with the probit
argument equal to
TRUE
.
library("mlogit")
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, shape="wide", varying=2:9, choice="mode")
Fish.mprobit <- mlogit(mode~price | income | catch, Fish, probit = TRUE, alt.subset=c('beach', 'boat','pier'))
##
## Call:
## mlogit(formula = mode ~ price | income | catch, data = Fish,
## start = strt, alt.subset = c("beach", "boat", "pier"), probit = TRUE)
##
## Frequencies of alternatives:
## beach boat pier
## 0.18356 0.57260 0.24384
##
## bfgs method
## 1 iterations, 0h:0m:1s
## g'(-H)^-1g = 1E+10
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## boat:(intercept) 7.2514e-01 3.5810e-01 2.0250 0.0428700 *
## pier:(intercept) 6.2393e-01 2.7397e-01 2.2773 0.0227657 *
## price -1.2154e-02 1.7696e-03 -6.8684 6.493e-12 ***
## boat:income 2.4005e-06 3.6698e-05 0.0654 0.9478450
## pier:income -6.5419e-05 4.0831e-05 -1.6022 0.1091086
## beach:catch 1.5479e+00 4.3000e-01 3.5997 0.0003185 ***
## boat:catch 4.0010e-01 4.1600e-01 0.9618 0.3361590
## pier:catch 1.2747e+00 5.5861e-01 2.2820 0.0224909 *
## boat.pier 5.4570e-01 4.6259e-01 1.1796 0.2381415
## pier.pier 6.9544e-01 2.9295e-01 2.3739 0.0175991 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -478.43
## McFadden R^2: 0.32751
## Likelihood ratio test : chisq = 465.99 (p.value = < 2.22e-16)