Multinomial probit

The model

The multinomial probit is obtained within the same modeling framework that we used when presenting the random utility model. The utility of an alternative is still the sum of two components : $U_j = V_j + \epsilon_j$,

but the joint distribution of the error terms is now a multivariate normal with mean 0 and a covariance matrix denoted $\Omega$ [1].

Alternative l is chosen if : $$ \left\{ \begin{array}{rcl} U_1-U_l&=&(V_1-V_l)+(\epsilon_1-\epsilon_l)<0\\ U_2-U_l&=&(V_2-V_l)+(\epsilon_2-\epsilon_l)<0\\ & \vdots & \\ U_J-U_l&=&(V_J-V_l)+(\epsilon_J-\epsilon_l)<0\\ \end{array} \right. $$

which implies, denoting $V^l_j = V_j - V_l$ :

$$ \left\{ \begin{array}{rclrcl} \epsilon^l_1 &=& (\epsilon_1-\epsilon_l) &<& - V^l_1\\ \epsilon^l_2 &=& (\epsilon_2-\epsilon_l) &<& - V^l_2\\ &\vdots & & \vdots & \\ \epsilon^l_J &=& (\epsilon_J-\epsilon_l) &<& - V^l_J\\ \end{array} \right. $$

The initial vector of errors ϵ is transformed using the following transformation :

$$\epsilon^l = M^l \epsilon$$

where the transformation matrix $M^l$ is a (J − 1) × J matrix obtained by inserting a column of −1 at position l into a (J − 1) identity matrix. For example, if J = 4 and l = 3 :

$$ M^3 = \left( \begin{array}{cccc} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -1 & 1 \\ \end{array} \right) $$
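As a small illustration, this matrix can be built in R by inserting a column of −1 at position l into a (J − 1) identity matrix. The helper below is a minimal sketch (the name make_M is ours, not part of mlogit) :

make_M <- function(J, l) {
  # (J-1) x J transformation matrix M^l
  I <- diag(J - 1)
  before <- I[, seq_len(l - 1), drop = FALSE]       # identity columns before position l
  after  <- I[, seq_len(J - 1) >= l, drop = FALSE]  # identity columns from position l on
  cbind(before, -1, after)                          # -1 is recycled into a full column
}
make_M(4, 3)   # reproduces the example above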

The covariance matrix of the error differences is then :

$$\mbox{V}(\epsilon^l) = \mbox{V}(M^l\epsilon) = M^l \mbox{V}(\epsilon) {M^l}^{\top} = M^l \Omega {M^l}^{\top} = \Omega^l$$

The probability of choosing l is then :

$$P_l = \mbox{P}(\epsilon^l_1 < -V^l_1 \;\&\; \epsilon^l_2 < -V^l_2 \;\&\; ... \;\&\; \epsilon^l_J < -V^l_J)$$

With the distributional hypothesis, this can be written as :

$$P_l = \int_{-\infty}^{-V^l_1}\int_{-\infty}^{-V^l_2}...\int_{-\infty}^{-V^l_J} \phi(\epsilon^l)\; d\epsilon^l_1 d\epsilon^l_2 ... d\epsilon^l_J$$

with :

$$\phi(\epsilon^l) = \frac{1}{(2\pi)^{(J-1)/2}\mid\Omega^l\mid^{1/2}} e^{-\frac{1}{2}{\epsilon^l}^{\top} {\Omega^l}^{-1}\epsilon^l}$$

Two problems arise with this model :

  • the first one is that the identified parameters are the elements of $\Omega^l$ and not of Ω. We must then carefully investigate the meaning of these elements.
  • the second one is that the probability is a (J − 1)-dimensional integral, which must be computed numerically (a small numerical illustration follows this list). The relevant strategy in this context is to use simulations.
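To make this concrete, for a small number of alternatives the integral can be evaluated directly with a multivariate normal routine. The sketch below uses the mvtnorm package on arbitrary numbers of our choosing (an illustration only, not what mlogit does internally) to compute the probability of choosing alternative 1 when J = 3 :

library("mvtnorm")   # provides pmvnorm()

# arbitrary covariance of the errors and deterministic utilities (J = 3)
Omega <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.5, 0.3,
                  0.2, 0.3, 2.0), nrow = 3)
V <- c(0.4, 0.1, -0.2)

# differences with respect to the chosen alternative l = 1
M1 <- rbind(c(-1, 1, 0),
            c(-1, 0, 1))
Omega1 <- M1 %*% Omega %*% t(M1)   # covariance of the error differences
V1 <- as.vector(M1 %*% V)          # V_j - V_1 for j = 2, 3

# P_1 = P(eps^1_j < -V^1_j for all j != 1) : a 2-dimensional normal integral
pmvnorm(lower = rep(-Inf, 2), upper = -V1, sigma = Omega1)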

Identification

The meaningful parameters are those of the covariance matrix of the errors, Ω. For example, with J = 3 :

$$ \Omega = \left( \begin{array}{ccc} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \\ \end{array} \right) $$

$$ \Omega^1 = M^1 \Omega {M^1}^{\top}= \left( \begin{array}{cc} \sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11} + \sigma_{23} - \sigma_{12} -\sigma_{13} \\ \sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13} & \sigma_{11} + \sigma_{33} - 2 \sigma_{13} \\ \end{array} \right) $$

The overall scale of utility being unidentified, one has to impose the value of one of the variances ; for example, the first diagonal element of $\Omega^1$ is fixed to 1. We then have :

$$ \Omega^1 = \left( \begin{array}{cc} 1 & \frac{\sigma_{11}+ \sigma_{23} - \sigma_{12} -\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \frac{\sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} & \frac{\sigma_{11} + \sigma_{33} - 2 \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \end{array} \right) $$
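This mapping is easy to check numerically ; the sketch below (with an arbitrary Ω of our choosing) computes $\Omega^1$ and its scale-normalized version, in which only two free parameters remain :

# arbitrary structural covariance matrix of the errors (J = 3)
Omega <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.5, 0.3,
                  0.2, 0.3, 2.0), nrow = 3)

# M^1 : differences with respect to alternative 1
M1 <- rbind(c(-1, 1, 0),
            c(-1, 0, 1))

Omega1 <- M1 %*% Omega %*% t(M1)   # covariance of the error differences
Omega1 / Omega1[1, 1]              # normalized : first diagonal element equal to 1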

Therefore, out of the 6 structural parameters of the covariance matrix Ω, only 2 can be identified (the two remaining free elements of $\Omega^1$). Moreover, it is almost impossible to interpret these parameters.

More generally, with J alternatives, the number of parameters of the covariance matrix is (J + 1) × J/2 and the number of identified parameters is J × (J − 1)/2 − 1. For example, with J = 4, there are 10 structural parameters and only 5 identified parameters.
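These counts are easily tabulated (a trivial check, with a few values of J) :

J <- 3:6
cbind(J,
      structural = (J + 1) * J / 2,      # parameters of Omega
      identified = J * (J - 1) / 2 - 1)  # identified parameters after normalization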

Simulations

Let $L^l$ be the Choleski decomposition of the covariance matrix of the error differences :

$$\Omega^l = L^l {L^l}^{\top}$$

This matrix is a lower triangular matrix of dimension (J − 1) :

$$ L^l= \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) $$

Let η be a vector of standard normal deviates :

η ∼ N(0, I)

Therefore, we have :

$$\mbox{V}(L^l\eta) = L^l \mbox{V}(\eta) {L^l}^{\top} = L^l I {L^l}^{\top} = L^l {L^l}^{\top} = \Omega^l$$

Thus, if we draw a vector of standard normal deviates η and apply this transformation to it, we obtain a draw of $\epsilon^l$.
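This is easy to check in R : chol() returns an upper triangular factor, so its transpose gives the lower triangular $L^l$, and the empirical covariance of the simulated draws should be close to $\Omega^l$. The following lines are only a sanity-check sketch with an arbitrary $\Omega^l$ :

set.seed(1)

# arbitrary 2 x 2 covariance of the error differences (J = 3)
Omega_l <- matrix(c(1.5, 0.6,
                    0.6, 2.6), nrow = 2)

L_l <- t(chol(Omega_l))                  # lower triangular Choleski factor
eta <- matrix(rnorm(2 * 1e5), nrow = 2)  # standard normal deviates
eps_l <- L_l %*% eta                     # each column is one draw of epsilon^l

cov(t(eps_l))                            # close to Omega_l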

This joint probability can be written as a product of conditional and marginal probabilities :

$$ \begin{array}{rcl} P_l &=& \mbox{P}(\epsilon^l_1<- V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \;\&\; \epsilon^l_J<-V_J^l)\\ &=& \mbox{P}(\epsilon^l_1<- V_1^l)\\ &\times&\mbox{P}(\epsilon^l_2<-V_2^l \mid \epsilon^l_1<-V_1^l) \\ &\times&\mbox{P}(\epsilon^l_3<-V_3^l \mid \epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l) \\ & \vdots & \\ &\times&\mbox{P}(\epsilon^l_J<-V_J^l \mid \epsilon^l_1<-V_1^l \;\&\; ... \;\&\; \epsilon^l_{J-1}<-V_{J-1}^l) \\ \end{array} $$

A draw of the vector of error differences is obtained as :

$$ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) \times \left( \begin{array}{c} \eta_1 \\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_{J-1} \end{array} \right) $$

$$ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{l} l_{11}\eta_1 \\ l_{21}\eta_1+l_{22}\eta_2 \\ l_{31}\eta_1+l_{32}\eta_2 + l_{33}\eta_3\\ \vdots \\ l_{(J-1)1}\eta_1+l_{(J-1)2}\eta_2+...+l_{(J-1)(J-1)}\eta_{J-1} \end{array} \right) $$

Let’s now investigate the marginal and conditional probabilities :

  • the first one is simply the marginal probability for a standard normal deviate, therefore we have : $\mbox{P}(\epsilon^l_1<-V_1^l) = \Phi\left(-\frac{V_1^l}{l_{11}}\right)$
  • the second one, for a given value of $\eta_1$, is equal to $\Phi\left(-\frac{V^l_2+l_{21}\eta_1}{l_{22}}\right)$. We then have to average this expression over the values of $\eta_1$ lower than $-\frac{V^l_1}{l_{11}}$. Denoting $\bar{\phi}_1$ the corresponding truncated normal density, we have : $$\mbox{P}(\epsilon^l_2<-V_2^l \mid \epsilon^l_1<-V_1^l)=\int_{-\infty}^{-\frac{V^l_1}{l_{11}}}\Phi\left(-\frac{V^l_2+l_{21}\eta_1}{l_{22}}\right) \bar{\phi}_1(\eta_1)d\eta_1$$
  • the third one, for given values of $\eta_1$ and $\eta_2$, is equal to $\Phi\left(-\frac{V^l_3+l_{31}\eta_1+l_{32}\eta_2}{l_{33}}\right)$. We then have : $$\mbox{P}(\epsilon^l_3<-V_3^l \mid \epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l)=\int_{-\infty}^{-\frac{V^l_1}{l_{11}}}\int_{-\infty}^{-\frac{V^l_2+l_{21}\eta_1}{l_{22}}} \Phi\left(-\frac{V^l_3+l_{31}\eta_1+l_{32}\eta_2}{l_{33}}\right)\bar{\phi}_1(\eta_1)\bar{\phi}_2(\eta_2)d\eta_1d\eta_2$$
  • and so on.

These probabilities can easily be simulated by drawing numbers from a truncated normal distribution.

This so-called GHK algorithm [2] (named after Geweke, Hajivassiliou and Keane, who developed it) can be described as follows :

  • compute $\Phi\left(-\frac{V^l_1}{l_{11}}\right)$,
  • draw a number $\eta_1^r$ from a standard normal distribution truncated from above at $-\frac{V^l_1}{l_{11}}$ and compute $\Phi\left(-\frac{V^l_2+l_{21}\eta_1^r}{l_{22}}\right)$,
  • draw a number $\eta_2^r$ from a standard normal distribution truncated from above at $-\frac{V^l_2+l_{21}\eta_1^r}{l_{22}}$ and compute $\Phi\left(-\frac{V^l_3+l_{31}\eta_1^r+l_{32}\eta_2^r}{l_{33}}\right)$,
  • continue in the same way until the last error difference,
  • the product of these J − 1 values is one simulated value $P^r_l$ of the probability,
  • repeat these steps R times and average the resulting values ; this average is the simulated probability.

Several points should be noted concerning this algorithm :

  • the utility differences should be computed with respect to the chosen alternative for each individual,
  • the Choleski decomposition used should rely on the same covariance matrix of the errors. One method to attain this goal is to start from a given set of differences, e.g. the differences with respect to the first alternative. The vector of error differences is then $\epsilon^1$ and its covariance matrix is $\Omega^1 = L^1{L^1}^{\top}$. To obtain the differences with respect to another alternative l, we construct a matrix called $S^l$, obtained by taking a (J − 2) identity matrix, adding a first row of 0 and inserting a column of −1 at position l − 1. For example, with 4 alternatives and l = 3, we have : $$S^3= \left( \begin{array}{ccc} 0 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \\ \end{array} \right) $$ The Choleski decomposition of the covariance matrix of these error differences is then obtained from : $$\Omega^l = S^l\Omega^1 {S^l}^{\top} = L^l {L^l}^{\top}$$
  • to compute draws from a standard normal distribution truncated from above at a, the following trick is used : take a draw μ from a uniform distribution (between 0 and 1) ; then $\eta = \Phi^{-1}\left(\mu \Phi(a)\right)$ is a draw from a standard normal distribution truncated from above at a. This trick appears in the sketch after this list.
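Putting these elements together, a minimal R sketch of the GHK simulator for one decision maker could look as follows. The function name ghk_prob and its arguments are ours : V_l contains the utility differences taken with respect to the chosen alternative and L_l is the lower triangular Choleski factor of $\Omega^l$.

# minimal GHK simulator for P_l (a sketch, not mlogit's internal code)
ghk_prob <- function(V_l, L_l, R = 1000) {
  K <- length(V_l)
  probs <- numeric(R)
  for (r in seq_len(R)) {
    eta <- numeric(K)
    p <- 1
    for (j in seq_len(K)) {
      # upper bound for eta_j, given the previously drawn eta's
      b <- -(V_l[j] + sum(L_l[j, seq_len(j - 1)] * eta[seq_len(j - 1)])) / L_l[j, j]
      p <- p * pnorm(b)
      # draw eta_j from a standard normal truncated from above at b
      mu <- runif(1)
      eta[j] <- qnorm(mu * pnorm(b))
    }
    probs[r] <- p
  }
  mean(probs)
}

# same numbers as in the earlier sketches : Omega^1 and V^1 for J = 3, l = 1
Omega_l <- matrix(c(1.5, 0.6,
                    0.6, 2.6), nrow = 2)
set.seed(2)
ghk_prob(V_l = c(-0.3, -0.6), L_l = t(chol(Omega_l)), R = 5000)

With these numbers, the result should agree, up to simulation noise, with the exact value computed with pmvnorm in the earlier sketch.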

Applications

We use again the Fishing data frame, keeping only a subset of three alternatives. The multinomial probit model is estimated using mlogit with the probit argument set to TRUE.

library("mlogit")
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, shape="wide", varying=2:9, choice="mode")
Fish.mprobit <- mlogit(mode~price | income | catch, Fish, probit = TRUE, alt.subset=c('beach', 'boat','pier'))
summary(Fish.mprobit)
## 
## Call:
## mlogit(formula = mode ~ price | income | catch, data = Fish, 
##     start = strt, alt.subset = c("beach", "boat", "pier"), probit = TRUE)
## 
## Frequencies of alternatives:
##   beach    boat    pier 
## 0.18356 0.57260 0.24384 
## 
## bfgs method
## 1 iterations, 0h:0m:1s 
## g'(-H)^-1g = 1E+10 
## successive function values within tolerance limits 
## 
## Coefficients :
##                     Estimate  Std. Error z-value  Pr(>|z|)    
## boat:(intercept)  7.2514e-01  3.5810e-01  2.0250 0.0428700 *  
## pier:(intercept)  6.2393e-01  2.7397e-01  2.2773 0.0227657 *  
## price            -1.2154e-02  1.7696e-03 -6.8684 6.493e-12 ***
## boat:income       2.4005e-06  3.6698e-05  0.0654 0.9478450    
## pier:income      -6.5419e-05  4.0831e-05 -1.6022 0.1091086    
## beach:catch       1.5479e+00  4.3000e-01  3.5997 0.0003185 ***
## boat:catch        4.0010e-01  4.1600e-01  0.9618 0.3361590    
## pier:catch        1.2747e+00  5.5861e-01  2.2820 0.0224909 *  
## boat.pier         5.4570e-01  4.6259e-01  1.1796 0.2381415    
## pier.pier         6.9544e-01  2.9295e-01  2.3739 0.0175991 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -478.43
## McFadden R^2:  0.32751 
## Likelihood ratio test : chisq = 465.99 (p.value = < 2.22e-16)

Bibliography

Daganzo, C. 1979. Multinomial Probit: The Theory and Its Application to Demand Forecasting. Academic Press, New York.
Geweke, J., M. Keane, and D. Runkle. 1994. “Alternative Computational Approaches to Inference in the Multinomial Probit Model.” Review of Economics and Statistics 76: 609–32.
Hausman, J., and D. Wise. 1978. “A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences.” Econometrica 48: 403–29.

  1. See Hausman and Wise (1978) and Daganzo (1979).

  2. See, for example, Geweke, Keane, and Runkle (1994).