Random utility models are the reference approach in economics for analyzing the choice by a decision maker of one alternative among a set of mutually exclusive ones. Since the seminal works of Daniel McFadden (McFadden 1974, 1978), who won the Nobel prize in economics “for his development of theory and methods for analyzing discrete choice”, a large theoretical and empirical literature has been developed in this field.1
Among the numerous applications of such models, we can cite the following: Head and Mayer (2004) investigate the determinants of the choice by Japanese firms of a European region in which to implement a new production unit; Fowlie (2010) analyses the choice of a NOx emissions reduction technology by electricity production plants; Kling and Thomson (1996) and Herriges and Kling (1999) consider how the choice of a fishing mode can be explained by the price and the catch expectancy; Jain, Vilcassim, and Chintagunta (1994) investigate brand choice for yogurts; and Bhat (1995) analyses transport mode choice for the Montreal-Toronto corridor.
These models rely on the hypothesis that the decision maker is able to rank the different alternatives by an order of preference represented by a utility function, the chosen alternative being the one associated with the highest level of utility. They are called random utility models because part of the utility is unobserved and is modeled as the realisation of a random deviate.
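In formulas (a standard statement of the hypothesis, with notation introduced here for illustration and not taken from the text): the utility that decision maker $i$ derives from alternative $j$ is

$$U_{ij} = V_{ij} + \epsilon_{ij},$$

where $V_{ij}$ is the observable part of the utility and $\epsilon_{ij}$ the random deviate. Alternative $j$ is chosen when $U_{ij} \ge U_{ik}$ for every other alternative $k$, so that the probability of choosing $j$ is $P_{ij} = \Pr\left(V_{ij} + \epsilon_{ij} \ge V_{ik} + \epsilon_{ik} \;\, \forall k \neq j\right)$.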
Different hypotheses on the distribution of this random deviate lead to different flavors of random utility models. Early developments of these models were based on the hypothesis of independent and identically distributed errors following a Gumbel distribution,2 leading to the multinomial logit model (MNL). More general models have since been proposed, based either on less restrictive distributional hypotheses or on the introduction of individual heterogeneity.
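As a sanity check on the iid Gumbel hypothesis, the following simulation (a sketch written for this discussion, not code from the mlogit package; all names are illustrative) verifies that utility maximization with iid Gumbel errors reproduces the closed-form MNL choice probabilities $e^{V_j} / \sum_k e^{V_k}$.

```python
# Illustrative simulation: iid Gumbel errors imply multinomial logit
# choice probabilities. V and n are arbitrary assumed values.
import numpy as np

rng = np.random.default_rng(0)
V = np.array([0.5, 0.0, -0.5])        # deterministic utilities of 3 alternatives
n = 200_000                           # number of simulated decision makers

# Gumbel (type-I extreme value) draws, iid across alternatives and individuals
eps = rng.gumbel(size=(n, V.size))
choices = np.argmax(V + eps, axis=1)  # each individual picks the max-utility alternative

empirical = np.bincount(choices, minlength=V.size) / n
logit = np.exp(V) / np.exp(V).sum()   # closed-form MNL probabilities

print(np.round(empirical, 3))
print(np.round(logit, 3))
assert np.allclose(empirical, logit, atol=0.01)
```

The closed-form probabilities here are roughly (0.51, 0.31, 0.19); the simulated frequencies match them up to sampling noise.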
Maintaining the Gumbel distribution hypothesis but relaxing the iid hypothesis leads to more general logit models (the heteroscedastic and the nested logit models). Relaxing the Gumbel distribution hypothesis and using a normal distribution instead leads to the multinomial probit model, which can deal with heteroscedasticity and correlation of the errors.
Individual heterogeneity can be introduced in the parameters associated with the covariates entering the observable part of the utility or in the variance of the errors. This leads respectively to the mixed logit model (MXL) and the scale heterogeneity model (S-MNL).
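To make the two forms of heterogeneity concrete (notation introduced here for illustration, not taken from the text): in the MXL the coefficients $\beta_i$ vary across individuals with density $f(\beta)$, so the choice probability becomes the mixture

$$P_{ij} = \int \frac{e^{x_{ij}^{\top}\beta}}{\sum_{k} e^{x_{ik}^{\top}\beta}}\, f(\beta)\, d\beta,$$

whereas in the S-MNL only the overall scale of the observable utility varies, $U_{ij} = \sigma_i V_{ij} + \epsilon_{ij}$, with $\sigma_i$ an individual-specific scale parameter.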
The first version of mlogit was posted in 2008; it was the first R package allowing the estimation of random utility models. Since then, other packages have emerged (see Sarrias and Daziano 2017, 4 for a survey of relevant R packages). mlogit still provides the widest set of estimators for random utility models and, moreover, its syntax has been adopted by other R packages. Those packages provide useful additions to mlogit:
mnlogit enables efficient estimation of the MNL for large data sets; gmnl estimates the MXL and the S-MNL, but also the so-called generalized multinomial logit model (G-MNL), which nests them; related estimators are also provided by the bayesm, MNP and RSGHB packages.

The article is organized as follows. Vignette formula/data explains how the usual formula-data and testing interface can be extended in order to describe in a very natural way the model to be estimated. Vignette random utility models describes the landmark multinomial logit model. Vignettes relaxing the iid hypothesis, mixed logit model and multinomial probit model present three important extensions of this basic model: vignette relaxing the iid hypothesis presents models that relax the iid Gumbel hypothesis, mixed logit model introduces slope heterogeneity by considering some parameters as random, and multinomial probit model relaxes the Gumbel distribution hypothesis by assuming a multivariate normal distribution.