Random utility models are the reference approach in economics for analyzing the choice by a decision maker of one alternative among a set of mutually exclusive ones. Since the seminal works of Daniel McFadden (McFadden 1974, 1978), who won the Nobel prize in economics “for his development of theory and methods for analyzing discrete choice”, a large theoretical and empirical literature has been developed in this field.1
Among the numerous applications of such models, we can cite the following: Head and Mayer (2004) investigate the determinants of the choice by Japanese firms of a European region in which to implement a new production unit; Fowlie (2010) analyses the choice of a NOx emissions reduction technology by electricity production plants; Kling and Thomson (1996) and Herriges and Kling (1999) consider how the choice of a fishing mode can be explained by the price and the catch expectancy; Jain, Vilcassim, and Chintagunta (1994) investigate brand choice for yogurts; and Bhat (1995) analyses transport mode choice for the Montreal-Toronto corridor.
These models rely on the hypothesis that the decision maker is able to rank the different alternatives by an order of preference represented by a utility function, the chosen alternative being the one associated with the highest level of utility. They are called random utility models because part of the utility is unobserved and is modeled as the realisation of a random deviate.
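In formulas (a standard statement of the hypothesis, with notation introduced here for illustration and not taken from the text): the utility that decision maker $i$ derives from alternative $j$ is

$$U_{ij} = V_{ij} + \epsilon_{ij},$$

where $V_{ij}$ is the observable part of the utility and $\epsilon_{ij}$ the random deviate. Alternative $j$ is chosen when $U_{ij} \ge U_{ik}$ for every other alternative $k$, so that the probability of choosing $j$ is $P_{ij} = \Pr\left(V_{ij} + \epsilon_{ij} \ge V_{ik} + \epsilon_{ik} \;\, \forall k \neq j\right)$.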
Different hypotheses on the distribution of this random deviate lead to different flavors of random utility models. Early developments of these models were based on the hypothesis of independent and identically distributed errors following a Gumbel distribution,2 leading to the multinomial logit model (MNL). More general models have since been proposed, based either on less restrictive distributional hypotheses or on the introduction of individual heterogeneity.
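As a sanity check on the iid Gumbel hypothesis, the following simulation (a sketch written for this discussion, not code from the mlogit package; all names are illustrative) verifies that utility maximization with iid Gumbel errors reproduces the closed-form MNL choice probabilities $e^{V_j} / \sum_k e^{V_k}$.

```python
# Illustrative simulation: iid Gumbel errors imply multinomial logit
# choice probabilities. V and n are arbitrary assumed values.
import numpy as np

rng = np.random.default_rng(0)
V = np.array([0.5, 0.0, -0.5])        # deterministic utilities of 3 alternatives
n = 200_000                           # number of simulated decision makers

# Gumbel (type-I extreme value) draws, iid across alternatives and individuals
eps = rng.gumbel(size=(n, V.size))
choices = np.argmax(V + eps, axis=1)  # each individual picks the max-utility alternative

empirical = np.bincount(choices, minlength=V.size) / n
logit = np.exp(V) / np.exp(V).sum()   # closed-form MNL probabilities

print(np.round(empirical, 3))
print(np.round(logit, 3))
assert np.allclose(empirical, logit, atol=0.01)
```

The closed-form probabilities here are roughly (0.51, 0.31, 0.19); the simulated frequencies match them up to sampling noise.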
Maintaining the Gumbel distribution hypothesis but relaxing the iid hypothesis leads to more general logit models (the heteroscedastic and the nested logit models). Relaxing the Gumbel distribution hypothesis and using a normal distribution instead leads to the multinomial probit model, which can deal with heteroscedasticity and correlation of the errors.
Individual heterogeneity can be introduced in the parameters associated with the covariates entering the observable part of the utility or in the variance of the errors. This leads respectively to the mixed logit model (MXL) and the scale heterogeneity model (S-MNL).
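To make the two forms of heterogeneity concrete (notation introduced here for illustration, not taken from the text): in the MXL the coefficients $\beta_i$ vary across individuals with density $f(\beta)$, so the choice probability becomes the mixture

$$P_{ij} = \int \frac{e^{x_{ij}^{\top}\beta}}{\sum_{k} e^{x_{ik}^{\top}\beta}}\, f(\beta)\, d\beta,$$

whereas in the S-MNL only the overall scale of the observable utility varies, $U_{ij} = \sigma_i V_{ij} + \epsilon_{ij}$, with $\sigma_i$ an individual-specific scale parameter.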
The first version of mlogit was posted in 2008; it was the first R package allowing the estimation of random utility models. Since then, other packages have emerged (see Sarrias and Daziano 2017, 4 for a survey of relevant R packages). mlogit still provides the widest set of estimators for random utility models and, moreover, its syntax has been adopted by other R packages. Those packages provide useful additions to mlogit:
mnlogit enables efficient estimation of the MNL for large data sets; gmnl estimates the MXL and the S-MNL, but also the so-called generalized multinomial logit model (G-MNL), which nests them; related estimators are also provided by the bayesm, MNP and RSGHB packages.

The article is organized as follows. Vignette formula/data explains how the usual formula-data and testing interface can be extended in order to describe in a very natural way the model to be estimated. Vignette random utility models describes the landmark multinomial logit model. Vignettes relaxing the iid hypothesis, mixed logit model and multinomial probit model present three important extensions of this basic model: vignette relaxing the iid hypothesis presents models that relax the iid Gumbel hypothesis, mixed logit model introduces slope heterogeneity by considering some parameters as random, and multinomial probit model relaxes the Gumbel distribution hypothesis by assuming a multivariate normal distribution.