In the previous section, we assumed that the error terms were iid (independently and identically distributed), i.e., uncorrelated and homoscedastic. Extensions of the basic multinomial logit model have been proposed by relaxing one of these two hypotheses while maintaining the hypothesis of a Gumbel distribution.
The heteroscedastic logit model was proposed by Bhat (1995). The probability that Ul > Uj is:
$$ P(\epsilon_j<V_l-V_j+\epsilon_l)=e^{-e^{-\frac{(V_l-V_j+\epsilon_l)}{\theta_j}}}, $$
which implies the following conditional and unconditional probabilities:
$$ P_{l\mid\epsilon_l}=\prod_{j \neq l} e^{-e^{-\frac{(V_l-V_j+\epsilon_l)}{\theta_j}}}, $$
$$ P_l=\int_{-\infty}^{+\infty} \left(\prod_{j \neq l} e^{-e^{-\frac{(V_l-V_j+\epsilon_l)}{\theta_j}}}\right) \frac{1}{\theta_l} e^{-\frac{\epsilon_l}{\theta_l}} e^{-e^{-\frac{\epsilon_l}{\theta_l}}}\, d\epsilon_l. $$
There is no closed form for this integral, but it can be computed efficiently by Gauss quadrature, more precisely by the Gauss-Laguerre quadrature method.
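As an illustration, the change of variable u = e^(-ε_l/θ_l) turns the integral into one with weight e^(-u) on (0, ∞), which is exactly the form handled by Gauss-Laguerre quadrature. The following sketch (the helper `p_hetero` is hypothetical and not part of **mlogit**; the nodes and weights are taken from the **statmod** package) shows the idea:
library("statmod")
# a sketch: approximate P_l by Gauss-Laguerre quadrature
# V: deterministic utilities, theta: scale parameters, l: index of the chosen alternative
p_hetero <- function(V, theta, l, R = 30) {
  gq <- gauss.quad(R, kind = "laguerre")   # nodes and weights for the weight function e^{-u}
  f <- function(u) {
    eps_l <- - theta[l] * log(u)           # invert the change of variable u = exp(-eps_l / theta_l)
    prod(exp(- exp(- (V[l] - V[-l] + eps_l) / theta[-l])))
  }
  sum(gq$weights * sapply(gq$nodes, f))
}
p_hetero(V = c(1, 0.5, 0), theta = c(1, 1.2, 0.8), l = 1)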
The nested logit model was first proposed by McFadden (1978). It is a generalization of the multinomial logit model based on the idea that some alternatives may be gathered in groups (called nests). The error terms may then be correlated within a nest, whereas error terms of different nests remain uncorrelated.
Denoting m = 1...M the nests and Bm the set of alternatives belonging to nest m, the cumulative distribution of the errors is:
$$ \mbox{exp}\left(-\sum_{m=1}^M \left( \sum_{j \in B_m} e^{-\epsilon_j/\lambda_m}\right)^{\lambda_m}\right). $$
The marginal distributions of the ϵs are still univariate extreme value, but there is now some correlation within nests: 1 − λm is a measure of this correlation, i.e., λm = 1 implies no correlation. In the special case where λm = 1 ∀m, the errors are iid Gumbel errors and the nested logit model reduces to the multinomial logit model. It can then be shown that the probability of choosing alternative j, which belongs to nest l, is:
$$ P_j = \frac{e^{V_j/\lambda_l}\left(\sum_{k \in B_l} e^{V_k/\lambda_l}\right)^{\lambda_l-1}} {\sum_{m=1}^M\left(\sum_{k \in B_m} e^{V_k/\lambda_m}\right)^{\lambda_m}}, $$
and that this model is a random utility model if all the λ parameters are in the 0 − 1 interval.1
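As an illustration of this formula, the following sketch (the helper `p_nested` is hypothetical and not part of **mlogit**) evaluates the nested logit probabilities for a small example with four alternatives in two nests; setting every λ to 1 recovers the multinomial logit probabilities:
# a sketch: V deterministic utilities, nest the nest each alternative belongs to,
# lambda a named vector of nest elasticities
p_nested <- function(V, nest, lambda) {
  lam <- lambda[nest]                       # elasticity of the nest of each alternative
  inner <- tapply(exp(V / lam), nest, sum)  # sum_{k in B_m} exp(V_k / lambda_m) for each nest m
  denom <- sum(inner ^ lambda[names(inner)])
  exp(V / lam) * inner[nest] ^ (lam - 1) / denom
}
V <- c(1, 0.5, 0.2, 0)
nest <- c("a", "a", "b", "b")
lambda <- c(a = 0.5, b = 0.8)
p_nested(V, nest, lambda)
p_nested(V, nest, lambda = c(a = 1, b = 1))  # equals exp(V) / sum(exp(V))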
Let us now write the deterministic part of the utility of alternative j as the sum of two terms: the first one (Zj) being specific to the alternative and the second one (Wl) to the nest it belongs to:
$$ V_j = Z_j + W_l. $$
We can then rewrite the probabilities as follows:
$$ \begin{array}{rcl} P_j&=&\frac{e^{(Z_j+W_l)/\lambda_l}}{\sum_{k \in B_l} e^{(Z_k+W_l)/\lambda_l}}\times \frac{\left(\sum_{k \in B_l} e^{(Z_k+W_l)/\lambda_l}\right)^{\lambda_l}} {\sum_{m=1}^M\left(\sum_{k \in B_m} e^{(Z_k+W_m)/\lambda_m}\right)^{\lambda_m}}\\ &=&\frac{e^{Z_j/\lambda_l}}{\sum_{k \in B_l} e^{Z_k/\lambda_l}}\times \frac{\left(e^{W_l/\lambda_l}\sum_{k \in B_l} e^{Z_k/\lambda_l}\right)^{\lambda_l}} {\sum_{m=1}^M\left(e^{W_m/\lambda_m}\sum_{k \in B_m} e^{Z_k/\lambda_m}\right)^{\lambda_m}}. \end{array} $$
Now denote $I_l = \ln \sum_{k \in B_l} e^{Z_k/\lambda_l}$, which is often called the log-sum, the inclusive value or the inclusive utility.2 We can then write the probability of choosing alternative j as:
$$ P_j=\frac{e^{Z_j/\lambda_l}}{\sum_{k \in B_l} e^{Z_k/\lambda_l}}\times \frac{e^{W_l+\lambda_l I_l}}{\sum_{m=1}^Me^{W_m+\lambda_m I_m}}. $$
The first term, Pj ∣ l, is the conditional probability of choosing alternative j given that nest l is chosen; it is often referred to as the lower model. The second term, Pl, is the marginal probability of choosing nest l and is referred to as the upper model. Wl + λlIl can be interpreted as the expected utility of choosing the best alternative in nest l, Wl being the expected utility of choosing an alternative in this nest (whatever this alternative is) and λlIl being the expected extra utility gained by being able to choose the best alternative in the nest. The inclusive values link the two models. It is then straightforward to show that IIA applies within nests, but not for two alternatives in different nests.
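To make the decomposition concrete, the following sketch (reusing the hypothetical `p_nested` helper and the small example above, with illustrative values for the nest-specific utilities) computes the inclusive values and checks numerically that the product of the lower and upper model probabilities equals the nested logit probability:
W <- c(a = 0.3, b = 0)                          # nest-specific part of the utility (illustrative)
Z <- V - W[nest]                                # alternative-specific part, so that V = Z + W
expZ <- exp(Z / lambda[nest])
inclv <- log(tapply(expZ, nest, sum))           # inclusive values I_l
P.cond <- expZ / tapply(expZ, nest, sum)[nest]  # lower model: P_{j | l}
P.nest <- exp(W + lambda * inclv) / sum(exp(W + lambda * inclv))  # upper model: P_l
all.equal(as.numeric(P.cond * P.nest[nest]), as.numeric(p_nested(V, nest, lambda)))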
A consistent but inefficient way of estimating the nested logit model is to estimate its two components separately: the coefficients of the lower model are estimated first, which enables the computation of the inclusive values Il; the coefficients of the upper model are then estimated, using Il as a covariate. Directly maximizing the likelihood function of the nested model leads to a more efficient estimator.
Bhat (1995) estimated the heteroscedastic logit model on the `ModeCanada` data set. Using **mlogit**, the heteroscedastic logit model is obtained by setting the `heterosc` argument to `TRUE`:
library("mlogit")
data("ModeCanada", package = "mlogit")
MC <- mlogit.data(ModeCanada, subset = noalt == 4, chid.var = "case",
alt.var = "alt", drop.index = TRUE)
ml.MC <- mlogit(choice ~ freq + cost + ivt + ovt | urban + income, MC,
reflevel = 'car', alt.subset = c("car", "train", "air"))
hl.MC <- mlogit(choice ~ freq + cost + ivt + ovt | urban + income, MC,
reflevel = 'car', alt.subset = c("car", "train", "air"), heterosc = TRUE)
coef(summary(hl.MC))[11:12, ]
## Estimate Std. Error z-value Pr(>|z|)
## sp.train 1.2371829 0.1104610 11.200182 0.000000e+00
## sp.air 0.5403239 0.1118353 4.831425 1.355592e-06
The variances of the error terms of train and air are respectively higher and lower than the variance of the error term of car (which is set to 1). Note that the z-values and p-values of the output are not particularly meaningful, as they test the hypothesis that each coefficient equals zero, whereas the relevant null here is that it equals one. The homoscedasticity hypothesis can be tested using any of the three tests. A particularly convenient syntax is provided in this case: for the likelihood ratio and the Wald tests, one can supply only the fitted heteroscedastic model as argument; it is then guessed that the hypothesis the user wants to test is the homoscedasticity hypothesis.
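For instance, the likelihood ratio test can be computed in the usual way by supplying both fitted models (a sketch; the object name is illustrative):
lr.MC <- lrtest(hl.MC, ml.MC)   # heteroscedastic vs. multinomial logit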
or, more simply, using the convenient one-argument syntax described above (a sketch):
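# with only the fitted heteroscedastic model supplied, the homoscedasticity
# hypothesis is the one being tested
lr.MC <- lrtest(hl.MC)
wd.MC <- waldtest(hl.MC)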
The Wald test can also be computed using the `linearHypothesis` function from the **car** package:
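A sketch of such a call, testing that both scale parameters equal one (the object name is illustrative):
library("car")
lh.MC <- linearHypothesis(hl.MC, c("sp.train = 1", "sp.air = 1"))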
For the score test, we provide the constrained model as argument, i.e., the standard multinomial logit model, and a supplementary argument which defines the unconstrained model, in this case `heterosc = TRUE`.
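A sketch of the corresponding call (the object name is illustrative):
sc.MC <- scoretest(ml.MC, heterosc = TRUE)
The statistics and p-values of the four tests are collected below: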
## wald lh score lr
## stat 25.196 25.196 9.488 6.888
## p-value 0.000 0.000 0.009 0.032
The homoscedasticity hypothesis is strongly rejected by the Wald test, but only at the 1% and 5% levels by the score and the likelihood ratio tests, respectively.
Head and Mayer (2004) analyzed the choice by Japanese firms of one of 57 European regions, belonging to 9 countries, in which to implement a new production unit.
data("JapaneseFDI", package = "mlogit")
jfdi <- mlogit.data(JapaneseFDI, chid.var = "firm", alt.var = "region",
group.var = "country")
Note that we've used an extra argument to `mlogit.data` called `group.var`, which indicates the grouping variable; it will be used later to define the nests easily. There are two sets of covariates: the wage rate `wage`, the unemployment rate `unemp`, a dummy `elig` indicating that the region is eligible for European funds and the area `area` are observed at the regional level and are therefore relevant for the estimation of the lower model, whereas the social contribution rate `scrate` and the corporate tax rate `ctaxrate` are observed at the country level and are therefore suitable for the upper model.
We first estimate a multinomial logit model:
ml.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate +
ctaxrate | 0, data = jfdi)
Note that, as the covariates are only alternative-specific, the intercepts are not identified and have therefore been removed. We next estimate the lower model, which analyzes the choice of a region within a given country. For each choice situation, we therefore estimate the choice of a region on the subset of regions of the country that was actually chosen. Moreover, observations concerning Portugal and Ireland are removed, as these two countries are mono-region.
lm.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) | 0,
data = jfdi, subset = country == choice.c & ! country %in% c("PT", "IE"))
We next use the fitted lower model to compute the inclusive values at the country level:
$$ \mathrm{iv}_{ig} = \ln \sum_{j \in B_g} e^{\beta^\top x_{ij}}, $$
where Bg is the set of regions for country g. When a grouping variable is provided in the `mlogit.data` function, inclusive values are by default computed for every group g (global inclusive values are obtained by setting the `type` argument to `"global"`). By default, `output` is set to `"chid"` and the result is a vector (if `type = "global"`) or a matrix (if `type = "group"`) with as many rows as there are choice situations. If `output` is set to `"obs"`, a vector of length equal to the number of lines of the data in long format is returned. The following code illustrates different ways to use the `logsum` function:
lmformula <- formula(lm.fdi)
head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "group"), 2)
## BE DE ES FR IE IT NL
## 3 3.595818 5.415838 3.593702 5.153709 1.933707 5.051387 4.077845
## 4 4.113243 5.765190 4.445012 5.383095 1.960462 5.687569 4.490379
## PT UK
## 3 2.702028 4.900622
## 4 3.200124 5.378561
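The global inclusive values at the choice-situation level can be obtained with a call along the following lines (a sketch):
head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global"))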
## 3 4 5 7 8 9
## 6.736116 7.182139 7.121855 7.084245 7.133368 7.133368
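Similarly, with `output = "obs"`, the group inclusive values are returned at the observation level (a sketch):
head(logsum(ml.fdi, data = jfdi, formula = lmformula, output = "obs"))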
## [1] 3.595818 3.595818 3.595818 3.595818 5.415838 5.415838
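and the global inclusive values at the observation level (a sketch):
head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global", output = "obs"))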
## [1] 6.736116 6.736116 6.736116 6.736116 6.736116 6.736116
To add the inclusive values to the original `data.frame`, we use `output = "obs"`; the `type` argument can be omitted, as its default value is `"group"`:
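A sketch of the corresponding assignment, creating the `iv` column used below:
# a sketch: attach the observation-level inclusive values to the original data
# (assuming the rows of jfdi are in the same order as those of JapaneseFDI)
JapaneseFDI$iv <- logsum(ml.fdi, data = jfdi, formula = lmformula, output = "obs")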
We next select the relevant variables for the estimation of the upper model, keep only unique lines so that there is one observation for every choice situation / country combination, and finally coerce the response (`choice.c`) to a logical for the chosen country.
JapaneseFDI.c <- subset(JapaneseFDI,
select = c("firm", "country", "choice.c", "scrate", "ctaxrate", "iv"))
JapaneseFDI.c <- unique(JapaneseFDI.c)
JapaneseFDI.c$choice.c <- with(JapaneseFDI.c, choice.c == country)
Finally, we estimate the upper model, using the previously computed inclusive value as a covariate.
jfdi.c <- mlogit.data(JapaneseFDI.c, choice = "choice.c",
alt.var = "country", chid.var = "firm", shape = "long")
um.fdi <- mlogit(choice.c ~ scrate + ctaxrate + iv | 0, data = jfdi.c)
If one wants to obtain different `iv` coefficients for different countries, the `iv` covariate should be introduced in the third part of the formula, and the coefficients for the two mono-region countries (Ireland and Portugal) should be set to 1, using the `constPar` argument.
um2.fdi <- mlogit(choice.c ~ scrate + ctaxrate | 0 | iv, data = jfdi.c,
constPar = c("iv:PT" = 1, "iv:IE" = 1))
We next estimate the full-information maximum likelihood nested model. It is obtained by adding a `nests` argument to the `mlogit` function. This should be a named list of alternatives (here regions), the names being the nests (here the countries). More simply, if a group variable has been indicated while using `mlogit.data`, `nests` can be a boolean. Two flavors of nested models can be estimated, depending on the `un.nest.el` argument, which is a boolean. If `TRUE`, the coefficient associated with the inclusive utility is constrained to be the same for every nest, which means that the degree of correlation inside each nest is the same. If `FALSE`, a different coefficient is estimated for every nest.
nl.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate +
ctaxrate | 0, data = jfdi, nests = TRUE, un.nest.el = TRUE)
nl2.fdi <- update(nl.fdi, un.nest.el = FALSE, constPar = c('iv:PT' = 1,
'iv:IE' = 1))
The results of the fitted models are presented in the following table:
library("texreg")
htmlreg(list('Mult. logit' = ml.fdi, 'Lower model' = lm.fdi,
'Upper model' = um.fdi, 'Upper model' = um2.fdi, 'Nested logit' = nl.fdi,
'Nested logit' = nl2.fdi),
fontsize = "footnotesize", float.pos = "hbt", label = "tab:nlogit",
caption = "Choice by Japanese firms of a european region.")
| | Mult. logit | Lower model | Upper model | Upper model | Nested logit | Nested logit |
|---|---|---|---|---|---|---|
| log(wage) | 0.47 | 1.21* | | | 0.46 | 0.77** |
| | (0.25) | (0.48) | | | (0.25) | (0.25) |
| unemp | -8.90*** | -9.77*** | | | -7.62*** | -6.95** |
| | (1.69) | (2.39) | | | (1.60) | (2.28) |
| elig | -0.25 | -0.89** | | | -0.34 | -0.18 |
| | (0.21) | (0.33) | | | (0.20) | (0.18) |
| log(area) | 0.31*** | 0.29*** | | | 0.29*** | 0.15** |
| | (0.05) | (0.06) | | | (0.05) | (0.05) |
| scrate | -2.26*** | | -2.49*** | 0.15 | -2.44*** | -0.88 |
| | (0.38) | | (0.38) | (1.72) | (0.38) | (0.86) |
| ctaxrate | -4.82*** | | -3.69*** | -1.78 | -4.13*** | -2.35 |
| | (0.59) | | (0.60) | (1.32) | (0.66) | (1.25) |
| iv | | | 0.66*** | | 0.85*** | |
| | | | (0.06) | | (0.08) | |
| iv:BE | | | | 0.72*** | | 0.48* |
| | | | | (0.08) | | (0.19) |
| iv:DE | | | | 0.72*** | | 0.52** |
| | | | | (0.08) | | (0.17) |
| iv:ES | | | | 0.86*** | | 0.77*** |
| | | | | (0.05) | | (0.20) |
| iv:FR | | | | 0.75*** | | 0.67*** |
| | | | | (0.04) | | (0.09) |
| iv:IT | | | | 0.62*** | | 0.25** |
| | | | | (0.05) | | (0.08) |
| iv:NL | | | | 0.69*** | | 0.18 |
| | | | | (0.06) | | (0.10) |
| iv:UK | | | | 0.87*** | | 0.86*** |
| | | | | (0.06) | | (0.10) |
| AIC | 3469.13 | 1738.04 | 1746.58 | 1713.86 | 3467.96 | 3437.01 |
| Log Likelihood | -1728.57 | -865.02 | -870.29 | -845.93 | -1726.98 | -1703.50 |
| Num. obs. | 452 | 421 | 452 | 452 | 452 | 452 |
| K | 57 | 57 | 9 | 9 | 57 | 57 |

***p < 0.001; **p < 0.01; *p < 0.05
For the nested logit models, two tests are of particular interest: the test of no nests, for which the null hypothesis is that all the nest elasticities are equal to 1, and the test of a unique nest elasticity, for which the null hypothesis is that all the nest elasticities are equal to each other.
For the test of no nests, the nested model is provided as the unique argument to the `lrtest` and `waldtest` functions. For the `scoretest`, the constrained model (i.e., the multinomial logit model) is provided as the first argument, and the second argument is `nests`, which describes the nesting structure that one wants to test.
lr.nest <- lrtest(nl2.fdi)
wd.nest <- waldtest(nl2.fdi)
sc.nest <- scoretest(ml.fdi, nests = TRUE, constPar = c('iv:PT' = 1,
'iv:IE' = 1))
The Wald test can also be performed using the `linearHypothesis` function:
lh.nest <- linearHypothesis(nl2.fdi, c("iv:BE = 1", "iv:DE = 1",
"iv:ES = 1", "iv:FR = 1", "iv:IT = 1", "iv:NL = 1", "iv:UK = 1"))
## wald lh score lr
## stat 208.407 208.407 60.28 50.122
## p-value 0.000 0.000 0.00 0.000
The three tests reject the null hypothesis of no correlation. We next test the hypothesis that all the nest elasticities are equal.
lr.unest <- lrtest(nl2.fdi, nl.fdi)
wd.unest <- waldtest(nl2.fdi, un.nest.el = TRUE)
sc.unest <- scoretest(ml.fdi, nests = TRUE, un.nest.el = FALSE,
constPar = c('iv:IE' = 1, 'iv:PT' = 1))
lh.unest <- linearHypothesis(nl2.fdi, c("iv:BE = iv:DE", "iv:BE = iv:ES",
"iv:BE = iv:FR", "iv:BE = iv:IT", "iv:BE = iv:NL", "iv:BE = iv:UK"))
## wald lh score lr
## stat 73.535 73.535 60.28 46.954
## p-value 0.000 0.000 0.00 0.000
Once again, the three tests strongly reject the hypothesis.
A slightly different version of the nested logit model (Daly 1987) is often used, but it is not compatible with the random utility maximization hypothesis. It differs from the previous expression in that the deterministic part of the utility of each alternative is not divided by the nest elasticity. The differences between the two versions are discussed in Koppelman and Wen (1998), Heiss (2002) and Hensher and Greene (2002).
We've already encountered this expression in vignette 3, Random utility model and the multinomial logit model.