--- title: "Logit models relaxing the iid hypothesis" output: rmarkdown::html_vignette bibliography: ../inst/REFERENCES.bib vignette: > %\VignetteIndexEntry{4. Logit models relaxing the iid hypothesis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r label = setup, include = FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, widtht = 65) options(width = 65) ``` In the previous section, we assumed that the error terms were iid (identically and independently distributed), i.e., uncorrelated and homoscedastic. Extensions of the basic multinomial logit model have been proposed by relaxing one of these two hypothesis while maintaining the hypothesis of a Gumbel distribution. ## The heteroskedastic logit model The heteroskedastic logit model was proposed by @BHAT:95. The probability that $U_l>U_j$ is: $$ P(\epsilon_jChisq)")] if (inherits(x, "htest")) result <- c(x$statistic, x$p.value) names(result) <- c("stat", "p-value") round(result, 3) } ``` ```{r label = 'homoscedasticity tests: results'} sapply(list(wald = wd.heter, lh = lh.heter, score = sc.heter, lr = lr.heter), statpval) ``` The homoscedasticity hypothesis is strongly rejected using the Wald test, but only at the 1 and 5\% level for, respectively, the score and the likelihood ratio tests. ### JapaneseFDI @HEAD:MAYE:04 analyzed the choice of one of the 57 European regions belonging to 9 countries by Japenese firms to implement a new production unit. ```{r label = 'loading the JapaneseFDI data set'} data("JapaneseFDI", package = "mlogit") jfdi <- mlogit.data(JapaneseFDI, chid.var = "firm", alt.var = "region", group.var = "country") ``` Note that we've used an extra argument to `mlogit.data` called `group.var` which indicates the grouping variable, which will be used later to define easily the nests. There are two sets of covariates: the wage rate `wage`, the unemployment rate `unemp`, a dummy indicating that the region is eligible to European funds `elig` and the area `area` are observed at the regional level and are therefore relevant for the estimation of the lower model, whereas the social cotisation rate `scrate` and the corporate tax rate `ctaxrate` are observed at the country level and are therefore suitable for the upper model. We first estimate a multinomial logit model: ```{r label = 'multinomial logit for JapaneseFDI'} ml.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate + ctaxrate | 0, data = jfdi) ``` Note that, as the covariates are only alternative specific, the intercepts are not identified and therefore have been removed. We next estimate the lower model, which analyses the choice of a region within a given country. Therefore, for each choice situation, we estimate the choice of a region on the subset of regions of the country which has been chosen. Moreover, observations concerning Portugal and Ireland are removed as these two countries are mono-region. ```{r label = 'lower model estimation'} lm.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) | 0, data = jfdi, subset = country == choice.c & ! country %in% c("PT", "IE")) ``` We next use the fitted lower model in order to compute the inclusive value, at the country level: $$ \mbox{iv}_{ig} = \ln \sum_{j \in B_g} e^{\beta^\top x_{ij}}, $$ where $B_g$ is the set of regions for country $g$. When a grouping variable is provided in the `mlogit.data` function, inclusive values are by default computed for every group $g$ (global inclusive values are obtained by setting the `type` argument to `"global"`). By default, `output` is set to `"chid"` and the results is a vector (if `type = "global"`) or a matrix (if `type = "region"`) with row number equal to the number of choice situations. If `output` is set to `"obs"`, a vector of length equal to the number of lines of the data in long format is returned. The following code indicates different ways to use the `logsum` function: ```{r label = 'use of the logsum function'} lmformula <- formula(lm.fdi) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "group"), 2) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global")) head(logsum(ml.fdi, data = jfdi, formula = lmformula, output = "obs")) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global", output = "obs")) ``` To add the inclusive values in the original `data.frame`, we use `output = "obs"` and the `type` argument can be omitted as its default value is `"group"`: ```{r label = 'adding the logsum to the data'} JapaneseFDI$iv <- logsum(lm.fdi, data = jfdi, formula = lmformula, output = "obs") ``` We next select the relevant variables for the estimation of the upper model, select unique lines in order to keep only one observation for every choice situation / country combination and finally we coerce the response (`choice.c`) to a logical for the chosen country. ```{r label = 'data suitable for the upper model'} JapaneseFDI.c <- subset(JapaneseFDI, select = c("firm", "country", "choice.c", "scrate", "ctaxrate", "iv")) JapaneseFDI.c <- unique(JapaneseFDI.c) JapaneseFDI.c$choice.c <- with(JapaneseFDI.c, choice.c == country) ``` Finally, we estimate the upper model, using the previously computed inclusive value as a covariate. ```{r label = 'estimation of the upper model'} jfdi.c <- mlogit.data(JapaneseFDI.c, choice = "choice.c", alt.var = "country", chid.var = "firm", shape = "long") um.fdi <- mlogit(choice.c ~ scrate + ctaxrate + iv | 0, data = jfdi.c) ``` If one wants to obtain different `iv` coefficients for different countries, the `iv` covariate should be introduced in the 3th part of the formula and the coefficients for the two mono-region countries (Ireland and Portugal) should be set to 1, using the `constPar` argument. ```{r label = 'upper model with different iv coefficients'} um2.fdi <- mlogit(choice.c ~ scrate + ctaxrate | 0 | iv, data = jfdi.c, constPar = c("iv:PT" = 1, "iv:IE" = 1)) ``` We next estimate the full-information maximum likelihood nested model. It is obtained by adding a `nests` argument to the `mlogit` function. This should be a named list of alternatives (here regions), the names being the nests (here the countries). More simply, if a group variable has been indicated while using `mlogit.data`, `nests` can be a boolean. Two flavors of nested models can be estimated, using the `un.nest.el` argument which is a boolean. If `TRUE`, one imposes that the coefficient associated with the inclusive utility is the same for every nest, which means that the degree of correlation inside each nest is the same. If `FALSE`, a different coefficient is estimated for every nest. ```{r label = 'nested logit models'} nl.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate + ctaxrate | 0, data = jfdi, nests = TRUE, un.nest.el = TRUE) nl2.fdi <- update(nl.fdi, un.nest.el = FALSE, constPar = c('iv:PT' = 1, 'iv:IE' = 1)) ``` The results of the fitted models are presented in the following table: ```{r label = 'results for the JapaneseFDI data', results = 'asis'} library("texreg") htmlreg(list('Mult. logit' = ml.fdi, 'Lower model' = lm.fdi, 'Upper model' = um.fdi, 'Upper model' = um2.fdi, 'Nested logit' = nl.fdi, 'Nested logit' = nl2.fdi), fontsize = "footnotesize", float.pos = "hbt", label = "tab:nlogit", caption = "Choice by Japanese firms of a european region.") ``` For the nested logit models, two tests are of particular interest: - the test of no nests, which means that all the nest elasticities are equal to 1, - the test of unique nest elasticities, which means that all the nest elasticities are equal to each other. For the test of no nests, the nested model is provided as the unique argument for the `lrtest` and the `waldtest` function. For the `scoretest`, the constrained model (i.e., the multinomial logit model) is provided as the first argument and the second argument is `nests`, which describes the nesting structure that one wants to test. ```{r label = 'test of no nests'} lr.nest <- lrtest(nl2.fdi) wd.nest <- waldtest(nl2.fdi) sc.nest <- scoretest(ml.fdi, nests = TRUE, constPar = c('iv:PT' = 1, 'iv:IE' = 1)) ``` The Wald test can also be performed using the `linearHypothesis` function: ```{r label = 'test of no nests with linhyp'} lh.nest <- linearHypothesis(nl2.fdi, c("iv:BE = 1", "iv:DE = 1", "iv:ES = 1", "iv:FR = 1", "iv:IT = 1", "iv:NL = 1", "iv:UK = 1")) ``` ```{r label = 'results of the tests of no nests'} sapply(list(wald = wd.nest, lh = lh.nest, score = sc.nest, lr = lr.nest), statpval) ``` The three tests reject the null hypothesis of no correlation. We next test the hypothesis that all the nest elasticities are equal. ```{r label = 'computing the test for equal iv coefficients'} lr.unest <- lrtest(nl2.fdi, nl.fdi) wd.unest <- waldtest(nl2.fdi, un.nest.el = TRUE) sc.unest <- scoretest(ml.fdi, nests = TRUE, un.nest.el = FALSE, constPar = c('iv:IE' = 1, 'iv:PT' = 1)) lh.unest <- linearHypothesis(nl2.fdi, c("iv:BE = iv:DE", "iv:BE = iv:ES", "iv:BE = iv:FR", "iv:BE = iv:IT", "iv:BE = iv:NL", "iv:BE = iv:UK")) ``` ```{r label = 'results of the tests of equal iv coefficients'} sapply(list(wald = wd.unest, lh = lh.unest, score = sc.unest, lr = lr.unest), statpval) ``` Once again, the three tests strongly reject the hypothesis. ## Bibliography