Loading [MathJax]/jax/output/HTML-CSS/jax.js

Implementation of Shi’s non-degeranate Vuong test

Yves Croissant

The original Vuong test

Vuong (1989) proposed a test for non-nested model. He considers two competing models, Fβ={f(y|z;β);ββ} and Gγ={g(y|z;γ);γΓ}. Denoting h(y|z) the true conditional density. The distance of Fβ from the true model is measured by the minimum KLIC:

Df=E0[lnh(yz)]E0[lnf(yz;β)]

where E0 is the expected value using the true joint distribution of (y,X) and β is the pseudo-true value of β1. As the true model is unobserved, denoting θ=(β,γ), we consider the difference of the KLIC distance to the true model of model Gγ and model Fβ:

Λ(θ)=DgDf=E0[lnf(yz;β)]E0[lng(yz;γ)]=E0[lnf(yz;β)g(yz;γ)]

The null hypothesis is that the distance of the two models to the true models are equal or, equivalently, that: Λ=0. The alternative hypothesis is either Λ>0, which means that Fβ is better than Gγ or Λ<0, which means that Gγ is better than Fβ. Denoting, for a given random sample of size N, ˆβ and ˆγ the maximum likelihood estimators of the two models and lnLf(ˆβ) and lnLg(ˆγ) the maximum value of the log-likelihood functions of respectively Fβ and Gγ, Λ can be consistently estimated by:

ˆΛN=1NNn=1(lnf(ynxn,ˆβ)lng(ynxn,ˆγ))=1N(lnLf(ˆβ)lnLg(ˆγ))

which is the likelihood ratio divided by the sample size. Note that the statistic of the standard likelihood ratio test, suitable for nested models is 2(lnLf(ˆβ)lnLg(ˆγ)), which is 2NˆΛN.

The variance of Λ is:

ω2=Vo[lnf(yx;β)g(yx;γ)]

which can be consistently estimated by:

ˆω2N=1NNn=1(lnf(ynxn,ˆβ)lng(ynx,nˆγ))2ˆΛ2N

Three different cases should be considered:

The distribution of the statistic depends on whether ω2 is zero or positive. If ω2 is positive, the statistic is ˆTN=NˆΛNˆωN and, under the null hypothesis that the two models are equivalent, follows a standard normal distribution. This is the case for two strictly non-nested models.

On the contrary, if ω2=0, the distribution is much more complicated. We need to define two matrices: A contains the expected values of the second derivates of Λ:

A(θ)=E0[2Λθθ]=E0[2lnfββ002lngββ]=[Af(β)00Ag(γ)]

and B the variance of its first derivatives:

B(θ)=E0[ΛθΛθ]=E0[(lnfβ,lngγ)(lnfβ,lngγ)]=E0[lnfβlnfβlnfβlngγlngγlnfβlngγlngγ]

or:

B(θ)=[Bf(β)Bfg(β,γ)Bgf(β,γ)Bg(γ)]

Then:

W(θ)=B(θ)[A(θ)]1=[Bf(β)A1f(β)Bfg(β,γ)A1g(γ)Bgf(γ,β)A1f(β)Bg(γ)A1g(γ)]

Denote λ the eigen values of W. When ω2=0 (which is always the case for nested models), the statistic is the one used in the standard likelihood ratio test: 2(lnLflnLg)=2NˆΛN which, under the null, follows a weighted χ2 distribution with weights equal to λ. The Vuong test can be seen in this context as a more robust version of the standard likelihood ratio test, because it doesn’t assume, under the null, that the larger model is correctly specified.

Note that, if the larger model is correctly specified, the information matrix equality implies that Bf(θ)Af(θ). In this case, the two matrices on the diagonal of W reduce to IKf and IKg, the trace of W to KgKf and the distribution of the statistic under the null reduce to a χ2 with KgKf degrees of freedom.

The W matrice can be consistently estimate by computing the first and the second derivatives of the likelihood functions of the two models for ˆθ. For example,

ˆAf(ˆβ)=1NNn=12lnfββ(ˆβ,xn,yn)

ˆBfg(ˆθ)=1NNn=1lnfβ(ˆβ,xn,yn)lngγ(ˆγ,xn,yn)

For the overlapping case, the test should be performed in two steps:

The non-degenerate Vuong test

Shi (2015) proposed a non-degenerate version of the Vuong (1989) test. She shows that the Vuong test has size distortion, leading to subsequent overrejection. The cause of this problem is that the distribution of ˆΛ is discontinuous in the ω2 parameter (namely a normal distribution if ω2>0 and a distribution related to a weight χ2 distribution if ω2=0). Especially in small samples, it may be difficult to distinguish a positive versus a zero value of ω2 because of sampling error. To solve this problem, using local asymptotic theory, Shi (2015) showed that, rewriting the Vuong statistic as:

ˆT=NˆΛNNˆω2N

the asymptotic distribution of the numerator and of the square of the denominator of the Vuong statistic is the same as:

(NˆΛNNˆω2N)d(JΛJω)=(σzωzθVzθ/2σ22σρVzθ+zθV2zθ)

where:

(zωzθ)N(0,(1ρρI)),

ρ is a vector of length Kf+Kg, σ a positive scalar and V is the diagonal matrix containing the eigen values of B12A1B12.

Based on this result, Shi (2015) showed:

Shi (2015) therefore proposed to modify the numerator of the Vuong statistic:

ˆΛmodN=ˆΛN+tr(V)2N

and to add a constant to the denominator, so that:

(ˆωmod(c))2=ˆω2+ctr(V)2/N

The non-degenarate Vuong test is then:

TmodN=ˆΛmodNˆωmod=NˆΛN+tr(V)/2Nˆω2+ctr(V)2/N

The distribution of the modified Vuong statistic can be estimated by simulations: drawing in the distribution of (zω,zθ), we compute for every draw JΛ, Jω and JΛ/Jω. As σ and ρ can’t be estimated consistently, the supremum other these parameters are taken, and Shi (2015) indicates that ρ should be in this case a vector where all the elements are zero except for the one that coincides with the highest absolute value of V which is set to 1.

The Shi test is then computed as follow:

  1. start with a given size for the test, say α=0.05,
  2. for a given value of c, choose σ which maximize the simulated critical value for c and α,
  3. adjust c so that this critical value equals the normal critical value, up to a small disperency (say 0.1); for example, if the size is 5%, the target is v1α/2=1.96+0.1=2.06,
  4. compute ˆTmodN for the given values of c and σ ; if ˆTmodN>v1α/2, reject the null hypothesis at the α level,
  5. to get a p-value, if ˆTmodN>v1α/2 increase α and repeat the previous steps until a new value of α is obtained so that ˆTmodN=v1α/2, α being the p-value of the test.

Simulations

Shi (2015) provides an example of simulations of non-nested linear models that shows that the distribution of the Vuong statistic can be very different from a standard normal. The data generating process used for the simulations is:

y=1+Kfk=1zfk+Kgk=1zgk+ϵ

where zf is the set of Kf covariates that are used in the first model and zg the set of Kg covariates used in the second model and ϵN(0,1a2). zfkN(0,a/Kf) and zgkN(0,a/Kg), so that the explained variance explained by the two competing models is the same (equal to a2) and the null hypothesis of the Vuong test is true. The vuong_sim unables to simulate values of the Vuong test. As in Shi (2015), we use a very different degree of parametrization for the two models, with Kf=15 and KG=1.

library(micsr)
Vuong <- vuong_sim(N = 100, R = 1000, Kf = 15, Kg = 1, a = 0.5)
head(Vuong)
## [1]  1.5588591  2.5625929 -1.0904569  0.7207765  2.7263322  1.1310495
mean(Vuong)
## [1] 1.119354
mean(abs(Vuong) > 1.96)
## [1] 0.207

We can see that the the mean of the statistic for the 1000 replications is far away from 0, which means that the numerator of the Vuong statistic is seriously biased. 20.7% of the values of the statistic are greater than the critical value so that the Vuong test will lead in such context a noticeable overrejection. The empirical pdf is shown in the following figure, along with the normal pdf.

library("ggplot2")
ggplot(data = data.frame(Vuong = Vuong)) + geom_density(aes(x = Vuong)) +
    geom_function(fun = dnorm, linetype = "dotted")

Implementation of the non-degenarate Vuong test

The micsr package provides a ndvuong function that implements the classical Vuong test. It has a nest argument (that is FALSE by default but can be set to TRUE to get the nested version of the Vuong test). This package also provide a llcont generic which returns a vector of length N containing the contribution of every observation to the log-likelihood.

The ndvuong package provides the ndvuong function. As for the vuongtest function, the two main arguments are two fitted models (say model1 and model2). The ˆΛn vector is obtained using llcont(model1) - llcont(model2). The relevant matrices Ai and Bi are computed from the fitted models using the estfun and the meat functions from the sandwich package. More precisely, A1 is bdiag(-bread(model1), bread(model2)3 and B is crossprod(estfun(model1), - estfun(model2)) / N, where N is the sample size. Therefore, the ndvuong function can be used with any models for which a llcont, a estfun and a bread method is available.

Applications

Voter turnout

The first application is the example used in Shi (2015) and is used to compare our R program with Shi’s stata’s program. Coate and Conlin (2004) used several models of electoral participation, using data concerning referenda about alcool sales regulation in Texas. Three models are estimated: the prefered group-utilitarian model, a “simple, but plausible, alternative: the intensity model” and a reduced form model estimated by the seemingly unrelated residuals method. They are provided in the ndvuong package as turnout, a list of three fitted models4. The results of the Shi test are given below. We first compute the Shi statistic for an error level of 5%. We therefore set the size argument to 0.05 (this is actually the default value) and the pval argument to FALSE.

library("micsr")
test <- ndvuong(turnout$group, turnout$intens, size = 0.05, pval = FALSE)
test
## 
##  Non-degenerate Vuong test for non-nested models
## 
## data:  turnout$group-turnout$intens
## z = 1.7759, size = 0.050000, vuong_stat = 2.084528, constant =
## 0.381107, crit-value = 2.059963, sum e.v. = -10.997224, vuong_p.value =
## 0.018556
## alternative hypothesis: different models

The Shi statistic is 1.776, which is smaller that the critical value 2.06. Therefore, based on the Shi test, we can’t reject the hypothesis that the two competing models are equivalent at the 5% level. The value of the constant c is also reported, as is the sum of the eigen values of the V matrix (sum e.v.). The classical Vuong statistic is also reported (2.085) and is greater than the 5% normal critical value (the p-value is 0.019). Therefore, the classical Vuong test and the non-degenerate version lead to opposite conclusions at the 5% level.

To get only the classical Vuong test, the nd argument can be set to FALSE:

ndvuong(turnout$group, turnout$intens, nd = FALSE)
## 
##  Vuong test for non-nested models
## 
## data:  turnout$group-turnout$intens
## z = 2.0845, p-value = 0.01856
## alternative hypothesis: different models

To get the p-value of the non-degenerate Vuong test, the pval argument should be set to TRUE.

test <- ndvuong(turnout$group, turnout$intens, pval = TRUE)
test
## 
##  Non-degenerate Vuong test for non-nested models
## 
## data:  turnout$group-turnout$intens
## z = 1.8125, vuong_stat = 2.084528, constant = 0.000000, sum e.v. =
## -10.997224, vuong_p.value = 0.018556, p-value = 0.0864
## alternative hypothesis: different models

The results indicate that the p-value is 0.086, which confirms that the Shi test concludes that the two model are equivalent at the 5% level.

Transport mode choice (nested models)

The third example concerns transport mode choice in Canada. The dataset, provided by the mlogit package is called ModeCanada and has been used extensively in the transport demand litterature (see in particular ???; ???; and ???). The following example is from (???). The raw data set is first transformed to make it suitable for the estimation of discrete choice models. The sample is restricted to the individuals for which 4 transport modes are available (bus, air, train and car).

library("mlogit")
## Le chargement a nécessité le package : dfidx
## 
## Attachement du package : 'dfidx'
## L'objet suivant est masqué depuis 'package:stats':
## 
##     filter
## 
## Attachement du package : 'mlogit'
## Les objets suivants sont masqués depuis 'package:micsr':
## 
##     rg, scoretest, stdev
data("ModeCanada")
MC <- mlogit.data(ModeCanada, subset = noalt == 4, chid.var = "case",
                  alt.var = "alt", drop.index = TRUE)

We first estimate the simplest discrete choice model, which is the multinomial logit model. The bus share being negligible, the choice set is restricted to the three other modes and the reference mode is set to car.

ml <- mlogit(choice ~ freq + cost + ivt + ovt | urban + income, MC, 
             reflevel = 'car', alt.subset = c("car", "train", "air"))

This model relies on the hypothesis that the unobserved component of the utility functions for the different modes are independent and identical Gumbell variables. (???) proposed the heteroscedastic logit for which the errors follow a general Gumbell distributions with a supplementary scale parameter to be estimated. As the overall scale of utility is not identified, the scale parameter of the reference alternative (car) is set to one.

hl <- mlogit(choice ~ freq + cost + ivt + ovt | urban + income, MC, 
             reflevel = 'car', alt.subset = c("car", "train", "air"),
             heterosc = TRUE)
coef(summary(hl))
##                       Estimate  Std. Error   z-value     Pr(>|z|)
## (Intercept):train  0.678393435 0.332762598  2.038671 4.148288e-02
## (Intercept):air    0.656754399 0.468163091  1.402832 1.606668e-01
## freq               0.063924677 0.004916769 13.001360 0.000000e+00
## cost              -0.026961457 0.004283139 -6.294789 3.078178e-10
## ivt               -0.009680773 0.001053874 -9.185892 0.000000e+00
## ovt               -0.032165526 0.003593007 -8.952258 0.000000e+00
## urban:train        0.797131578 0.120739176  6.602096 4.053868e-11
## urban:air          0.445472634 0.082160945  5.421951 5.895197e-08
## income:train      -0.012597857 0.003994180 -3.154053 1.610196e-03
## income:air         0.018859983 0.003215926  5.864558 4.503294e-09
## sp.train           1.237182865 0.110460959 11.200182 0.000000e+00
## sp.air             0.540323852 0.111835294  4.831425 1.355592e-06

The two supplementary coefficients are sp.train and sp.air. The student statistics reported are irrelevant because they test the hypothesis that these parameters are 0, as the relevant hypothesis of homoscedasticity is that both of them equal one. The heteroscedastic logit being nested in the multinomial logit model, we can first use the three classical tests: the Wald test (based on the unconstrained model hl), the score test (based on the constrained model ml) and the likelihood ratio model (based on the comparison of both models).

To perform the Wald test, we use lmtest::waldtest, for which a special method is provided by the mlogit package. The arguments are the unconstrained model (hl) and the update that should be used in order to get the constrained model (heterosc = FALSE). To compute the scoretest, we use mlogit::scoretest, for which the arguments are the constrained model (ml) and the update that should be used in order to get the unconstrained model (heterosc = TRUE). Finally, the likelihood ratio test is performed using lmtest::lrtest.

lmtest::waldtest(hl, heterosc = FALSE)
## 
##  Wald test
## 
## data:  homoscedasticity
## chisq = 25.196, df = 2, p-value = 3.38e-06
scoretest(ml, heterosc = TRUE)
## 
##  score test
## 
## data:  heterosc = TRUE
## chisq = 9.4883, df = 2, p-value = 0.008703
## alternative hypothesis: heteroscedastic model
lmtest::lrtest(hl, ml)
## Likelihood ratio test
## 
## Model 1: choice ~ freq + cost + ivt + ovt | urban + income
## Model 2: choice ~ freq + cost + ivt + ovt | urban + income
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  12 -1838.1                       
## 2  10 -1841.6 -2 6.8882    0.03193 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The three statistics are χ2 with two degrees of freedom under the null hypothesis of homoscedascticity. The three tests reject the null hypothesis at the 5% level, and even at the 1% level for the Wald and the score test. These three tests relies on the hypothesis that, under the null, the constrained model is the true model. We can get rid of this hypothesis using a Vuong test. Note the use of the nested argument that is set to TRUE:

ndvuong(hl, ml, nested = TRUE)
## 
##  Non-degenerate Vuong test for nested models
## 
## data:  hl-ml
## z = 0.4554, vuong_stat = 6.888241, vuong_p.value = 0.047927, p-value =
## 0.211
## alternative hypothesis: different models

The homoscedasticity hypothesis is still rejected at the 5% level for the classical Vuong test (the p-value is 0.048), but it is not using the non-degenerate Vuong test (p-value of 0.211).

Transport mode choice (overlapping models)

We consider finally another dataset from mlogit called RiskyTransport, that has been used by (???) and concerns the choice of one mode (among water-taxi, ferry, hovercraft and helicopter) for trips from Sierra Leone’s international airport to downtown Freetown.

library("mlogit")
data("RiskyTransport")
RT <- mlogit.data(RiskyTransport, shape = "long", choice = "choice",
                  chid.var = "chid", alt.var = "mode", id.var = "id")

We estimate models with only one covariate, the generalized cost of the mode. We estimate three models: the basic multinomial logit model, the heteroscedastic model and a nested model where alternatives are grouped in two nests according to the fact that they are fast or slow modes.

ml <- mlogit(choice ~ cost, data = RT)
hl <- mlogit(choice ~ cost , data = RT, heterosc = TRUE)
nl <- mlogit(formula = choice ~ cost, data = RT,
             nests = list(fast = c("Helicopter", "Hovercraft"),
                          slow = c("WaterTaxi", "Ferry")),
             un.nest.el = TRUE)
modelsummary::msummary(list(multinomial = ml, heteroscedastic = hl, nested =  nl))
multinomial heteroscedastic nested
(Intercept) × WaterTaxi 1.754 1.955 1.581
(0.159) (0.224) (0.127)
(Intercept) × Ferry 1.445 0.417 1.359
(0.168) (0.401) (0.118)
(Intercept) × Hovercraft 0.844 −3.666 0.461
(0.167) (2.511) (0.148)
cost −0.009 −0.022 −0.007
(0.001) (0.004) (0.001)
sp.WaterTaxi 0.745
(0.523)
sp.Ferry 2.840
(0.547)
sp.Hovercraft 4.551
(1.761)
iv 0.554
(0.092)
Num.Obs. 7172 7172 7172
AIC 3344.2 3300.7 3330.4
RMSE 0.63 0.63 0.63
mcfadden’s r2 0.150521340495322 0.163130381343293 0.154549418820887

Compared to ne multinomial model, the heteroscedastic model has 3 supplementary coefficients (the scale parameters for 3 modes, the one for the reference mode being set to 1) and the nested logit model has one supplementary parameter which is the nest elasticity (iv in the table). Both models reduce to the multinomial logit model if:

Therefore, the two models are over-lapping, as they reduce to the same model (the multinomial logit model) for some values of the parameters.

The first step of the test is the variance test. It can be performed using ndvuong by setting the argument vartest to TRUE:

ndvuong(nl, hl, vartest = TRUE)
## 
##  variance test
## 
## data:  nl-hl
## w2 = 0.047327, p-value < 2.2e-16
## alternative hypothesis: positive variance

The null hypothesis that ω2=0 is rejected. We can then proceed to the second step, which is the test for non-nested models.

ndvuong(hl, nl)
## 
##  Non-degenerate Vuong test for non-nested models
## 
## data:  hl-nl
## z = 1.7298, vuong_stat = 1.829208, constant = 0.000000, sum e.v. =
## -1.832021, vuong_p.value = 0.033684, p-value = 0.0975
## alternative hypothesis: different models

The classical and the non-degenerate Vuong tests both conclude that the two models are equivalent at the 5% level, but that the heteroscedastic model is better than the nested logit model at the 10% level.

References

Coate, Stephen, and Michael Conlin. 2004. “A Group Rule-Utilitarian Approach to Voter Turnout: Theory and Evidence.” American Economic Review 94 (5): 1476–1504.

Shi, Xiaoxia. 2015. “A Nondegenerate Vuong Test.” Quantitative Economics, 85–121.

Vuong, Quang H. 1989. “Likelihood Ratio Tests for Selection and Non-Nested Hypotheses.” Econometrica 57 (2): 397–33.


  1. β is called the pseudo-true value because f may be an uncorrect model.↩︎

  2. As the trace of V is the same as the trace of A1B, when the information matrix identity holds, it is equal to Kf+Kg. The bias of the numerator is therefore caused by the difference in the degree of parametrization of the two models.↩︎

  3. bdiag if a function that construct a block-diagonal matrix from its arguments.↩︎

  4. The estimation is rather complicated because some linear constraints are used to compute the maximum likelihood estimator in Coate and Conlin (2004)’s stata script. This is the reason why we provide only the results of the estimations, performed using the maxLik package.↩︎