\documentclass{chapman}
%%% copy Sweave.sty definitions
%%% keeps `sweave' from adding `\usepackage{Sweave}': DO NOT REMOVE
%\usepackage{Sweave}
\RequirePackage[T1]{fontenc}
\RequirePackage{graphicx,ae,fancyvrb}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\usepackage{relsize}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
fontshape=it,
fontsize=\relsize{-1}}
\DefineVerbatimEnvironment{Scode}{Verbatim}{}
\newenvironment{Schunk}{}{}
%%% environment for raw output
\newcommand{\SchunkRaw}{\renewenvironment{Schunk}{}{}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
fontshape=it,
fontsize=\small}
\rawSinput
}
%%% environment for labeled output
\newcommand{\nextcaption}{}
\newcommand{\SchunkLabel}{
\renewenvironment{Schunk}{\begin{figure}[ht] }{\caption{\nextcaption}
\end{figure} }
\DefineVerbatimEnvironment{Sinput}{Verbatim}{frame = topline}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{frame = bottomline,
samepage = true,
fontfamily=courier,
fontshape=it,
fontsize=\relsize{-1}}
}
%%% S code with line numbers
\DefineVerbatimEnvironment{Sinput}
{Verbatim}
{
%% numbers=left
}
\newcommand{\numberSinput}{
\DefineVerbatimEnvironment{Sinput}{Verbatim}{numbers=left}
}
\newcommand{\rawSinput}{
\DefineVerbatimEnvironment{Sinput}{Verbatim}{}
}
%%% R / System symbols
\newcommand{\R}{\textsf{R}}
\newcommand{\rR}{{R}}
\renewcommand{\S}{\textsf{S}}
\newcommand{\SPLUS}{\textsf{S-PLUS}}
\newcommand{\rSPLUS}{{S-PLUS}}
\newcommand{\SPSS}{\textsf{SPSS}}
\newcommand{\EXCEL}{\textsf{Excel}}
\newcommand{\ACCESS}{\textsf{Access}}
\newcommand{\SQL}{\textsf{SQL}}
%%\newcommand{\Rpackage}[1]{\hbox{\rm\textit{#1}}}
%%\newcommand{\Robject}[1]{\hbox{\rm\texttt{#1}}}
%%\newcommand{\Rclass}[1]{\hbox{\rm\textit{#1}}}
%%\newcommand{\Rcmd}[1]{\hbox{\rm\texttt{#1}}}
\newcommand{\Rpackage}[1]{\index{#1 package@{\fontseries{b}\selectfont #1} package} {\fontseries{b}\selectfont #1}}
\newcommand{\rpackage}[1]{{\fontseries{b}\selectfont #1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{\index{#1 class@\textit{#1} class}\textit{#1}}
\newcommand{\Rcmd}[1]{\index{#1 function@\texttt{#1} function}\texttt{#1}}
\newcommand{\Roperator}[1]{\texttt{#1}}
\newcommand{\Rarg}[1]{\texttt{#1}}
\newcommand{\Rlevel}[1]{\texttt{#1}}
%%% other symbols
\newcommand{\file}[1]{\hbox{\rm\texttt{#1}}}
%%\newcommand{\stress}[1]{\index{#1}\textit{#1}}
\newcommand{\stress}[1]{\textit{#1}}
\newcommand{\booktitle}[1]{\textit{#1}} %%'
%%% Math symbols
\usepackage{amstext}
\usepackage{amsmath}
\newcommand{\E}{\mathsf{E}}
\newcommand{\Var}{\mathsf{Var}}
\newcommand{\Cov}{\mathsf{Cov}}
\newcommand{\Cor}{\mathsf{Cor}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\y}{\mathbf{y}}
\renewcommand{\a}{\mathbf{a}}
\newcommand{\W}{\mathbf{W}}
\newcommand{\C}{\mathbf{C}}
\renewcommand{\H}{\mathbf{H}}
\newcommand{\X}{\mathbf{X}}
\newcommand{\B}{\mathbf{B}}
\newcommand{\V}{\mathbf{V}}
\newcommand{\I}{\mathbf{I}}
\newcommand{\D}{\mathbf{D}}
\newcommand{\bS}{\mathbf{S}}
\newcommand{\N}{\mathcal{N}}
\renewcommand{\L}{L}
\renewcommand{\P}{\mathsf{P}}
\newcommand{\K}{\mathbf{K}}
\newcommand{\m}{\mathbf{m}}
\newcommand{\argmin}{\operatorname{argmin}\displaylimits}
\newcommand{\argmax}{\operatorname{argmax}\displaylimits}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bbeta}{\mathbf{\beta}}
%%% links
\usepackage{hyperref}
\hypersetup{%
pdftitle = {A Handbook of Statistical Analyses Using R (3rd Edition)},
pdfsubject = {Book},
pdfauthor = {Torsten Hothorn and Brian S. Everitt},
colorlinks = {black},
linkcolor = {black},
citecolor = {black},
urlcolor = {black},
hyperindex = {true},
linktocpage = {true},
}
%%% captions & tables
%% : conflics with figure definition in chapman.cls
%%\usepackage[format=hang,margin=10pt,labelfont=bf]{caption}
%%
\usepackage{longtable}
\usepackage[figuresright]{rotating}
%%% R symbol in chapter 1
\usepackage{wrapfig}
%%% Bibliography
\usepackage[round,comma]{natbib}
\renewcommand{\refname}{References \addcontentsline{toc}{chapter}{References}}
\citeindexfalse
%%% texi2dvi complains that \newblock is undefined, hm...
\def\newblock{\hskip .11em plus .33em minus .07em}
%%% Example sections
\newcounter{exercise}[chapter]
\setcounter{exercise}{0}
\newcommand{\exercise}{\stepcounter{exercise} \item{Ex.~\arabic{chapter}.\arabic{exercise} }}
%% URLs
\newcommand{\curl}[1]{\begin{center} \url{#1} \end{center}}
%%% for manual corrections
%\renewcommand{\baselinestretch}{2}
%%% plot sizes
\setkeys{Gin}{width=0.95\textwidth}
%%% color
\usepackage{color}
%%% hyphenations
\hyphenation{drop-out}
\hyphenation{mar-gi-nal}
%%% new bidirectional quotes need
\usepackage[utf8]{inputenc}
%\usepackage{setspace}
\definecolor{sidebox_todo}{rgb}{1,1,0.2}
\newcommand{\todo}[1]{
\hspace{0pt}%
\marginpar{%
\fcolorbox{black}{sidebox_todo}{%
\parbox{\marginparwidth} {
\raggedright\sffamily\footnotesize{TODO: #1}%
}
}%
}
}
\begin{document}
%% Title page
\title{A Handbook of Statistical Analyses Using \R{} --- 3rd Edition}
\author{Torsten Hothorn and Brian S. Everitt}
\maketitle
%%\VignetteIndexEntry{Chapter Quantile Regression}
%%\VignetteDepends{lattice,quantreg}
\setcounter{chapter}{11}
\SweaveOpts{prefix.string=figures/HSAUR,eps=FALSE,keep.source=TRUE}
<>=
rm(list = ls())
s <- search()[-1]
s <- s[-match(c("package:base", "package:stats", "package:graphics", "package:grDevices",
"package:utils", "package:datasets", "package:methods", "Autoloads"), s)]
if (length(s) > 0) sapply(s, detach, character.only = TRUE)
if (!file.exists("tables")) dir.create("tables")
if (!file.exists("figures")) dir.create("figures")
set.seed(290875)
options(prompt = "R> ", continue = "+ ",
width = 63, # digits = 4,
show.signif.stars = FALSE,
SweaveHooks = list(leftpar = function()
par(mai = par("mai") * c(1, 1.05, 1, 1)),
bigleftpar = function()
par(mai = par("mai") * c(1, 1.7, 1, 1))))
HSAURpkg <- require("HSAUR3")
if (!HSAURpkg) stop("cannot load package ", sQuote("HSAUR3"))
rm(HSAURpkg)
### hm, R-2.4.0 --vanilla seems to need this
a <- Sys.setlocale("LC_ALL", "C")
###
book <- TRUE
refs <- cbind(c("AItR", "DAGD", "SI", "CI", "ANOVA", "MLR", "GLM",
"DE", "RP", "GAM", "SA", "ALDI", "ALDII", "SIMC", "MA", "PCA",
"MDS", "CA"), 1:18)
ch <- function(x) {
ch <- refs[which(refs[,1] == x),]
if (book) {
return(paste("Chapter~\\\\ref{", ch[1], "}", sep = ""))
} else {
return(paste("Chapter~", ch[2], sep = ""))
}
}
if (file.exists("deparse.R"))
source("deparse.R")
setHook(packageEvent("lattice", "attach"), function(...) {
lattice.options(default.theme =
function()
standard.theme("pdf", color = FALSE))
})
@
\pagestyle{headings}
<>=
book <- FALSE
@
%% lower png resolution for vignettes
\SweaveOpts{resolution = 80}
<>=
library("lattice")
trellis.par.set(list(plot.symbol = list(col=1,pch=20, cex=0.7),
box.rectangle = list(col=1),
plot.line = list(col = 1, lwd = 1),
box.umbrella = list(lty=1, col=1),
strip.background = list(col = "white")))
ltheme <- canonical.theme(color = FALSE) ## in-built B&W theme
ltheme$strip.background$col <- "transparent" ## change strip bg
lattice.options(default.theme = ltheme)
data("db", package = "gamlss.data")
nboys <- with(db, sum(age > 2))
@
\chapter[Quantile Regression]{Quantile Regression:
Head Circumference for Age\label{QR}}
\section{Introduction}
\section{Quantile Regression}
\section{Analysis Using \R{}}
We begin with a graphical inspection of the influence of age on head
circumference by means of a scatterplot. Plotting all pairs of age and
head circumference in one panel gives more weight to the teens and 20s,
so we produce one plot for younger boys between two and nine years old and
one additional plot for boys older than nine years (or $>108$ months,
to be precise). The \Rcmd{cut} function is very convenient for constructing
a factor representing these two groups
<>=
summary(db)
db$cut <- cut(db$age, breaks = c(2, 9, 23),
labels = c("2-9 yrs", "9-23 yrs"))
@
which can then be used as a
conditioning variable for conditional scatterplots produced with the
\Rcmd{xyplot} function \citep[package \Rpackage{lattice}]{PKG:lattice}.
Because we draw $\Sexpr{nboys}$ points in total, we use transparent shading
(via \Rcmd{rgb(.1, .1, .1, .1)}) in order to obtain a clearer picture
for the more populated areas in the plot.
\begin{figure}
\begin{center}
<>=
db$cut <- cut(db$age, breaks = c(2, 9, 23),
labels = c("2-9 yrs", "9-23 yrs"))
xyplot(head ~ age | cut, data = db, xlab = "Age (years)",
ylab = "Head circumference (cm)",
scales = list(x = list(relation = "free")),
layout = c(2, 1), pch = 19,
col = rgb(.1, .1, .1, .1))
@
\caption{Scatterplot of age and head circumference for $\Sexpr{nboys}$
Dutch boys. \label{QR-db-plot}}
\end{center}
\end{figure}
Figure~\ref{QR-db-plot}, as expected, shows that head circumference
increases with age. It also shows that there is considerable variation and
also quite a number of extremely large or small head circumferences in the
respective age cohorts. It should be noted that each point corresponds to
one boy participating in the study due to its cross-sectional study design.
No longitudinal measurements (cf.~Chapter~\ref{ALDI}) were taken and we can
safely assume independence between observations.
We start with a simple linear model, computed separately for the younger and
older boys, for regressing the mean head circumference on age
<>=
(lm2.9 <- lm(head ~ age, data = db, subset = age < 9))
(lm9.23 <- lm(head ~ age, data = db, subset = age > 9))
@
This approach is equivalent to fitting two intercepts and two slopes in
the joint model
<>=
(lm_mod <- lm(head ~ age:I(age < 9) + I(age < 9) - 1,
data = db))
@
while omitting the global intercept. Because the median of the
normal distribution is equal to its mean, the two models can be interpreted
as conditional median models under the normal assumption. The model
states that within one year, the head circumference increases by
$\Sexpr{round(coef(lm_mod)["age:I(age < 9)TRUE"], 3)}$ cm for boys less than nine years old
and by $\Sexpr{round(coef(lm_mod)["age:I(age < 9)FALSE"], 3)}$ for older boys.
We now relax this distributional assumption and compute a median
regression model using the \Rcmd{rq} function from package \Rpackage{quantreg}
\citep{PKG:quantreg}:
<>=
library("quantreg")
(rq_med2.9 <- rq(head ~ age, data = db, tau = 0.5,
subset = age < 9))
(rq_med9.23 <- rq(head ~ age, data = db, tau = 0.5,
subset = age > 9))
@
When we construct confidence intervals for the
intercept and slope parameters from both models for the younger boys
<>=
cbind(coef(lm2.9)[1], confint(lm2.9, parm = "(Intercept)"))
cbind(coef(lm2.9)[2], confint(lm2.9, parm = "age"))
summary(rq_med2.9, se = "rank")
@
we see that the two intercepts are almost identical but there seems to be
a larger slope parameter for age in the median regression model.
For the older boys, we get the confidence intervals via
<>=
cbind(coef(lm9.23)[1], confint(lm9.23, parm = "(Intercept)"))
cbind(coef(lm9.23)[2], confint(lm9.23, parm = "age"))
summary(rq_med9.23, se = "rank")
@
with again almost identical intercepts and only a slightly increased
slope for age in the median regression model.
Since one of our aims was the construction of growth curves, we first use the
linear models regressing head circumference on age to plot such curves.
Based on the two normal linear models, we can compute the quantiles of head
circumference for age. For the following values of $\tau$
<>=
tau <- c(.01, .1, .25, .5, .75, .9, .99)
@
and a grid of age values
<>=
gage <- c(2:9, 9:23)
i <- 1:8
@
(the index \Rcmd{i} denoting younger boys), we compute the
standard prediction intervals
\index{Prediction interval}
taking the randomness of the estimated
intercept, slope, and variance parameters into account. We first set up
a data frame with our grid of age values and then use the \Rcmd{predict}
function for a linear model to compute prediction intervals, here
with a coverage of $50\%$. The lower limit of such a $50\%$ prediction
interval is equivalent to the conditional $25\%$ quantile for the given age
and the upper limit corresponds to the $75\%$ quantile. The conditional
mean is also reported and is equivalent to the conditional median:
<>=
idf <- data.frame(age = gage[i])
p <- predict(lm2.9, newdata = idf, level = 0.5,
interval = "prediction")
colnames(p) <- c("0.5", "0.25", "0.75")
p
@
We now proceed with $80\%$ prediction intervals for constructing the
$10\%$ and $90\%$ quantiles, and with $98\%$ prediction intervals
corresponding to the $1\%$ and $99\%$ quantiles and repeat the exercise
also for the older boys:
<>=
p <- cbind(p, predict(lm2.9, newdata = idf, level = 0.8,
interval = "prediction")[,-1])
colnames(p)[4:5] <- c("0.1", "0.9")
p <- cbind(p, predict(lm2.9, newdata = idf, level = 0.98,
interval = "prediction")[,-1])
colnames(p)[6:7] <- c("0.01", "0.99")
p2.9 <- p[, c("0.01", "0.1", "0.25", "0.5",
"0.75", "0.9", "0.99")]
idf <- data.frame(age = gage[-i])
p <- predict(lm9.23, newdata = idf, level = 0.5,
interval = "prediction")
colnames(p) <- c("0.5", "0.25", "0.75")
p <- cbind(p, predict(lm9.23, newdata = idf, level = 0.8,
interval = "prediction")[,-1])
colnames(p)[4:5] <- c("0.1", "0.9")
p <- cbind(p, predict(lm9.23, newdata = idf, level = 0.98,
interval = "prediction")[,-1])
colnames(p)[6:7] <- c("0.01", "0.99")
@
We now reorder the columns of this table and get the following
conditional quantiles, estimated under the normal assumption of
head circumference:
<>=
p9.23 <- p[, c("0.01", "0.1", "0.25", "0.5",
"0.75", "0.9", "0.99")]
round((q2.23 <- rbind(p2.9, p9.23)), 3)
@
We can now superimpose these conditional quantiles on our scatterplot.
To do this, we need to write our own little panel function that
produces the scatterplot using the \Rcmd{panel.xyplot} function and then
adds the just computed conditional quantiles by means of the
\Rcmd{panel.lines} function called for every column of $\Robject{q2.23}$.
Figure~\ref{QR-db-lm-plot} shows parallel lines owing to the fact that the
linear model assumes an error variance independent from age; this is the
so-called variance homogeneity. Compared to a plot with only a single
(mean) regression line, we plotted a whole bunch of conditional
distributions here, one for each value of age. Of course, we did so
under extremely simplifying assumptions like linearity and variance
homogeneity that we're going to drop now.
\begin{figure}
\begin{center}
<>=
pfun <- function(x, y, ...) {
panel.xyplot(x = x, y = y, ...)
if (max(x) <= 9) {
apply(q2.23, 2, function(x)
panel.lines(gage[i], x[i]))
} else {
apply(q2.23, 2, function(x)
panel.lines(gage[-i], x[-i]))
}
panel.text(rep(max(db$age), length(tau)),
q2.23[nrow(q2.23),], label = tau, cex = 0.9)
panel.text(rep(min(db$age), length(tau)),
q2.23[1,], label = tau, cex = 0.9)
}
xyplot(head ~ age | cut, data = db, xlab = "Age (years)",
ylab = "Head circumference (cm)", pch = 19,
scales = list(x = list(relation = "free")),
layout = c(2, 1), col = rgb(.1, .1, .1, .1),
panel = pfun)
@
\caption{Scatterplot of age and head circumference for $\Sexpr{nboys}$
Dutch boys with superimposed normal quantiles.
\label{QR-db-lm-plot}}
\end{center}
\end{figure}
For the production of a nonparametric version of our growth curves,
we start with fitting not only one but multiple quantile regression models,
one for each value of $\tau$. We start with the younger boys
<>=
(rq2.9 <- rq(head ~ age, data = db, tau = tau,
subset = age < 9))
@
and continue with the older boys
<>=
(rq9.23 <- rq(head ~ age, data = db, tau = tau,
subset = age > 9))
@
Naturally, the intercept parameters vary but there is also a considerable
variation in the slopes, with the largest value
for the $1\%$ quantile regression model for younger boys.
The parameters $\beta_\tau$ have to be interpreted with care.
In general, they cannot be interpreted on an individual-specific level. A
boy who happens to be at the $\tau \times 100\%$ quantile of head circumference
conditional on his age would not be at the same
quantile anymore when he gets older. When knowing
$\beta_\tau$, the only conclusion that can be drawn is how the $\tau
\times 100\%$ quantile of a population with a specific age
differs from the $\tau \times 100\%$ quantile of a population with a
different age.
Because the linear functions estimated by linear quantile regression,
here in model \Robject{rq9.23}, directly correspond to the conditional
quantiles of interest, we can use the \Rcmd{predict} function to compute
the estimated conditional quantiles:
<>=
p2.23 <- rbind(predict(rq2.9,
newdata = data.frame(age = gage[i])),
predict(rq9.23,
newdata = data.frame(age = gage[-i])))
@
It is important to note that these numbers were obtained without assuming
anything about the continuous distribution of head circumference given any
age. Again, we produce a scatterplot with superimposed quantiles, this time
each line corresponds to a specific model. For the sake of comparison with
the linear model, we add the linear model quantiles as dashed lines to
Figure~\ref{QR-db-rq-plot}. For the older boys, there seems to be almost no
difference but the more extreme $1\%$ and $99\%$ quantiles for the younger
boys differ considerably. So, at least for the younger boys, we might want
to allow for age-specific variability in the distribution of head
circumference.
\begin{figure}
\begin{center}
<>=
pfun <- function(x, y, ...) {
panel.xyplot(x = x, y = y, ...)
if (max(x) <= 9) {
apply(q2.23, 2, function(x)
panel.lines(gage[i], x[i], lty = 2))
apply(p2.23, 2, function(x)
panel.lines(gage[i], x[i]))
} else {
apply(q2.23, 2, function(x)
panel.lines(gage[-i], x[-i], lty = 2))
apply(p2.23, 2, function(x)
panel.lines(gage[-i], x[-i]))
}
panel.text(rep(max(db$age), length(tau)),
p2.23[nrow(p2.23),], label = tau, cex = 0.9)
panel.text(rep(min(db$age), length(tau)),
p2.23[1,], label = tau, cex = 0.9)
}
xyplot(head ~ age | cut, data = db, xlab = "Age (years)",
ylab = "Head circumference (cm)", pch = 19,
scales = list(x = list(relation = "free")),
layout = c(2, 1), col = rgb(.1, .1, .1, .1),
panel = pfun)
@
\caption{Scatterplot of age and head circumference for $\Sexpr{nboys}$
Dutch boys with superimposed regression quantiles (solid lines) and
normal quantiles (dashed lines). \label{QR-db-rq-plot}}
\end{center}
\end{figure}
Still, with the quantile regression models shown in
Figure~\ref{QR-db-rq-plot} we assume that the quantiles of head
circumference depend on age in a linear way. Additive quantile regression
is one way to approach the estimation of non-linear quantile functions. By
considering two different models for younger and older boys, we allowed
for a certain type of non-linear function in the results shown so far.
Additive quantile regression should be able to deal with this problem
and we therefore fit these models to all boys simultaneously.
For our different choices of $\tau$, we fit one additive quantile regression model
using the \Rcmd{rqss} function from the \Rpackage{quantreg} and allow
smooth quantile functions of age via the \Rcmd{qss} function in the
right-hand side of the model formula.
Note that we transformed age by the third root prior to model fitting. This
does not affect the model since it is a monotone transformation, however, it
helps to avoid fitting a function with large derivatives for very young
boys resulting in a low penalty parameter $\lambda$:
<>=
rqssmod <- vector(mode = "list", length = length(tau))
db$lage <- with(db, age^(1/3))
for (i in 1:length(tau))
rqssmod[[i]] <- rqss(head ~ qss(lage, lambda = 1),
data = db, tau = tau[i])
@
For the analysis of the head circumference, we choose a penalty parameter
$\lambda = 1$, which is the default for the \Rcmd{qss} function. Simply
using the default without a careful hyperparameter tuning, for example using
crossvalidation or similar procedures, is almost always a mistake.
By visual inspection (Figure~\ref{QR-db-rqss-plot}) we find this choice
appropriate but ask the readers to make a second guess (Exercise 3).
For a finer grid of age values, we compute the conditional quantiles
from the \Rcmd{predict} function:
<>=
gage <- seq(from = min(db$age), to = max(db$age),
length = 50)
p <- sapply(1:length(tau), function(i) {
predict(rqssmod[[i]],
newdata = data.frame(lage = gage^(1/3)))
})
@
Using very similar code as for plotting linear quantiles, we produce again a
scatterplot of age and head circumference but this time overlaid with
non-linear regression quantiles. Given that the results from the linear
models presented in Figure~\ref{QR-db-rq-plot} looked pretty convincing, the
quantile curves in Figure~\ref{QR-db-rqss-plot} shed a surprising new light
on the data. For the younger boys, we expected to see a larger variability
than for boys between two and three years old, but in fact the distribution seems
to be more complex. The distribution seems to be positively skewed with a
heavy lower tail and the degree of skewness varies with age (note that
the median is almost linear for boys older than four years).
Also in the right part of Figure~\ref{QR-db-rqss-plot}, we see an age-varying
skewness, although less pronounced as for the younger boys. The median
increases up to 16 years but then the growth rate is much smaller. This
does not seem to be the case for the $1\%, 10\%, 90\%$, and $99\%$
quantiles. Note that the discontinuity in the quantiles between the two age
groups is only due to the overlapping abscissae.
However, the deviations between the growth curves obtained from a linear
model under normality assumption on the one hand and quantile regression on the
other hand as shown in Figures~\ref{QR-db-rq-plot} and \ref{QR-db-rqss-plot}
are hardly dramatic for the head circumference data.
\begin{figure}
\begin{center}
<>=
pfun <- function(x, y, ...) {
panel.xyplot(x = x, y = y, ...)
apply(p, 2, function(x) panel.lines(gage, x))
panel.text(rep(max(db$age), length(tau)),
p[nrow(p),], label = tau, cex = 0.9)
panel.text(rep(min(db$age), length(tau)),
p[1,], label = tau, cex = 0.9)
}
xyplot(head ~ age | cut, data = db, xlab = "Age (years)",
ylab = "Head circumference (cm)", pch = 19,
scales = list(x = list(relation = "free")),
layout = c(2, 1), col = rgb(.1, .1, .1, .1),
panel = pfun)
@
\caption{Scatterplot of age and head circumference for $\Sexpr{nboys}$
Dutch boys with superimposed non-linear regression quantiles.
\label{QR-db-rqss-plot}}
\end{center}
\end{figure}
\section{Summary of Findings}
We can conclude that the whole distribution of head circumference changes
with age and that assumptions like symmetry and variance homogeneity might be
questionable for such type of analysis.
One alternative to the estimation of conditional quantiles is the estimation
of conditional distributions. One very interesting parametric approach are
generalized additive models for location, scale, and shape
\citep[GAMLSS,][]{HSAUR:RigbyStasinopoulos2005}. In
\cite{HSAUR:StasinopoulosRigby2007}, an analysis of the age and head
circumference by means of the \Rpackage{gamlss} package can be found.
One practical problem associated with contemporary methods in quantile
regression is quantile crossing. Because we fitted one quantile regression model
for each of the quantiles of interest, we cannot guarantee that the conditional
quantile functions are monotone, so the $90\%$ quantile may well be larger than the
$95\%$ quantile in some cases. Postprocessing of the estimated quantile curves
may help in this situation \citep{HSAUR:DetteVolgushev2008}.
\section{Final Comments}
When estimating regression models, we have to be aware of the implications
of model assumptions when interpreting the results. Symmetry, linearity,
and variance homogeneity are among the strongest but common assumptions.
Quantile regression, both in its linear and additive formulation, is an
intellectually stimulating and practically very useful framework where such
assumptions can be relaxed. At a more basic level, one should always ask
\stress{Am I really interested in the mean?} before using the regression models
discussed in other chapters of this book.
\bibliographystyle{LaTeXBibTeX/refstyle}
\bibliography{LaTeXBibTeX/HSAUR}
\end{document}