Nesting, Yes or No?

Antony Unwin

2024-04-01

Introduction

Grouping variables define subgroups of the data. When there are several grouping variables, they can be nested, like the subsectors within sectors in the UK CPI data (the CPIuk dataset), or not nested, like age, gender, and area in the Northern Ireland population data (the NIpop dataset). NIpop in fact has a mixture of nesting and non-nesting: the variable area_name, representing 80 District Electoral Areas, is nested within the variable LGD2014_name, representing 11 Local Government Districts. The next figure shows this.
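Whether one grouping variable is nested within another can be checked directly from the data. The following is a minimal sketch in base R, assuming a small made-up data frame; the names toy and is_nested are hypothetical and are not part of the UpAndDownPlots package.

```r
# Toy data (hypothetical, not the NIpop dataset): three areas in two districts.
toy <- data.frame(
  district = c("North", "North", "North", "South", "South"),
  area     = c("A", "A", "B", "C", "C"),
  gender   = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# inner is nested in outer if every value of inner occurs
# together with exactly one value of outer.
is_nested <- function(data, inner, outer) {
  pairs <- unique(data[, c(inner, outer)])   # distinct (inner, outer) combinations
  !any(duplicated(pairs[[inner]]))           # no inner value paired with two outers
}

is_nested(toy, "area", "district")    # TRUE: each area lies in one district
is_nested(toy, "gender", "district")  # FALSE: both genders occur in every district
```

The same check could be applied to area_name and LGD2014_name in NIpop, which the text above describes as nested.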

A nested plot

library(UpAndDownPlots)
kk <- ud_prep(NIpop, v1="y2011", v2="y2017", levs=c("LGD2014_name", "area_name"), sortLev=c("perc", "perc"))
k1 <- ud_plot(kk, labelvar="LGD2014_name")
k1$uadl

Fig 1: All LGDs increased in population over the six years and only a few areas recorded decreases. Belfast, the largest LGD, had the second lowest increase, with Derry having the lowest. Sorting the areas within districts by percentage change makes it easier to see the nesting.

A non-nested plot

km <- ud_prep(NIpop, v1="y2011", v2="y2017", levs=c("gender", "age", "LGD2014_name"), sortLev=c("perc", "orig", "perc"))
k2 <- ud_plot(km, labelvar="age")
k2$uadl

Fig 2: Changes by district within the age by gender groups. The biggest increases occurred amongst the 65+ males and the biggest decreases amongst the 16-39 females.

The districts are sorted by their overall percentage changes, so the district order is the same within each age by gender group, but the patterns are not. The increases for the 65+ group are uniformly high for both genders, with one exception. The next graphic looks into this.

kp <- ud_prep(NIpop, v1="y2011", v2="y2017", levs=c("age", "LGD2014_name"), sortLev=c("orig", "perc"))
k3 <- ud_plot(kp, labelvar="LGD2014_name")
k3$uadl

Fig 3: Percentage changes of districts by age groups with districts sorted by overall percentage change. Belfast is very different in the 65+ group.

Nesting and non-nesting together

The two nested grouping variables, LGD2014_name and area_name, can be used together with one of the non-nested variables like gender. The package assumes that the nested variables are kept together, so that in this situation they would both come before gender or both come after. Figure 4 shows an example.

kq <- ud_prep(NIpop, v1="y2011", v2="y2017", levs=c("gender", "LGD2014_name", "area_name"), sortLev=c("orig", "perc", "perc"))
k4 <- ud_plot(kq, labelvar="LGD2014_name")
k4$uadl

Fig 4: An UpAndDown plot of areas within districts by gender, males above, females below. The district patterns are broadly similar, although the rate for females in Belfast is lower than for the males.

The patterns for the two genders are similar because the male and female rates are close. Nevertheless, the district percentage changes within each gender do not decline regularly, because the districts have been sorted as if they were the only level. This ensures that the districts are in the same order for both genders.
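The effect of sorting districts as if they were the only level can be illustrated with a small base R sketch. It uses made-up counts, and the names toy, tot, district_order and sorted are hypothetical; it is not how the package computes its ordering internally, only an illustration of the idea.

```r
# Toy counts (hypothetical): two districts by two genders at two time points.
toy <- data.frame(
  gender   = c("F", "F", "M", "M"),
  district = c("East", "West", "East", "West"),
  y1       = c(100, 200, 100, 200),
  y2       = c(130, 210, 110, 230),
  stringsAsFactors = FALSE
)

# Overall percentage change per district, ignoring gender.
tot <- aggregate(cbind(y1, y2) ~ district, data = toy, FUN = sum)
tot$perc <- 100 * (tot$y2 - tot$y1) / tot$y1
district_order <- as.character(tot$district[order(tot$perc, decreasing = TRUE)])

# Percentage change per district within gender, shown in the overall order.
toy$perc <- 100 * (toy$y2 - toy$y1) / toy$y1
toy$district <- factor(toy$district, levels = district_order)
sorted <- toy[order(toy$gender, toy$district), ]
sorted
```

In this toy example East comes before West overall (20% against 10%). The female changes decline in that order (30%, then 5%), but the male changes do not (10%, then 15%), even though the district order is the same in both groups.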

Double-nesting – another possible form

One grouping variable can be separately nested in each of two other grouping variables (“double-nesting”). Take the Car sector of the AutoSales dataset and remove the X4 and OTHER models. The variable Model is then nested in both Segment and Manufacturer. The filtering is necessary because a few model names are otherwise recorded in more than one segment, and the variables then appear not to be nested.
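Values that prevent a variable from being nested can be located before filtering. The sketch below uses base R and a made-up data frame; the data, the model names, and the helper breaks_nesting are hypothetical and do not come from the AutoSales dataset or the package.

```r
# Toy data (hypothetical; not the AutoSales dataset or its values).
cars <- data.frame(
  Manufacturer = c("Alpha", "Alpha", "Beta", "Beta", "Beta"),
  Segment      = c("Small", "SUV", "Small", "SUV", "Small"),
  Model        = c("a1", "a2", "b1", "b2", "b2"),
  stringsAsFactors = FALSE
)

# Return the values of inner that occur with more than one value of outer.
breaks_nesting <- function(data, inner, outer) {
  pairs <- unique(data[, c(inner, outer)])   # distinct (inner, outer) combinations
  n_outer <- table(pairs[[inner]])           # number of outer values per inner value
  names(n_outer)[as.vector(n_outer) > 1]
}

breaks_nesting(cars, "Model", "Manufacturer")  # character(0): Model is nested
breaks_nesting(cars, "Model", "Segment")       # "b2": recorded in two segments

# After dropping the offending model, Model is nested in both variables.
ok <- cars[!cars$Model %in% breaks_nesting(cars, "Model", "Segment"), ]
length(breaks_nesting(ok, "Model", "Segment")) == 0
```

An empty result for both outer variables corresponds to the double-nesting described above.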

The package assumes that any double-nested variable is always last to be drawn and is nested inside the immediately preceding grouping variable.