Introduction to the hicp-package

Sebastian Weinand

22 July 2024



The Harmonised Index of Consumer Prices (HICP) is the key economic indicator for measuring inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual (European Commission 2024). Based on this manual, the hicp-package provides functions for data users to work with publicly available HICP price indices and weights (upper-level aggregation). This vignette highlights the main package features. It contains three sections: data access, the Classification of Individual Consumption by Purpose (COICOP) underlying the HICP, and index aggregation.

# load package:
library(hicp)

# load additional packages:
library(data.table)

# set global options:
options(hicp.coicop.version="ecoicop-hicp")   # the coicop version to be used
options(hicp.unbundle=TRUE)                   # treatment of coicop bundle codes like 08X
options(hicp.all.items.code="00")             # internal code for the all-items index

HICP data

The hicp-package offers easy access to HICP data from Eurostat’s public database. For that purpose, it uses the download functionality provided by Eurostat’s restatapi-package. This section shows how to list, filter and retrieve HICP data using the functions hicp.datasets(), hicp.datafilters(), and hicp.dataimport().

Step 1: Available datasets

Eurostat’s database contains various datasets of different statistics. All datasets are classified by topic and can be accessed via a navigation tree. HICP data can be found under “Economy and finance / Prices”. An even simpler solution that does not require visiting Eurostat’s database is provided by the function hicp.datasets(), which lists all available HICP datasets with corresponding metadata (e.g., number of observations, last update).

dtd <- hicp.datasets()
dtd[1:5, list(title, code, lastUpdate, values)]
#>                                                                 title
#>                                                                <char>
#> 1:                  HICP at constant tax rates - monthly data (index)
#> 2:  HICP at constant tax rates - monthly data (annual rate of change)
#> 3: HICP at constant tax rates - monthly data (monthly rate of change)
#> 4:                           HICP - administered prices (composition)
#> 5:                                             HICP - country weights
#>             code lastUpdate  values
#>           <char>     <char>   <num>
#> 1: prc_hicp_cind 2024.07.17 2111411
#> 2: prc_hicp_cann 2024.07.17 1934261
#> 3: prc_hicp_cmon 2024.07.17 4193574
#> 4:  prc_hicp_apc 2024.07.17  713367
#> 5:  prc_hicp_cow 2024.02.22    2921

The output above shows the first five HICP datasets. As can be seen, a short description of each dataset and some metadata are provided. The variable code is the dataset identifier, which is needed to filter and download data.

Step 2: Allowed data filters

The HICP is compiled each month in each member state of the European Union (EU) for various items. Since its compilation started in 1996, the dataset of price indices is relatively large. Data users, however, often need only the price indices of certain years or specific countries. Eurostat's API, and thus the restatapi-package, allows filters to be applied to each data request, e.g., to download only the all-items HICP of the euro area. Because the filtering options can differ between datasets, the function hicp.datafilters() returns the allowed filtering options for a given dataset.

# dataset 'prc_hicp_inw':
dtf <- hicp.datafilters(id="prc_hicp_inw")

# allowed filters:
unique(dtf$concept)
#> [1] "freq"   "coicop" "geo"

# allowed filter values:
dtf[1:5,]
#>    concept   code                             name
#>     <char> <char>                           <char>
#> 1:    freq      A                           Annual
#> 2:  coicop   CP00                   All-items HICP
#> 3:  coicop   CP01 Food and non-alcoholic beverages
#> 4:  coicop  CP011                             Food
#> 5:  coicop CP0111                Bread and cereals

The output above shows that the dataset prc_hicp_inw on item weights can be filtered according to freq, coicop, and geo. The table dtf contains for each filter the allowed values, e.g., CP011 for coicop and A for freq. These filters can be integrated in the data download as explained in the following subsection.

Step 3: Data download

Applying a filter to a data request can noticeably reduce the downloading time, particularly for bigger datasets. Function hicp.dataimport() can be used to download a specific dataset.

# download item weights for the euro area, Germany, and France from 2015 on:
item.weights <- hicp.dataimport(id="prc_hicp_inw",
                                filters=list("geo"=c("EA","DE","FR")),
                                date.range=c("2015", NA),
                                flags=TRUE)

# inspect data:
item.weights[1:5, ]
#> Key: <coicop, geo, time>
#>    coicop    geo   time values  flags
#>    <char> <char> <char>  <num> <char>
#> 1:     AP     DE   2015 141.49      b
#> 2:     AP     DE   2016 146.47   <NA>
#> 3:     AP     DE   2017 141.30   <NA>
#> 4:     AP     DE   2018 139.96   <NA>
#> 5:     AP     DE   2019 141.78   <NA>
nrow(item.weights) # number of observations
#> [1] 13412
unique(item.weights$geo) # only EA, DE, and FR
#> [1] "DE" "EA" "FR"
range(item.weights$time) # since 2015
#> [1] "2015" "2024"

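The object item.weights contains the item weights for the euro area (EA), Germany (DE), and France (FR) since 2015. Setting flags=TRUE additionally returns Eurostat's observation flags (e.g., b for a break in time series). If the whole dataset were needed, the request would simplify to hicp.dataimport(id="prc_hicp_inw").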

HICP and COICOP

HICP item weights and price indices are classified according to the European COICOP (ECOICOP-HICP). This COICOP version is used by default (options(hicp.coicop.version="ecoicop-hicp")), but other versions are available in the package as well. The all-items HICP comprises twelve divisions, which are further broken down by consumption purpose. The subclasses (5-digit codes) form the lowest level and the finest breakdown of items for which weights are available, e.g., rice (01111) or bread (01113). Both rice and bread belong to the same class, bread and cereals (0111), and, at higher levels, to the same group, food (011), and the same division, food and non-alcoholic beverages (01). Hence, ECOICOP, and thus also the HICP, follows a predefined hierarchical tree in which the item weights of the all-items HICP add up to 1000. This section shows how to work with COICOP codes, for example, to derive the lowest level of items that form the all-items HICP.

COICOP codes, bundles, and relatives

COICOP codes and bundles. The COICOP codes underlying the HICP (ECOICOP) consist of digits. The code 00 is used in this package for the all-items HICP, although it is not an official COICOP code (see options(hicp.all.items.code="00")). The codes of the twelve divisions below it are 01, 02, ..., 12. At the lowest level of subclasses, the codes consist of 5 digits. The function is.coicop() checks whether a code is a valid COICOP code. Depending on the setting unbundle, this includes bundle codes like 082_083 or 08X, which violate the standard COICOP code pattern but can be found in HICP data. Bundle codes can generally be detected using is.bundle() and split into their components using unbundle().

# example codes:
ids <- c("00","CP00","13","08X")

# check for bundle codes:
is.bundle(id=ids)
#> [1] FALSE FALSE FALSE  TRUE

# unbundle any bundle codes into their components:
unbundle(id=ids)
#>     00   CP00     13    08X    08X 
#>   "00" "CP00"   "13"  "082"  "083"

# check if valid ECOICOP code including bundle codes:
is.coicop(id=ids, settings=list(unbundle=TRUE))
#> [1] FALSE FALSE FALSE  TRUE

# check if valid ECOICOP code excluding bundle codes:
is.coicop(id=ids, settings=list(unbundle=FALSE))
#> [1] FALSE FALSE FALSE FALSE

# games of chance have a valid ECOICOP code:
is.coicop("0943", settings=list(coicop.version="ecoicop"))
#> [1] TRUE
# but not in the ECOICOP-HICP:
is.coicop("0943", settings=list(coicop.version="ecoicop-hicp"))
#> [1] FALSE
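As a rough illustration of the code pattern described above, the following base-R sketch checks only the shape of a code: a division from 01 to 12, followed by at most three further digits (group, class, subclass). The function name coicop_pattern_sketch() is hypothetical and not part of the hicp-package. Unlike is.coicop(), it does not check whether a code actually exists in a given COICOP version (0943 passes the pattern, although it is not part of ECOICOP-HICP), and it does not recognise bundle codes.

```r
# Hypothetical helper (not part of the hicp-package): pattern check only.
# A division code 01-12 followed by up to three digits (group, class, subclass).
coicop_pattern_sketch <- function(id) {
  grepl("^(0[1-9]|1[0-2])[0-9]{0,3}$", id)
}

coicop_pattern_sketch(c("00", "CP00", "13", "08X", "011", "0943"))
#> [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
```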

COICOP relatives. COICOP codes in data downloaded from Eurostat's database should generally be valid (apart from the prefix "CP"). More relevant is therefore the detection of child and parent codes in the data. Children are codes that belong to a higher-level code, their parent. Such relations can be direct (e.g., 01->011) or indirect (e.g., 01->0111). Note that each child has exactly one parent, while a parent may have multiple children. Because relatives are only searched among the supplied codes, a code may lack a direct parent in the data even though an indirect one exists. Both points can be seen in the example below.

# example codes:
ids <- c("00","01","011","01111","01112")

# no direct parent for 01111 and 01112:
parent(id=ids, flag=FALSE, direct=TRUE)
#> [1] NA   "00" "01" NA   NA

# indirect parent available:
parent(id=ids, flag=FALSE, direct=FALSE)
#> [1] NA    "00"  "01"  "011" "011"

# 011 has two (indirect) children:
child(id=ids, flag=FALSE, direct=FALSE)
#> [[1]]
#> [1] "01"    "011"   "01111" "01112"
#> 
#> [[2]]
#> [1] "011"   "01111" "01112"
#> 
#> [[3]]
#> [1] "01111" "01112"
#> 
#> [[4]]
#> character(0)
#> 
#> [[5]]
#> character(0)
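The direct-parent results above can be approximated by truncation, because each ECOICOP code extends the code of its parent by one digit, and divisions belong to the all-items code 00 in this package. The following sketch, with the hypothetical name direct_parent_sketch(), assumes plain codes without the CP prefix and without bundles. Like parent(id, direct=TRUE) in the example above, it only returns a parent if that parent is among the supplied codes. It is a simplification, not the package's implementation.

```r
# Hypothetical helper (not part of the hicp-package): direct parent by truncation.
direct_parent_sketch <- function(id, all.items = "00") {
  # divisions (2 digits) belong to the all-items code, otherwise drop the last digit
  cand <- ifelse(nchar(id) == 2, all.items, substr(id, 1, nchar(id) - 1))
  cand[id == all.items] <- NA
  # keep the candidate only if it is among the supplied codes
  ifelse(cand %in% id, cand, NA_character_)
}

direct_parent_sketch(c("00", "01", "011", "01111", "01112"))
#> [1] NA   "00" "01" NA   NA
```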

Deriving the COICOP tree for index aggregation

The functions child() and parent() may be useful for various purposes. To derive the composition of COICOP codes at the lowest possible level, however, the function tree() is better suited. For the HICP, this composition can be derived separately for each reporting period and country. Consequently, the selection of COICOP codes may differ across space and time. If needed, the argument by of tree() allows the compositions to be merged, e.g., to obtain the same selection of COICOP codes over time. Because the derivation searches the whole COICOP tree, the resulting composition of COICOP codes is also referred to as the COICOP tree in this package.

# subset and adjust item weights table:
item.weights <- item.weights[grepl("^CP", coicop),]
item.weights[, "coicop":=gsub(pattern="^CP", replacement="", x=coicop)]

# derive separate trees for each time period and country:
item.weights[, "t1" := tree(id=coicop, w=values, settings=list(w.tol=0.1)), by=c("geo","time")]
item.weights[t1==TRUE,
        list("n"=uniqueN(coicop),           # varying coicops over time and space
             "w"=sum(values, na.rm=TRUE)),  # weight sums should equal 1000
        by=c("geo","time")]
#>        geo   time     n       w
#>     <char> <char> <int>   <num>
#>  1:     EA   2015    93  999.97
#>  2:     EA   2016    93  999.97
#>  3:     DE   2015   295 1000.00
#>  4:     DE   2016   295 1000.00
#>  5:     DE   2017   295 1000.00
#>  6:     DE   2018   295 1000.00
#>  7:     DE   2019   295 1000.00
#>  8:     DE   2020   295 1000.00
#>  9:     DE   2021   295 1000.00
#> 10:     DE   2022   295 1000.00
#> 11:     DE   2023   295 1000.00
#> 12:     DE   2024   295 1000.00
#> 13:     EA   2017   295 1000.11
#> 14:     EA   2018   295 1000.05
#> 15:     EA   2019   295 1000.00
#> 16:     EA   2020   295  999.95
#> 17:     EA   2021   295  999.91
#> 18:     EA   2022   295 1000.08
#> 19:     EA   2023   295  999.98
#> 20:     EA   2024   295  999.98
#> 21:     FR   2015   284  999.94
#> 22:     FR   2016   295  999.91
#> 23:     FR   2017   295 1000.01
#> 24:     FR   2018   295 1000.00
#> 25:     FR   2019   295 1000.04
#> 26:     FR   2020   295  999.99
#> 27:     FR   2021   295  999.94
#> 28:     FR   2022   295 1000.01
#> 29:     FR   2023   295 1000.04
#> 30:     FR   2024   295  999.95
#>        geo   time     n       w

# derive merged trees over time, but not across countries:
item.weights[, "t2" := tree(id=coicop, by=time, w=values, settings=list(w.tol=0.1)), by="geo"]
item.weights[t2==TRUE,
        list("n"=uniqueN(coicop),           # same selection over time in a country
             "w"=sum(values, na.rm=TRUE)),  # weight sums should equal 1000
        by=c("geo","time")]
#>        geo   time     n       w
#>     <char> <char> <int>   <num>
#>  1:     EA   2015    93  999.97
#>  2:     EA   2016    93  999.97
#>  3:     EA   2017    93 1000.01
#>  4:     EA   2018    93 1000.00
#>  5:     EA   2019    93 1000.00
#>  6:     EA   2020    93  999.99
#>  7:     EA   2021    93  999.99
#>  8:     EA   2022    93 1000.02
#>  9:     EA   2023    93 1000.04
#> 10:     EA   2024    93 1000.02
#> 11:     DE   2015   295 1000.00
#> 12:     DE   2016   295 1000.00
#> 13:     DE   2017   295 1000.00
#> 14:     DE   2018   295 1000.00
#> 15:     DE   2019   295 1000.00
#> 16:     DE   2020   295 1000.00
#> 17:     DE   2021   295 1000.00
#> 18:     DE   2022   295 1000.00
#> 19:     DE   2023   295 1000.00
#> 20:     DE   2024   295 1000.00
#> 21:     FR   2015   284  999.94
#> 22:     FR   2016   284  999.92
#> 23:     FR   2017   284 1000.02
#> 24:     FR   2018   284  999.99
#> 25:     FR   2019   284 1000.03
#> 26:     FR   2020   284  999.98
#> 27:     FR   2021   284  999.94
#> 28:     FR   2022   284 1000.02
#> 29:     FR   2023   284 1000.04
#> 30:     FR   2024   284  999.95
#>        geo   time     n       w

# derive merged trees over countries and time:
item.weights[, "t3" := tree(id=coicop, by=paste(geo,time), w=values, settings=list(w.tol=0.1))]
item.weights[t3==TRUE,
        list("n"=uniqueN(coicop),           # same selection over time and across countries
             "w"=sum(values, na.rm=TRUE)),  # weight sums should equal 1000
        by=c("geo","time")]
#>        geo   time     n       w
#>     <char> <char> <int>   <num>
#>  1:     DE   2015    93 1000.00
#>  2:     DE   2016    93 1000.00
#>  3:     DE   2017    93 1000.00
#>  4:     DE   2018    93 1000.00
#>  5:     DE   2019    93 1000.00
#>  6:     DE   2020    93 1000.00
#>  7:     DE   2021    93 1000.00
#>  8:     DE   2022    93 1000.00
#>  9:     DE   2023    93 1000.00
#> 10:     DE   2024    93 1000.00
#> 11:     EA   2015    93  999.97
#> 12:     EA   2016    93  999.97
#> 13:     EA   2017    93 1000.01
#> 14:     EA   2018    93 1000.00
#> 15:     EA   2019    93 1000.00
#> 16:     EA   2020    93  999.99
#> 17:     EA   2021    93  999.99
#> 18:     EA   2022    93 1000.02
#> 19:     EA   2023    93 1000.04
#> 20:     EA   2024    93 1000.02
#> 21:     FR   2015    93  999.99
#> 22:     FR   2016    93  999.99
#> 23:     FR   2017    93 1000.04
#> 24:     FR   2018    93 1000.00
#> 25:     FR   2019    93 1000.00
#> 26:     FR   2020    93  999.99
#> 27:     FR   2021    93 1000.02
#> 28:     FR   2022    93 1000.03
#> 29:     FR   2023    93  999.98
#> 30:     FR   2024    93  999.96
#>        geo   time     n       w

All three COICOP trees in the example above can be used to aggregate the all-items HICP in a single aggregation step, since in each case the item weights of the selected codes add up to approximately 1000. The selection of COICOP codes varies over time and across countries for t1, varies only across countries for t2, and is the same over time and across countries for t3.
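The core idea behind such a lowest-level selection can be sketched in base R: starting from the all-items code, a code is replaced by its direct children whenever the children's weights add up to the code's own weight. The function tree_sketch() below is hypothetical and deliberately simplified. It assumes codes without the CP prefix and without bundles, a single country and year, and an absolute tolerance w.tol on the weight sums; the actual tree() offers more, e.g., the by argument for merging trees, and may apply its tolerance differently.

```r
# Hypothetical sketch (not hicp::tree()): lowest-level codes whose weights add
# up to the all-items weight, for one country and one year.
tree_sketch <- function(id, w, all.items = "00", w.tol = 0.1) {
  if (!all.items %in% id) return(rep(FALSE, length(id)))
  names(w) <- id
  # direct parent by truncation (divisions belong to the all-items code)
  parent <- ifelse(nchar(id) == 2, all.items, substr(id, 1, nchar(id) - 1))
  parent[id == all.items] <- NA
  descend <- function(code) {
    kids <- id[!is.na(parent) & parent == code]
    # replace a code by its children only if their weights sum to its weight
    if (length(kids) > 0 && isTRUE(abs(sum(w[kids]) - w[[code]]) <= w.tol)) {
      unlist(lapply(kids, descend))
    } else {
      code
    }
  }
  id %in% descend(all.items)
}

# made-up weights (not HICP data): 0111 and 0112 fully cover 011, and so on
toy.id <- c("00", "01", "02", "011", "012", "0111", "0112")
toy.w  <- c(1000, 600, 400, 500, 100, 300, 200)
toy.sel <- tree_sketch(toy.id, toy.w)
toy.id[toy.sel]
#> [1] "02"   "012"  "0111" "0112"
sum(toy.w[toy.sel])
#> [1] 1000
```

If a child is missing (e.g., without 0112, the weights of the children of 011 no longer add up to 500), the sketch keeps 011 itself, so the selected weights still sum to 1000.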

Index aggregation, rates of change, and contributions

The HICP is a chain-linked Laspeyres-type index (European Union 2016). The (unchained) price indices of each calendar year refer to December of the previous year, which serves as the price reference period. These price indices are chain-linked to the existing index via December to obtain the HICP. The HICP indices currently refer to the index reference period 2015=100. Monthly and annual rates of change can be derived from the price indices. The contributions of the price changes of individual items to the annual rate of change can be computed as so-called Ribe contributions. More details can be found in European Commission (2024, chap. 8).
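The chain-linked structure and the Ribe contributions can be summarised as follows. This is a sketch derived from the description above, with notation chosen here: $P^{m,y}$ denotes the all-items index in month $m$ of year $y$, $P_i^{m,y}$ the subindex of item $i$, and $w_i^{y}$ the item weights of year $y$, assumed to be normalised to sum to one. Rounding and implementation details of the package's functions are ignored.

```latex
\begin{align}
\frac{P^{m,y}}{P^{12,y-1}} &= \sum_i w_i^{y}\,\frac{P_i^{m,y}}{P_i^{12,y-1}} \\
r^{m,y} &= 100\left(\frac{P^{m,y}}{P^{m,y-1}} - 1\right) = \sum_i C_i^{m,y} \\
C_i^{m,y} &= \frac{100}{P^{m,y-1}}\left[\, w_i^{y}\,\frac{P^{12,y-1}}{P_i^{12,y-1}}\left(P_i^{m,y} - P_i^{12,y-1}\right)
 + w_i^{y-1}\,\frac{P^{12,y-2}}{P_i^{12,y-2}}\left(P_i^{12,y-1} - P_i^{m,y-1}\right) \right]
\end{align}
```

The decomposition follows from splitting $P^{m,y} - P^{m,y-1}$ at December of year $y-1$ and applying the first equation to each of the two years. The first term in $C_i^{m,y}$ captures the price change of item $i$ since December of the previous year, the second its price change from month $m$ to December of year $y-1$; summed over all items, the contributions reproduce the annual rate $r^{m,y}$.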

Index aggregation

The all-items index is a weighted average of the items' subindices. However, because the HICP is a chain index, the subindices cannot simply be aggregated. They first need to be unchained, i.e., expressed relative to December of the previous year. These unchained indices can then be aggregated as a weighted average. Since the Laspeyres-type index is consistent in aggregation, the aggregation can be done either stepwise from the bottom level to the top or directly in one step.
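Consistency in aggregation can be checked with a small base-R example using made-up unchained indices and weights (not HICP data). The helper laspeyres_sketch() is hypothetical and simply computes the weighted arithmetic mean of unchained indices. In the stepwise variant, each aggregate enters the next level with the sum of its components' weights.

```r
# Hypothetical helper: weighted arithmetic mean of unchained indices
laspeyres_sketch <- function(x, w) sum(w * x) / sum(w)

# made-up unchained indices (December of previous year = 100) and weights
rel <- c("0111" = 102, "0112" = 105, "012" = 99, "02" = 101)
wgt <- c("0111" = 300, "0112" = 200, "012" = 100, "02" = 400)

# direct aggregation in one step
direct <- laspeyres_sketch(rel, wgt)

# stepwise aggregation: 0111 + 0112 -> 011, 011 + 012 -> 01, 01 + 02 -> 00
i011 <- laspeyres_sketch(rel[c("0111", "0112")], wgt[c("0111", "0112")])
w011 <- sum(wgt[c("0111", "0112")])
i01 <- laspeyres_sketch(c(i011, rel[["012"]]), c(w011, wgt[["012"]]))
w01 <- w011 + wgt[["012"]]
stepwise <- laspeyres_sketch(c(i01, rel[["02"]]), c(w01, wgt[["02"]]))

all.equal(direct, stepwise)
#> [1] TRUE
```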

In the following example, the euro area HICP is computed directly in one step and also stepwise through all higher-level indices. First, the monthly price indices are downloaded from Eurostat's database and unchained using the function unchain(). Second, the euro area item weights downloaded above are selected, their ECOICOP tree is derived, and they are merged with the price indices. Based on this tree, the unchained price indices are aggregated in one step using the Laspeyres-type index, chain()ed, and finally rebase()d to the index reference period 2015. A comparison with the published all-items index shows only small differences, which are due to rounding: the index values published in Eurostat's database are rounded and not available with all decimals.

# import monthly price indices:
prc <- hicp.dataimport(id="prc_hicp_midx",
                       filters=list(unit="I15", geo="EA"),
                       date.range=c("2014-12", NA))
prc[, "time":=as.Date(paste0(time, "-01"))]
prc[, "year":=as.integer(format(time, "%Y"))]
prc[, "coicop" := gsub(pattern="^CP", replacement="", x=coicop)]
setnames(x=prc, old="values", new="index")

# unchain price indices:
prc[, "dec_ratio" := unchain(x=index, t=time), by="coicop"]

# import item weights:
inw <- item.weights[geo=="EA", list(coicop,geo,time,values)]
inw[, "time":=as.integer(time)]
setnames(x=inw, old=c("time","values"), new=c("year","weight"))

# derive coicop tree:
inw[ , "tree":=tree(id=coicop, w=weight, settings=list(w.tol=0.1)), by=c("geo","year")]

# merge price indices and item weights:
hicp.data <- merge(x=prc, y=inw, by=c("geo","coicop","year"), all.x=TRUE)
hicp.data <- hicp.data[year <= year(Sys.Date())-1,]

# compute all-items HICP in one aggregation step:
hicp.own <- hicp.data[tree==TRUE, 
                      list("laspey"=laspeyres(x=dec_ratio, w0=weight)), 
                      by="time"]
setorderv(x=hicp.own, cols="time")
hicp.own[, "chain_laspey" := chain(x=laspey, t=time, by=12)]
hicp.own[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015")]

# add published all-items HICP for comparison:
hicp.own <- merge(x=hicp.own,
                  y=hicp.data[coicop=="00", list(time, index)],
                  by="time",
                  all.x=TRUE)
plot(index-chain_laspey_15~time, 
     data=hicp.own, type="l", 
     xlab="Time", ylab="Difference (in index points)")
title("Difference between published index and own calculations")
abline(h=0, lty="dashed")
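What unchain(), chain(), and rebase() do in the example above can be illustrated with simplified base-R counterparts. The functions unchain_sketch(), chain_sketch(), and rebase_sketch() are hypothetical. They assume a complete monthly series ordered by time, with calendar year and month supplied separately, and they ignore any additional options of the package's functions. Unchaining and re-chaining a made-up index series and rebasing both series to 2015 should return the same values.

```r
# Hypothetical sketches (not the hicp functions unchain(), chain(), rebase()).
# x: monthly index series ordered by time; year, month: calendar year and month.

# express each index relative to December of the previous year (= 100)
unchain_sketch <- function(x, year, month) {
  dec <- x[month == 12]
  names(dec) <- year[month == 12]
  unname(x / dec[as.character(year - 1)] * 100)  # NA if that December is missing
}

# link the unchained indices of each year to the chained December level of
# the previous year; the first link level is set to 100 (an assumption)
chain_sketch <- function(x, month) {
  out <- rep(NA_real_, length(x))
  level <- 100
  for (i in seq_along(x)) {
    out[i] <- x[i] / 100 * level
    if (month[i] == 12 && !is.na(out[i])) level <- out[i]
  }
  out
}

# rescale the series so that the average of the reference year equals 100
rebase_sketch <- function(x, year, ref) {
  x / mean(x[year == ref]) * 100
}

# made-up chained index from December 2014 to December 2016 (not HICP data)
toy.year  <- c(2014, rep(2015, 12), rep(2016, 12))
toy.month <- c(12, 1:12, 1:12)
toy.p     <- c(99, 100:111, 112:123)

rechained <- chain_sketch(unchain_sketch(toy.p, toy.year, toy.month), toy.month)
all.equal(rebase_sketch(rechained, toy.year, 2015)[-1],
          rebase_sketch(toy.p, toy.year, 2015)[-1])
#> [1] TRUE
```

As in the vignette's data starting in December 2014, the first unchained value is missing because no earlier December is available; the rebasing to 2015 removes the arbitrary first link level of 100.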

Similarly, the (unchained) price indices are also aggregate()d stepwise, which produces in addition to the all-items index all higher-level subindices. A comparison to the all-items index that has been computed in one step shows no differences. This highlights the consistency in aggregation of the indices. User-defined functions can be passed to aggregate() as well, which allows aggregation using any weighted or unweighted index formula.

# compute all-items HICP stepwise through all higher-levels:
hicp.own.all <- hicp.data[is.coicop(coicop), 
                          aggregate(x=dec_ratio, w0=weight, grp=coicop, index=laspeyres),
                          by="time"]
setorderv(x=hicp.own.all, cols="time")
hicp.own.all[, "chain_laspey" := chain(x=laspeyres, t=time, by=12), by="grp"]
hicp.own.all[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015"), by="grp"]

# compare all-items HICP from direct and step-wise aggregation:
agg.comp <- merge(x=hicp.own.all[grp=="00", list(time, "index_stpwse"=chain_laspey_15)],
                  y=hicp.own[, list(time, "index_direct"=chain_laspey_15)],
                  by="time")

# no differences -> consistent in aggregation:
nrow(agg.comp[abs(index_stpwse-index_direct)>1e-4,])
#> [1] 0

Rates of change and contributions

The indices show the price change between a comparison period and the index reference period. Data users, however, are often more interested in monthly and annual rates of change. Monthly rates of change compare the index of the current month with the index of the previous month, while annual rates of change compare it with the index of the same month one year before. Both can be derived using the function rates(). The contributions of the price changes of individual items to the annual rate of change can be computed as Ribe or Kirchner contributions, as implemented in the function contrib().

# compute annual rates of change for all indices:
hicp.data[, "ar" := rates(x=index, t=time, type="annual"), by=c("geo","coicop")]

# add the all-items index and weight as separate columns (suffix _all):
hicp.data <- merge(x=hicp.data,
                   y=hicp.data[coicop=="00", list(geo,time,index,weight)],
                   by=c("geo","time"), all.x=TRUE, suffixes=c("","_all"))

# Ribe contributions of each item to the annual rate of the all-items HICP:
hicp.data[, "ribe" := contrib(x=index, w=weight, t=time, x.all=index_all, w.all=weight_all), by="coicop"]

# annual change rates over time:
plot(ar~time, data=hicp.data[coicop=="00",],
     type="l", xlab="Time", ylab="", ylim=c(-2,12))
lines(ribe~time, data=hicp.data[coicop=="01"], col="red")
title("Contributions of food to overall inflation")
legend("topleft", col=c("black","red"), lty=1, bty="n", 
       legend=c("Overall inflation (in %)", "Contributions of food (in pp)"))
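The definitions of monthly and annual rates of change given above can be sketched in base R. The function rates_sketch() is hypothetical and relies on the position in a complete monthly series ordered by time, whereas rates() in the example above is called with the time periods t.

```r
# Hypothetical sketch (not hicp::rates()): rates of change in percent for a
# complete monthly index series ordered by time.
rates_sketch <- function(x, type = c("monthly", "annual")) {
  type <- match.arg(type)
  lag <- if (type == "monthly") 1L else 12L
  out <- rep(NA_real_, length(x))
  if (length(x) > lag) {
    now <- (lag + 1):length(x)
    out[now] <- (x[now] / x[now - lag] - 1) * 100
  }
  out
}

# made-up index: 100 in each month of the first year, 103 in the second year
idx <- c(rep(100, 12), rep(103, 12))
rates_sketch(idx, type = "annual")[24]
#> [1] 3
rates_sketch(idx, type = "monthly")[14]
#> [1] 0
```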

References

European Commission, Eurostat. 2024. Harmonised Index of Consumer Prices (HICP) - Methodological Manual - 2024 edition. Luxembourg: Publications Office of the European Union. https://doi.org/10.2785/055028.
European Union. 2016. “Regulation (EU) 2016/792 of 11 May 2016 on harmonised indices of consumer prices and the house price index.” Official Journal of the European Union 135: 12–38.