hicp-package
The Harmonised Index of Consumer Prices (HICP) is the key
economic figure to measure inflation in the euro area. The methodology
underlying the HICP is documented in the HICP Methodological Manual
(European Commission 2024). Based on this manual, the hicp-package
provides functions for data users
to work with publicly available HICP price indices and weights
(upper-level aggregation). The following vignette highlights
the main package features. It contains three sections on data access,
the Classification of Consumption by Individual Purpose (COICOP)
underlying the HICP, and index aggregation.
# load package:
library(hicp)
# load additional packages:
library(data.table)
# set global options:
options(hicp.coicop.version="ecoicop-hicp") # the coicop version to be used
options(hicp.unbundle=TRUE) # treatment of coicop bundle codes like 08X
options(hicp.all.items.code="00") # internal code for the all-items index
The hicp-package offers easy access to HICP data from Eurostat’s
public database. For that purpose, it uses the download functionality
provided by Eurostat’s restatapi-package. This section shows how to
list, filter and retrieve HICP data using the functions
hicp.datasets(), hicp.datafilters(), and hicp.dataimport().
Eurostat’s database contains various datasets of different
statistics. All datasets are classified by topic and can be accessed via
a navigation tree. HICP data can be found under “Economy and finance /
Prices”. An even simpler solution that does not require visiting
Eurostat’s database is provided by the function hicp.datasets(), which
lists all available HICP datasets with corresponding metadata (e.g.,
number of observations, last update).
dtd <- hicp.datasets()
dtd[1:5, list(title, code, lastUpdate, values)]
#> title
#> <char>
#> 1: HICP at constant tax rates - monthly data (index)
#> 2: HICP at constant tax rates - monthly data (annual rate of change)
#> 3: HICP at constant tax rates - monthly data (monthly rate of change)
#> 4: HICP - administered prices (composition)
#> 5: HICP - country weights
#> code lastUpdate values
#> <char> <char> <num>
#> 1: prc_hicp_cind 2024.07.17 2111411
#> 2: prc_hicp_cann 2024.07.17 1934261
#> 3: prc_hicp_cmon 2024.07.17 4193574
#> 4: prc_hicp_apc 2024.07.17 713367
#> 5: prc_hicp_cow 2024.02.22 2921
The output above shows the first five HICP datasets. As can be seen,
a short description of each dataset and some metadata are provided. The
variable code is the dataset identifier, which is needed to filter and
download data.
The HICP is compiled each month in each member state of the European
Union (EU) for various items. Its compilation started in 1996.
Therefore, the dataset of price indices is relatively large. Sometimes,
however, data users only need the price indices of certain years or
specific countries. Eurostat’s API and, thus, the restatapi-package
allow filters to be applied to each data request, e.g., to download
only the price indices of the euro area for the all-items HICP. The
filtering options can differ for each dataset. Therefore, the function
hicp.datafilters() returns the allowed filtering options for a given
dataset.
# dataset 'prc_hicp_inw':
dtf <- hicp.datafilters(id="prc_hicp_inw")
# allowed filters:
unique(dtf$concept)
#> [1] "freq" "coicop" "geo"
# allowed filter values:
dtf[1:5,]
#> concept code name
#> <char> <char> <char>
#> 1: freq A Annual
#> 2: coicop CP00 All-items HICP
#> 3: coicop CP01 Food and non-alcoholic beverages
#> 4: coicop CP011 Food
#> 5: coicop CP0111 Bread and cereals
The output above shows that the dataset prc_hicp_inw on item weights
can be filtered according to freq, coicop, and geo. The table dtf
contains for each filter the allowed values, e.g., CP011 for coicop and
A for freq. These filters can be integrated in the data download as
explained in the following subsection.
Applying a filter to a data request can noticeably reduce the
downloading time, particularly for bigger datasets. The function
hicp.dataimport() can be used to download a specific dataset.
# download item weights for the euro area (EA), Germany (DE), and France (FR) from 2015 on:
item.weights <- hicp.dataimport(id="prc_hicp_inw", filters=list("geo"=c("EA","DE","FR")), date.range=c("2015", NA), flags=TRUE)
# inspect data:
item.weights[1:5, ]
#> Key: <coicop, geo, time>
#> coicop geo time values flags
#> <char> <char> <char> <num> <char>
#> 1: AP DE 2015 141.49 b
#> 2: AP DE 2016 146.47 <NA>
#> 3: AP DE 2017 141.30 <NA>
#> 4: AP DE 2018 139.96 <NA>
#> 5: AP DE 2019 141.78 <NA>
nrow(item.weights) # number of observations
#> [1] 13412
unique(item.weights$geo) # only EA, DE, and FR
#> [1] "DE" "EA" "FR"
range(item.weights$time) # since 2015
#> [1] "2015" "2024"
The object item.weights contains the item weights for the euro area,
Germany, and France since 2015. To download the whole dataset instead,
the request simplifies to hicp.dataimport(id="prc_hicp_inw").
HICP item weights and price indices are classified according to the
European COICOP (ECOICOP-HICP). This COICOP version is used by default
(options(hicp.coicop.version="ecoicop-hicp")) but others are available
in the package as well. The all-items HICP includes twelve item
divisions, which are further broken down by consumption purpose. The
lowest level of subclasses (5-digit codes) offers the finest
differentiation of items for which weights are available, e.g.,
rice (01111) or bread (01113). Both rice and bread
belong to the same class, bread and cereals (0111), and, at
higher levels, to the same group food (011) and division
food and non-alcoholic beverages (01). Hence, ECOICOP and thus
also the HICP follow a pre-defined hierarchical tree, where the item
weights of the all-items HICP add up to 1000. This section shows how to
work with the COICOP codes to derive, for example, the lowest level of
items that form the all-items HICP.
COICOP codes and bundles. The COICOP codes underlying the HICP
(ECOICOP) consist of numbers. The code 00 is used in this package for
the all-items HICP although it is no official COICOP code (see
options(hicp.all.items.code="00")). The codes of the twelve divisions
below start with 01, 02,..., 12. At the lowest level of subclasses, the
codes consist of 5 digits. Using the function is.coicop(), it can be
easily checked if a code is a valid COICOP code or not. This includes
bundle codes like 082_083, which violate the standard COICOP code
pattern, but can be found in HICP data. Bundle codes can be generally
detected using is.bundle() and be unbundled using the function
unbundle().
# example codes:
ids <- c("00","CP00","13","08X")
# check for bundle codes:
is.bundle(id=ids)
#> [1] FALSE FALSE FALSE TRUE
# unbundle any bundle codes into their components:
unbundle(id=ids)
#> 00 CP00 13 08X 08X
#> "00" "CP00" "13" "082" "083"
# check if valid ECOICOP code including bundle codes:
is.coicop(id=ids, settings=list(unbundle=TRUE))
#> [1] FALSE FALSE FALSE TRUE
# check if valid ECOICOP code excluding bundle codes:
is.coicop(id=ids, settings=list(unbundle=FALSE))
#> [1] FALSE FALSE FALSE FALSE
# games of chance have a valid ECOICOP code:
is.coicop("0943", settings=list(coicop.version="ecoicop"))
#> [1] TRUE
# but not in the ECOICOP-HICP:
is.coicop("0943", settings=list(coicop.version="ecoicop-hicp"))
#> [1] FALSE
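The idea behind bundle codes can also be illustrated without the package. The following base R sketch splits a bundle like 082_083 at the underscore and checks each component against a simple digit pattern. It is only an illustration under stated assumptions: the pattern (two to five digits) and the objects has.underscore, parts, and looks.like.code are made up here and do not reproduce the checks performed by is.coicop(), is.bundle(), or unbundle().

```r
# example codes (082_083 is a bundle, CP00 carries the Eurostat prefix):
codes <- c("01", "0111", "01113", "082_083", "CP00")
# assumption: a bundle joins its components with an underscore:
has.underscore <- grepl(pattern="_", x=codes, fixed=TRUE)
# split bundles into their components:
parts <- strsplit(x=codes, split="_", fixed=TRUE)
# simplified pattern check (2 to 5 digits), not the package's validation:
looks.like.code <- sapply(parts, function(z) all(grepl(pattern="^[0-9]{2,5}$", x=z)))
```

Note that such a pattern check accepts any digit string of suitable length, while is.coicop() compares codes against the selected COICOP version.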
COICOP relatives. COICOP codes available in the data
downloaded from Eurostat’s database should be generally valid (except
for the prefix “CP”). More relevant is thus the detection of children
and parent codes in the data. Children are those codes that belong to
the same higher-level code (or parent). Such relations can be direct
(e.g., 01->011) or indirect (e.g., 01->0111). It is important to note
that each child has exactly one parent, while a parent may have
multiple children. This can be seen in the example below.
# example codes:
ids <- c("00","01","011","01111","01112")
# no direct parent for 01111 and 01112:
parent(id=ids, flag=FALSE, direct=TRUE)
#> [1] NA "00" "01" NA NA
# indirect parent available:
parent(id=ids, flag=FALSE, direct=FALSE)
#> [1] NA "00" "01" "011" "011"
# 011 has two (indirect) children:
child(id=ids, flag=FALSE, direct=FALSE)
#> [[1]]
#> [1] "01" "011" "01111" "01112"
#>
#> [[2]]
#> [1] "011" "01111" "01112"
#>
#> [[3]]
#> [1] "01111" "01112"
#>
#> [[4]]
#> character(0)
#>
#> [[5]]
#> character(0)
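The logic of a direct parent can be sketched in base R by dropping the last digit of a code and checking whether the result is present in the data. This is a minimal sketch, assuming that two-digit divisions have the all-items code 00 as their parent. The helper direct.parent() is hypothetical and is not the implementation of parent().

```r
# example codes as above:
ids <- c("00", "01", "011", "01111", "01112")
# hypothetical helper: candidate parent obtained by dropping the last digit,
# with two-digit divisions assigned to the all-items code:
direct.parent <- function(id, all.items="00"){
  p <- ifelse(nchar(id)==2, all.items, substr(id, 1, nchar(id)-1))
  p[id==all.items] <- NA_character_ # the all-items code has no parent
  return(p)
}
cand <- direct.parent(ids)
# keep only candidate parents that are actually present in the data:
cand[!cand%in%ids] <- NA_character_
cand
```

For these codes, the result shows the same pattern as the call of parent() with direct=TRUE above: the class 0111 is missing in ids, so 01111 and 01112 have no direct parent.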
The functions child() and parent() may be useful for various reasons.
To derive the composition of COICOP codes at the lowest possible level,
however, the function tree() is better suited. For the HICP, the
derivation of this composition can be done separately for each
reporting month and country. Consequently, the selection of COICOP
codes may differ across space and time. If needed, however, specifying
the argument by in the function tree() allows merging the composition
of COICOP codes at the lowest possible level, e.g., to obtain a unique
selection of the same COICOP codes over time. Because the derivation of
COICOP codes searches in the whole COICOP tree, the resulting
composition of COICOP codes is also denoted as the COICOP tree in this
package.
# subset and adjust item weights table:
item.weights <- item.weights[grepl("^CP", coicop),]
item.weights[, "coicop":=gsub(pattern="^CP", replacement="", x=coicop)]
# derive separate trees for each time period and country:
item.weights[, "t1" := tree(id=coicop, w=values, settings=list(w.tol=0.1)), by=c("geo","time")]
item.weights[t1==TRUE,
list("n"=uniqueN(coicop), # varying coicops over time and space
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: EA 2015 93 999.97
#> 2: EA 2016 93 999.97
#> 3: DE 2015 295 1000.00
#> 4: DE 2016 295 1000.00
#> 5: DE 2017 295 1000.00
#> 6: DE 2018 295 1000.00
#> 7: DE 2019 295 1000.00
#> 8: DE 2020 295 1000.00
#> 9: DE 2021 295 1000.00
#> 10: DE 2022 295 1000.00
#> 11: DE 2023 295 1000.00
#> 12: DE 2024 295 1000.00
#> 13: EA 2017 295 1000.11
#> 14: EA 2018 295 1000.05
#> 15: EA 2019 295 1000.00
#> 16: EA 2020 295 999.95
#> 17: EA 2021 295 999.91
#> 18: EA 2022 295 1000.08
#> 19: EA 2023 295 999.98
#> 20: EA 2024 295 999.98
#> 21: FR 2015 284 999.94
#> 22: FR 2016 295 999.91
#> 23: FR 2017 295 1000.01
#> 24: FR 2018 295 1000.00
#> 25: FR 2019 295 1000.04
#> 26: FR 2020 295 999.99
#> 27: FR 2021 295 999.94
#> 28: FR 2022 295 1000.01
#> 29: FR 2023 295 1000.04
#> 30: FR 2024 295 999.95
#> geo time n w
# derive merged trees over time, but not across countries:
item.weights[, "t2" := tree(id=coicop, by=time, w=values, settings=list(w.tol=0.1)), by="geo"]
item.weights[t2==TRUE,
list("n"=uniqueN(coicop), # same selection over time in a country
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: EA 2015 93 999.97
#> 2: EA 2016 93 999.97
#> 3: EA 2017 93 1000.01
#> 4: EA 2018 93 1000.00
#> 5: EA 2019 93 1000.00
#> 6: EA 2020 93 999.99
#> 7: EA 2021 93 999.99
#> 8: EA 2022 93 1000.02
#> 9: EA 2023 93 1000.04
#> 10: EA 2024 93 1000.02
#> 11: DE 2015 295 1000.00
#> 12: DE 2016 295 1000.00
#> 13: DE 2017 295 1000.00
#> 14: DE 2018 295 1000.00
#> 15: DE 2019 295 1000.00
#> 16: DE 2020 295 1000.00
#> 17: DE 2021 295 1000.00
#> 18: DE 2022 295 1000.00
#> 19: DE 2023 295 1000.00
#> 20: DE 2024 295 1000.00
#> 21: FR 2015 284 999.94
#> 22: FR 2016 284 999.92
#> 23: FR 2017 284 1000.02
#> 24: FR 2018 284 999.99
#> 25: FR 2019 284 1000.03
#> 26: FR 2020 284 999.98
#> 27: FR 2021 284 999.94
#> 28: FR 2022 284 1000.02
#> 29: FR 2023 284 1000.04
#> 30: FR 2024 284 999.95
#> geo time n w
# derive merged trees over countries and time:
item.weights[, "t3" := tree(id=coicop, by=paste(geo,time), w=values, settings=list(w.tol=0.1))]
item.weights[t3==TRUE,
list("n"=uniqueN(coicop), # same selection over time and across countries
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
#> geo time n w
#> <char> <char> <int> <num>
#> 1: DE 2015 93 1000.00
#> 2: DE 2016 93 1000.00
#> 3: DE 2017 93 1000.00
#> 4: DE 2018 93 1000.00
#> 5: DE 2019 93 1000.00
#> 6: DE 2020 93 1000.00
#> 7: DE 2021 93 1000.00
#> 8: DE 2022 93 1000.00
#> 9: DE 2023 93 1000.00
#> 10: DE 2024 93 1000.00
#> 11: EA 2015 93 999.97
#> 12: EA 2016 93 999.97
#> 13: EA 2017 93 1000.01
#> 14: EA 2018 93 1000.00
#> 15: EA 2019 93 1000.00
#> 16: EA 2020 93 999.99
#> 17: EA 2021 93 999.99
#> 18: EA 2022 93 1000.02
#> 19: EA 2023 93 1000.04
#> 20: EA 2024 93 1000.02
#> 21: FR 2015 93 999.99
#> 22: FR 2016 93 999.99
#> 23: FR 2017 93 1000.04
#> 24: FR 2018 93 1000.00
#> 25: FR 2019 93 1000.00
#> 26: FR 2020 93 999.99
#> 27: FR 2021 93 1000.02
#> 28: FR 2022 93 1000.03
#> 29: FR 2023 93 999.98
#> 30: FR 2024 93 999.96
#> geo time n w
All three COICOP trees in the example above can be used to aggregate
the all-items HICP in a single aggregation step because the item
weights add up to 1000 in each case (within the tolerance w.tol). While
the selection of COICOP codes varies over time and across countries for
t1, it is the same over time and across countries for t3.
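The selection of the lowest available level can be illustrated with a small base R sketch: a code is kept if no other code in the data descends from it. The codes and weights below are hypothetical, and the sketch skips the checks that tree() applies (e.g., the comparison of weight sums using w.tol), so it should be read as an illustration of the idea only.

```r
# hypothetical codes and item weights:
ids <- c("00", "01", "011", "012", "02")
w <- c(1000, 600, 350, 250, 400)
# a code has children if another, longer code starts with it;
# the all-items code is treated as the parent of everything else:
has.child <- vapply(ids, function(p){
  if(p=="00") return(any(ids!="00"))
  any(startsWith(ids, p) & nchar(ids)>nchar(p))
}, FUN.VALUE=logical(1))
# lowest-level codes and their weight sum:
lowest <- ids[!has.child]
sum(w[!has.child])
```

In this toy example, the codes 011, 012, and 02 are selected and their weights add up to 1000, so they could be aggregated to the all-items index in one step.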
The HICP is a chain-linked Laspeyres-type index (European Union 2016). The (unchained) price indices in each calendar year refer to December of the previous year, which is the price reference period. These price indices are chain-linked to the existing index using December to obtain the HICP. The HICP indices currently refer to the index reference period 2015=100. Monthly and annual change rates can be derived from the price indices. The contributions of the price changes of individual items to the annual rate of change can be computed by the “Ribe contributions”. More details can be found in European Commission (2024, chap. 8).
The all-items index is a weighted average of the items’ subindices. However, because the HICP is a chain index, the subindices cannot simply be aggregated. They first need to be unchained, i.e., expressed relative to December of the previous year. These unchained indices can then be aggregated as a weighted average. Since the Laspeyres-type index is consistent in aggregation, the aggregation can be done stepwise from the bottom level to the top or directly in one step.
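The three steps (unchaining, weighted aggregation, chaining) can be traced with a small base R sketch. All numbers are hypothetical: two items A and B, three months, and an aggregate index assumed to stand at 100 in December 2014. The sketch does not use the package functions and only mirrors the arithmetic described above.

```r
# hypothetical chained subindices (rows: months, columns: items):
index <- rbind("2014-12"=c(A=100, B=200),
               "2015-01"=c(A=102, B=198),
               "2015-02"=c(A=104, B=202))
# hypothetical item weights for 2015:
weight <- c(A=600, B=400)
# unchain: express each subindex relative to December of the previous year:
ratio <- sweep(x=index, MARGIN=2, STATS=index["2014-12",], FUN="/")
# aggregate: weighted average of the unchained subindices:
all.items <- drop(ratio%*%(weight/sum(weight)))
# chain: link to the aggregate index level in December 2014 (assumed to be 100):
chained <- 100*all.items
```

With these numbers, the aggregated unchained index is 1.008 in January 2015 and 1.028 in February 2015.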
In the following example, the euro area HICP is computed directly in
one step and also stepwise through all higher-level indices. For that
purpose, the monthly price indices and item weights are first
downloaded from Eurostat’s database and merged. Second, the price
indices are unchained using the function unchain(). Based on the
derived ECOICOP tree, the unchained price indices are then aggregated
in one step using the Laspeyres-type index, chained using chain(), and
finally rebased to the index reference period 2015 using rebase(). A
comparison to the published all-items index values shows only small
differences due to rounding (since the published index numbers in
Eurostat’s database are rounded and not available with all decimals).
# import monthly price indices:
prc <- hicp.dataimport(id="prc_hicp_midx",
filters=list(unit="I15", geo="EA"),
date.range=c("2014-12", NA))
prc[, "time":=as.Date(paste0(time, "-01"))]
prc[, "year":=as.integer(format(time, "%Y"))]
prc[, "coicop" := gsub(pattern="^CP", replacement="", x=coicop)]
setnames(x=prc, old="values", new="index")
# unchain price indices:
prc[, "dec_ratio" := unchain(x=index, t=time), by="coicop"]
# import item weights:
inw <- item.weights[geo=="EA", list(coicop,geo,time,values)]
inw[, "time":=as.integer(time)]
setnames(x=inw, old=c("time","values"), new=c("year","weight"))
# derive coicop tree:
inw[ , "tree":=tree(id=coicop, w=weight, settings=list(w.tol=0.1)), by=c("geo","year")]
# merge price indices and item weights:
hicp.data <- merge(x=prc, y=inw, by=c("geo","coicop","year"), all.x=TRUE)
hicp.data <- hicp.data[year <= year(Sys.Date())-1,]
# compute all-items HICP in one aggregation step:
hicp.own <- hicp.data[tree==TRUE,
list("laspey"=laspeyres(x=dec_ratio, w0=weight)),
by="time"]
setorderv(x=hicp.own, cols="time")
hicp.own[, "chain_laspey" := chain(x=laspey, t=time, by=12)]
hicp.own[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015")]
# add published all-items HICP for comparison:
hicp.own <- merge(x=hicp.own,
y=hicp.data[coicop=="00", list(time, index)],
by="time",
all.x=TRUE)
plot(index-chain_laspey_15~time,
data=hicp.own, type="l",
xlab="Time", ylab="Difference (in index points)")
title("Difference between published index and own calculations")
abline(h=0, lty="dashed")
Similarly, the (unchained) price indices are also aggregated stepwise
using aggregate(), which produces in addition to the all-items index
all higher-level subindices. A comparison to the all-items index that
has been computed in one step shows no differences. This highlights the
consistency in aggregation of the indices. User-defined functions can
be passed to aggregate() as well, which allows aggregation using any
weighted or unweighted index formula.
# compute all-items HICP stepwise through all higher-levels:
hicp.own.all <- hicp.data[is.coicop(coicop),
aggregate(x=dec_ratio, w0=weight, grp=coicop, index=laspeyres),
by="time"]
setorderv(x=hicp.own.all, cols="time")
hicp.own.all[, "chain_laspey" := chain(x=laspeyres, t=time, by=12), by="grp"]
hicp.own.all[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015"), by="grp"]
# compare all-items HICP from direct and step-wise aggregation:
agg.comp <- merge(x=hicp.own.all[grp=="00", list(time, "index_stpwse"=chain_laspey_15)],
y=hicp.own[, list(time, "index_direct"=chain_laspey_15)],
by="time")
# no differences -> consistent in aggregation:
nrow(agg.comp[abs(index_stpwse-index_direct)>1e-4,])
#> [1] 0
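Why both routes give the same result can be seen from a toy example in base R. The unchained indices and weights below are hypothetical. The sketch aggregates three lowest-level items directly and, alternatively, first to the division 01 and then to the all-items level, using weighted.mean() from the stats package instead of the package's laspeyres() and aggregate().

```r
# hypothetical unchained indices and weights of three lowest-level items:
ratio <- c("011"=1.02, "012"=0.99, "02"=1.01)
weight <- c("011"=350, "012"=250, "02"=400)
# direct aggregation in one step:
direct <- weighted.mean(x=ratio, w=weight)
# stepwise aggregation: first the division 01 ...
in01 <- c("011", "012")
ratio01 <- weighted.mean(x=ratio[in01], w=weight[in01])
# ... then the all-items level, using the weight sum of 01:
stepwise <- weighted.mean(x=c(ratio01, ratio[["02"]]),
                          w=c(sum(weight[in01]), weight[["02"]]))
# both routes agree:
all.equal(direct, stepwise)
```

The intermediate result for 01 enters the second step with the summed weights of its components, which is what makes the weighted average consistent in aggregation.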
The indices show the price change between a comparison period and the
index reference period. However, data users are more often interested in
monthly and annual rates of change. Monthly change rates are computed by
dividing the index in the current period by the index one month before,
while annual change rates are derived by comparing the index in the
current month to the index in the same month one year before. Both can
be easily derived using the function rates(). Contributions of the
price changes of individual items to the annual rate of change can be
computed by the Ribe or Kirchner contributions as implemented in the
function contrib().
# compute annual rates of change for each item, including the all-items HICP:
hicp.data[, "ar" := rates(x=index, t=time, type="annual"), by=c("geo","coicop")]
# add all-items hicp:
hicp.data <- merge(x=hicp.data,
y=hicp.data[coicop=="00", list(geo,time,index,weight)],
by=c("geo","time"), all.x=TRUE, suffixes=c("","_all"))
# ribe decomposition:
hicp.data[, "ribe" := contrib(x=index, w=weight, t=time, x.all=index_all, w.all=weight_all), by="coicop"]
# annual change rates over time:
plot(ar~time, data=hicp.data[coicop=="00",],
type="l", xlab="Time", ylab="", ylim=c(-2,12))
lines(ribe~time, data=hicp.data[coicop=="01"], col="red")
title("Contributions of food to overall inflation")
legend("topleft", col=c("black","red"), lty=1, bty="n",
legend=c("Overall inflation (in %)", "Contributions of food (in pp)"))
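The monthly and annual rates of change described in this section follow directly from index ratios. The base R sketch below applies this to a hypothetical index series that grows by 0.2% each month. The helper change.rate() is made up for illustration and assumes a complete, ordered monthly series. The function rates() should be preferred for real data.

```r
# hypothetical index series over 13 months with 0.2% growth per month:
index <- 100*1.002^(0:12)
# hypothetical helper: percentage change against the value 'lag' periods before:
change.rate <- function(x, lag){
  n <- length(x)
  c(rep(NA_real_, lag), 100*(x[-seq_len(lag)]/x[seq_len(n-lag)]-1))
}
# monthly rates compare to the previous month:
monthly <- change.rate(x=index, lag=1)
# annual rates compare to the same month one year before:
annual <- change.rate(x=index, lag=12)
```

The first 12 annual rates are missing because no index value exists one year earlier. The last one equals 100*(1.002^12-1), i.e., the compounded monthly growth.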