The defined() class is a vector subclass of labelled. Labelled vectors
extend the semantics of a base R factor: they offer richer value
levels and labels, and they add a long-form, human-readable label to the
variable itself.
gdp_1 = defined(
c(3897, 7365),
label = "Gross Domestic Product",
unit = "million dollars",
definition = "http://data.europa.eu/83i/aa/GDP")
The defined() class extends the attributes of a labelled vector with a unit (of measure), a definition, and a namespace.
attributes(gdp_1)
#> $label
#> [1] "Gross Domestic Product"
#>
#> $class
#> [1] "haven_labelled_defined" "haven_labelled" "vctrs_vctr"
#> [4] "double"
#>
#> $unit
#> [1] "million dollars"
#>
#> $definition
#> [1] "http://data.europa.eu/83i/aa/GDP"
cat("Get the label only: ")
#> Get the label only:
var_label(gdp_1)
#> [1] "Gross Domestic Product"
cat("Get the unit only: ")
#> Get the unit only:
var_unit(gdp_1)
#> [1] "million dollars"
cat("Get the definition only: ")
#> Get the definition only:
var_definition(gdp_1)
#> [1] "http://data.europa.eu/83i/aa/GDP"
What happens if we try to concatenate a semantically under-specified new vector to the GDP vector?
You get an intentional error message saying that some attributes are incompatible. This guards against, for example, concatenating figures in euros with figures in dollars.
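The under-specified vector itself is not defined in the text above. A minimal sketch of what it might look like, assuming gdp_2 carries only a label (no unit and no definition) and taking the value 2034 from the summary output shown further down:

```r
library(dataset)

# Assumption: gdp_2 has a label but no unit or definition, which makes it
# semantically under-specified compared with gdp_1.
gdp_2 = defined(
  2034,
  label = "Gross Domestic Product"
)
```

Because the unit and definition are missing, they cannot match those of gdp_1, which is presumably what triggers the error on concatenation.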
c(gdp_1, gdp_2)
Error in `vec_c()`:
! Can't combine `..1` <haven_labelled_defined> and `..2` <haven_labelled_defined>.
✖ Some attributes are incompatible.
Let’s define the GDP of San Marino more precisely:
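The redefinition is not shown in the text. A sketch consistent with the summary output below, assuming that gdp_2 holds the value 2034 and reuses the label, unit and definition of gdp_1:

```r
library(dataset)

# Assumption: same label, unit and definition as gdp_1, so that the two
# vectors are semantically compatible and can be concatenated.
gdp_2 = defined(
  2034,
  label = "Gross Domestic Product",
  unit = "million dollars",
  definition = "http://data.europa.eu/83i/aa/GDP"
)
```

With matching attributes, c(gdp_1, gdp_2) should no longer fail, and the combined vector can be summarised.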
summary(c(gdp_1, gdp_2))
#> Gross Domestic Product (million dollars)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2034 2966 3897 4432 5631 7365
country = defined(c("AD", "LI", "SM"),
label = "Country name",
definition = "http://data.europa.eu/bna/c_6c2bb82d",
namespace = "https://www.geonames.org/countries/$1/")
The point of using a namespace is that it can point to a definition of the ID column, or of any attribute column in the dataset, that is both human- and machine-readable. (Attributes in a statistical dataset are characteristics of the observations or the measured variables.)
For example, the namespace definition above points to https://www.geonames.org/countries/AD/ in the case of Andorra, https://www.geonames.org/countries/LI/ for Liechtenstein, and https://www.geonames.org/countries/SM/ for San Marino. In turn, http://publications.europa.eu/resource/authority/bna/c_6c2bb82d resolves to a machine-readable definition of geographical names.
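The expansion of the namespace template into per-country URLs can be illustrated with plain base R string substitution. This is only a sketch of the idea, not necessarily how the package resolves namespaces internally; the variable names namespace, codes and urls are chosen here for illustration.

```r
# The namespace template from the example above; "$1" stands for the code.
namespace = "https://www.geonames.org/countries/$1/"
codes = c("AD", "LI", "SM")

# Replace the literal "$1" placeholder with each country code.
# fixed = TRUE treats "$1" as plain text rather than as a regular expression.
urls = vapply(
  codes,
  function(code) sub("$1", code, namespace, fixed = TRUE),
  character(1),
  USE.NAMES = FALSE
)

print(urls)
```

Each element of urls is the address where the corresponding country code is defined.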