Introduction

rconf is a lightweight configuration tool for R that parses basic YAML configuration files without any external dependencies. This vignette is a detailed guide to using rconf in several scenarios, from basic configuration loading to more advanced techniques such as dynamic configuration selection and merging configurations.

Basic Usage

The simplest way to use rconf is to load a configuration file and extract settings. For example, assume you have a configuration file stored in the package’s inst/extdata directory. You can load it as follows:

library(rconf)

# Load the default configuration from the sample file in extdata.
cfg <- get_config(system.file("extdata", "config.yml", package = "rconf"))
print(cfg)

## $raw_data_dir
## [1] "/data/proteomics/raw"
## 
## $processed_data_dir
## [1] "/data/proteomics/processed"
## 
## $sample_metadata
## [1] "/data/proteomics/metadata/samples.csv"
## 
## $normalization_method
## [1] "median"
## 
## $quantification
## [1] "LFQ"
## 
## $protein_fdr
## [1] 0.01
## 
## $differential_expression
## [1] TRUE
## 
## $p_value_cutoff
## [1] 0.05
## 
## $plot
## $plot$output_dir
## [1] "results/plots"
## 
## $plot$format
## [1] "png"
## 
## $plot$dpi
## [1] 300

The returned list cfg contains keys such as raw_data_dir and sample_metadata, along with the analysis parameters defined in your YAML file. Nested keys, such as plot, are returned as nested lists.
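Because the result is an ordinary R list, its values can be read with the usual list operators. The sketch below uses a hand-written stand-in for cfg (values copied from the output above) so that it runs without the package; get_or is a hypothetical helper introduced here for illustration and is not part of rconf.

```r
# Stand-in for the list returned by get_config(); values copied from the output above.
cfg <- list(
  processed_data_dir = "/data/proteomics/processed",
  p_value_cutoff = 0.05,
  plot = list(output_dir = "results/plots", format = "png", dpi = 300)
)

# Nested settings are reached by chaining `$`.
plot_file <- file.path(cfg$plot$output_dir, paste0("volcano.", cfg$plot$format))

# get_or (hypothetical, not an rconf function) returns a default
# when a key is absent from the configuration.
get_or <- function(x, key, default) {
  if (is.null(x[[key]])) default else x[[key]]
}

alpha <- get_or(cfg, "p_value_cutoff", 0.05)  # key present: value from cfg
n_threads <- get_or(cfg, "n_threads", 1L)     # key absent: falls back to 1L
```

Reading optional settings through a helper like this keeps a script working when a key is left out of the YAML file.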

Working with Multiple Configurations

Your YAML file can contain multiple configuration sets (e.g., default, development, production). For example, consider the following YAML snippet:


default:
  raw_data_dir: "/data/proteomics/raw"
  processed_data_dir: "/data/proteomics/processed"
  sample_metadata: "/data/proteomics/metadata/samples.csv"
  normalization_method: "median"

development:
  raw_data_dir: "/data/dev/proteomics/raw"
  processed_data_dir: "/data/dev/proteomics/processed"
  sample_metadata: "/data/dev/proteomics/metadata/samples_dev.csv"
  normalization_method: "quantile"

You can load a specific configuration by passing the desired configuration name:

# Load the 'development' configuration
dev_cfg <- get_config(system.file("extdata", "config.yml", package = "rconf"), 
                      config_name = "development")
print(dev_cfg$raw_data_dir)

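## NULL

The NULL shown here was produced with the sample file bundled in extdata, which does not appear to define a development set (the printed results later in this vignette contain only a default set). With a file that contains the two-set YAML snippet above, dev_cfg$raw_data_dir would be expected to hold "/data/dev/proteomics/raw".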
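A requested configuration name may be missing from the file, so it can help to make the fallback explicit. The following is a minimal sketch on plain lists, assuming the whole file has been parsed into a list with one element per configuration set; pick_set is a hypothetical helper, not an rconf function, and it does not describe how get_config itself handles unknown names.

```r
# Stand-in for a parsed file holding two named configuration sets.
all_sets <- list(
  default = list(raw_data_dir = "/data/proteomics/raw",
                 normalization_method = "median"),
  development = list(raw_data_dir = "/data/dev/proteomics/raw",
                     normalization_method = "quantile")
)

# pick_set (hypothetical) returns the named set, or warns and
# falls back to another set when the name is not present.
pick_set <- function(sets, name, fallback = "default") {
  if (!is.null(sets[[name]])) {
    return(sets[[name]])
  }
  warning(sprintf("Configuration set '%s' not found; using '%s'.", name, fallback))
  sets[[fallback]]
}

dev_set <- pick_set(all_sets, "development")
prod_set <- suppressWarnings(pick_set(all_sets, "production"))
```

Here dev_set holds the development values, while prod_set falls back to the default set because no production set exists in the stand-in list.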

Advanced Parsing Features

The rconf parser supports the following:

Inline Comments: Any text following a space and a hash ( #) is removed from the value. For example:

file_type: bam       # options: bam or bed

yields a value of "bam".

Inline Arrays: Arrays specified inline are flattened into atomic vectors. For example:

state_numbers: [25, 30, 35]

is parsed into the numeric vector c(25, 30, 35).

Nested Keys: Indentation is used to create nested lists. For example:

plot:
  output_dir: "results/plots"
  format: "png"

results in a list plot with elements: output_dir and format.
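To make the first two rules concrete, here is a small base-R sketch of how inline comments and inline arrays could be handled. It is an illustration written for this guide under simplifying assumptions, not the parser's actual implementation; strip_inline_comment and parse_inline_array are hypothetical names.

```r
# Remove an inline comment: whitespace, then '#', then the rest of the line.
# Simplification: a '#' inside a quoted string is not treated specially.
strip_inline_comment <- function(line) {
  trimws(sub("[[:space:]]+#.*$", "", line))
}

# Turn "[25, 30, 35]" into an atomic vector: numeric when every item
# parses as a number, character otherwise.
parse_inline_array <- function(value) {
  inner <- gsub("^\\[|\\]$", "", trimws(value))
  items <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
  items <- gsub("^[\"']|[\"']$", "", items)  # drop surrounding quotes
  nums <- suppressWarnings(as.numeric(items))
  if (anyNA(nums)) items else nums
}

clean_line <- strip_inline_comment("file_type: bam       # options: bam or bed")
states <- parse_inline_array("[25, 30, 35]")
types <- parse_inline_array("[bam, bed]")
```

In this sketch clean_line keeps only the key and value, states is a numeric vector, and types stays a character vector because its items are not numbers.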

Overriding and Merging Configurations

Because rconf returns a list, you can easily override configuration values at runtime. For instance:

# Load default configuration
cfg <- get_config(system.file("extdata", "config.yml", package = "rconf"))

# Override a parameter
cfg$normalization_method <- "z-score"
print(cfg$normalization_method)

## [1] "z-score"
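Several values can be overridden in one step by assigning a named list. The sketch below uses a stand-in list in place of the result of get_config, and the override values are made up for illustration.

```r
# Stand-in for a loaded configuration.
cfg <- list(
  normalization_method = "median",
  p_value_cutoff = 0.05,
  plot = list(output_dir = "results/plots", format = "png", dpi = 300)
)

# Runtime overrides, e.g. collected from command-line arguments (hypothetical values).
overrides <- list(normalization_method = "z-score", p_value_cutoff = 0.01)

# Assigning by name replaces matching top-level entries and leaves the rest untouched.
cfg[names(overrides)] <- overrides
```

Note that this replaces top-level entries wholesale: an override for plot would swap out the entire nested list rather than a single field inside it.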

Merging Configurations

You can also merge multiple configurations. For example, if you have a default configuration and want to override specific parameters with values from the development configuration, you can do so as follows:

# Resolve the path to the sample file once and reuse it.
cfg_file <- system.file("extdata", "config.yml", package = "rconf")
base_cfg <- get_config(cfg_file, config_name = "default")
dev_cfg <- get_config(cfg_file, config_name = "development")
combined_cfg <- merge_configs(base_cfg, dev_cfg)
print(combined_cfg)

## $raw_data_dir
## [1] "/data/proteomics/raw"
## 
## $processed_data_dir
## [1] "/data/proteomics/processed"
## 
## $sample_metadata
## [1] "/data/proteomics/metadata/samples.csv"
## 
## $normalization_method
## [1] "median"
## 
## $quantification
## [1] "LFQ"
## 
## $protein_fdr
## [1] 0.01
## 
## $differential_expression
## [1] TRUE
## 
## $p_value_cutoff
## [1] 0.05
## 
## $plot
## $plot$output_dir
## [1] "results/plots"
## 
## $plot$format
## [1] "png"
## 
## $plot$dpi
## [1] 300
## 
## 
## $default
## $default$raw_data_dir
## [1] "/data/proteomics/raw"
## 
## $default$processed_data_dir
## [1] "/data/proteomics/processed"
## 
## $default$sample_metadata
## [1] "/data/proteomics/metadata/samples.csv"
## 
## $default$normalization_method
## [1] "median"
## 
## $default$quantification
## [1] "LFQ"
## 
## $default$protein_fdr
## [1] 0.01
## 
## $default$differential_expression
## [1] TRUE
## 
## $default$p_value_cutoff
## [1] 0.05
## 
## $default$plot
## $default$plot$output_dir
## [1] "results/plots"
## 
## $default$plot$format
## [1] "png"
## 
## $default$plot$dpi
## [1] 300
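Before or after merging, it can be useful to see which settings actually differ between two configurations. The following base-R sketch compares two flat stand-in lists; the values are taken from the YAML snippet earlier in this guide, and the variable names are chosen here for illustration.

```r
# Stand-ins for two loaded configuration sets.
base_set <- list(raw_data_dir = "/data/proteomics/raw",
                 normalization_method = "median",
                 protein_fdr = 0.01)
dev_set <- list(raw_data_dir = "/data/dev/proteomics/raw",
                normalization_method = "quantile",
                protein_fdr = 0.01)

# Compare only the keys present in both sets.
shared <- intersect(names(base_set), names(dev_set))
differs <- !mapply(identical, base_set[shared], dev_set[shared])

# Keys whose values would change when dev_set overrides base_set.
changed_keys <- shared[differs]
```

In this example changed_keys contains raw_data_dir and normalization_method, while protein_fdr is the same in both sets.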

Dynamic Configuration Selection

If your project needs to choose configurations dynamically (e.g., based on an environment variable), you can create a helper function:

select_config <- function() {
  env <- Sys.getenv("APP_ENV", unset = "default")
  cfg <- get_config(system.file("extdata", "config.yml", package = "rconf"), config_name = env)
  cfg
}

# Example usage:
Sys.setenv(APP_ENV = "development")
current_cfg <- select_config()
print(current_cfg)

## $default
## $default$raw_data_dir
## [1] "/data/proteomics/raw"
## 
## $default$processed_data_dir
## [1] "/data/proteomics/processed"
## 
## $default$sample_metadata
## [1] "/data/proteomics/metadata/samples.csv"
## 
## $default$normalization_method
## [1] "median"
## 
## $default$quantification
## [1] "LFQ"
## 
## $default$protein_fdr
## [1] 0.01
## 
## $default$differential_expression
## [1] TRUE
## 
## $default$p_value_cutoff
## [1] 0.05
## 
## $default$plot
## $default$plot$output_dir
## [1] "results/plots"
## 
## $default$plot$format
## [1] "png"
## 
## $default$plot$dpi
## [1] 300
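A typo in the environment variable can silently select the wrong settings, so it may be worth validating the name before loading anything. The sketch below is one way to do that in base R; resolve_env is a hypothetical helper, and the list of allowed names is an assumption made for this example.

```r
# resolve_env (hypothetical) reads APP_ENV and checks it against known names.
resolve_env <- function(allowed = c("default", "development", "production")) {
  env <- Sys.getenv("APP_ENV", unset = "default")
  if (!nzchar(env)) env <- "default"  # treat an empty variable like an unset one
  if (!env %in% allowed) {
    stop(sprintf("Unknown APP_ENV '%s'; expected one of: %s",
                 env, paste(allowed, collapse = ", ")))
  }
  env
}

Sys.setenv(APP_ENV = "development")
env_name <- resolve_env()
Sys.unsetenv("APP_ENV")
```

The validated name can then be passed as config_name to get_config, as in select_config above.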

Troubleshooting

Empty or Missing Values: Ensure that your YAML file does not contain only comments or blank lines; otherwise, rconf will return an empty list.
Parsing Errors: Double-check your YAML syntax (e.g., colon-separated key-value pairs, consistent indentation). The parser assumes a 2-space indentation for nested keys.
Overriding Behavior: When merging configurations, note that nested lists are replaced rather than deeply merged. Use a custom merge function if you require deep merging.
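For the deep-merging case mentioned in the last item, a custom function might look like the sketch below. deep_merge is a hypothetical helper written for this guide, not part of rconf; base R's utils::modifyList offers similar recursive behavior for named lists.

```r
# deep_merge (hypothetical) merges override into base, descending into
# nested lists instead of replacing them wholesale.
deep_merge <- function(base, override) {
  for (key in names(override)) {
    both_lists <- is.list(base[[key]]) && is.list(override[[key]])
    base[[key]] <- if (both_lists) {
      deep_merge(base[[key]], override[[key]])
    } else {
      override[[key]]
    }
  }
  base
}

base_set <- list(
  normalization_method = "median",
  plot = list(output_dir = "results/plots", format = "png", dpi = 300)
)
override_set <- list(plot = list(dpi = 600))

merged <- deep_merge(base_set, override_set)
```

Only plot$dpi changes in merged; plot$output_dir, plot$format, and normalization_method keep their base values.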

Advanced Usage of rconf: A Detailed Guide

Yaoxiang Li

2017-12-09