Attention: Some changes to functions in the current version of madshapR may require updates of existing code.
| previous version (1.1.0 and older) | version 2.0.0 | 
|---|---|
| Rmonize_DEMO | Rmonize_examples | 
In the functions show_harmo_error(), harmonized_dossier_evaluate(), harmonized_dossier_summarize() and harmonized_dossier_visualize(), the parameters have been reduced to a single one, “dossier”. https://github.com/maelstrom-research/Rmonize/issues/110 https://github.com/maelstrom-research/Rmonize/issues/109 https://github.com/maelstrom-research/Rmonize/issues/108 https://github.com/maelstrom-research/Rmonize/issues/98 https://github.com/maelstrom-research/Rmonize/issues/93 https://github.com/maelstrom-research/Rmonize/issues/92
Previous version (1.1.0 and older):

    harmonized_dossier_evaluate(
      harmonized_dossier, dataschema, taxonomy, as_dataschema_mlstr)
    harmonized_dossier_summarize(
      harmonized_dossier, group_by, dataschema, data_proc_elem,
      taxonomy, valueType_guess)
    harmonized_dossier_visualize(
      harmonized_dossier, bookdown_path, group_by, harmonized_dossier_summary,
      dataschema, data_proc_elem, valueType_guess, taxonomy)

Version 2.0.0:

    harmonized_dossier_evaluate(harmonized_dossier)
    harmonized_dossier_summarize(harmonized_dossier)
    harmonized_dossier_visualize(harmonized_dossier, bookdown_path)

In harmonized_dossier_evaluate(), the columns generated in the outputs have been renamed as follows:
| previous version (1.1.0 and older) | current version (2.0.0) | 
|---|---|
| index | Index | 
| name | Variable name | 
| label | Variable label | 
| valueType | Data dictionary valueType | 
| Categories::label | Categories in data dictionary | 
| Categories::missing | Non-valid categories | 
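
Code that reads assessment tables saved with an older version may still refer to the previous column names. The following is a minimal sketch of one way to rename them, using base R only. The helper `rename_report_columns()` and the example table `old_report` are hypothetical and are not part of Rmonize; only the name mapping comes from the table above.

```r
# Mapping from the column names of version 1.1.0 and older (left)
# to the column names of version 2.0.0 (right), as listed in the table above.
evaluate_renames <- c(
  "index"               = "Index",
  "name"                = "Variable name",
  "label"               = "Variable label",
  "valueType"           = "Data dictionary valueType",
  "Categories::label"   = "Categories in data dictionary",
  "Categories::missing" = "Non-valid categories"
)

# Hypothetical helper (not part of Rmonize): rename the columns of a
# data frame that appear in `renames`, and leave the other columns untouched.
rename_report_columns <- function(report, renames) {
  old <- names(report)
  hit <- old %in% names(renames)
  names(report)[hit] <- unname(renames[old[hit]])
  report
}

# Made-up one-row table standing in for a report saved with an older version.
old_report <- data.frame(
  index = 1L,
  name  = "sex",
  label = "Sex of participant",
  check.names = FALSE
)

new_report <- rename_report_columns(old_report, evaluate_renames)
names(new_report)
```

The same helper could be reused for other renamed columns by passing a different mapping vector.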
In harmonized_dossier_summarize(), the columns generated in the outputs have been renamed as follows:
| previous version (1.1.0 and older) | current version (2.0.0) | 
|---|---|
| index in data dict. | Index | 
| name | Variable name | 
| label | Variable label | 
| Estimated dataset valueType | Suggested valueType | 
| Actual dataset valueType | Dataset valueType | 
| Total number of observations | Number of rows | 
| Nb. distinct values | Number of distinct values | 
| Nb. valid values | Number of valid values | 
| Nb. non-valid values | Number of non-valid values | 
| Nb. NA | Number of empty values | 
| % total Valid values | % Valid values | 
| % Non-valid values | % Non-valid values | 
| % NA | % Empty values | 
The assessment and summary reports had some updates, such as renamed columns and bug corrections. https://github.com/maelstrom-research/Rmonize/issues/104 https://github.com/maelstrom-research/Rmonize/issues/103 https://github.com/maelstrom-research/Rmonize/issues/89 https://github.com/maelstrom-research/Rmonize/issues/88 https://github.com/maelstrom-research/Rmonize/issues/87 https://github.com/maelstrom-research/Rmonize/issues/86 https://github.com/maelstrom-research/Rmonize/issues/85 https://github.com/maelstrom-research/Rmonize/issues/84 https://github.com/maelstrom-research/Rmonize/issues/68 https://github.com/maelstrom-research/Rmonize/issues/21
The visual reports have been improved, including better visual outputs and color palettes, and new features such as total number of rows next to the bar charts.
https://github.com/maelstrom-research/Rmonize/issues/57 https://github.com/maelstrom-research/Rmonize/issues/53 https://github.com/maelstrom-research/Rmonize/issues/49 https://github.com/maelstrom-research/Rmonize/issues/48 https://github.com/maelstrom-research/Rmonize/issues/39 https://github.com/maelstrom-research/Rmonize/issues/37 https://github.com/maelstrom-research/Rmonize/issues/33 https://github.com/maelstrom-research/Rmonize/issues/32 https://github.com/maelstrom-research/Rmonize/issues/29
During testing, the DataSchema, the Data Processing Elements, or the input datasets might not be available. To allow harmonization to be tested in that situation, an additional parameter, .debug, has been added. https://github.com/maelstrom-research/Rmonize/issues/56
The report function can now work when the code is indented in the Data Processing Elements. https://github.com/maelstrom-research/Rmonize/issues/54
The function show_harmo_error() now lets the user suppress warnings. https://github.com/maelstrom-research/Rmonize/issues/52
To avoid confusion with help(function), the function
Rmonize_help() has been renamed
Rmonize_website().
Bug corrections and enhancements after testing with real data.
The functions harmo_process(),
pooled_harmonized_dataset_create(),
harmonized_dossier_create(),
harmonized_dossier_evaluate(),
harmonized_dossier_summarize() and
harmonized_dossier_visualize() share the same parameter,
“harmonized_col_dataset”: the name of the column (if it exists)
that refers to the input dataset names. If this column exists and is declared
by the user, it is used across the pipeline as a
grouping/separating variable. By default, the name of each dataset is
used instead.
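
The fallback described above can be pictured with a small base R sketch. This is an illustration only and is not Rmonize's internal code: the function `group_labels()`, the column name `cohort` and the example data are all hypothetical. It returns, for each row, the value of the declared column when that column exists, and the name of the dataset otherwise.

```r
# Two made-up datasets in a named list; `cohort` is a hypothetical column
# holding the name of the input dataset each row comes from.
datasets <- list(
  study_A = data.frame(id = 1:2, cohort = c("Cohort A", "Cohort A")),
  study_B = data.frame(id = 3:4, cohort = c("Cohort B", "Cohort B"))
)

# Hypothetical helper (not part of Rmonize): build one grouping label per row.
# If `harmonized_col_dataset` is declared and present in a dataset, its values
# are used; otherwise the name of the dataset in the list is used.
group_labels <- function(datasets, harmonized_col_dataset = NULL) {
  labels <- lapply(names(datasets), function(nm) {
    ds <- datasets[[nm]]
    if (!is.null(harmonized_col_dataset) &&
        harmonized_col_dataset %in% names(ds)) {
      as.character(ds[[harmonized_col_dataset]])
    } else {
      rep(nm, nrow(ds))
    }
  })
  unlist(labels)
}

default_groups <- group_labels(datasets)            # dataset names
custom_groups  <- group_labels(datasets, "cohort")  # values of the declared column
```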
Renamed DEMO_file_harmo to Rmonize_DEMO and updated the examples.
Removed the parameter overwrite = TRUE from the xxx_visualize() functions.
In visual reports, avoided confusing changes in the color scheme.
Histograms for date variables now display valid ranges.
In reports, % NA is now expressed as a proportion.
The harmonized_dossier_visualize() report now shows variable labels in the same language.
id_creation is now placed in the script and in the rule in the Data Processing Elements (as in direct_mapping).
Special characters are now allowed in the names of datasets and data dictionaries.
In visual reports, the bar plot only appears when there are multiple missing value types; otherwise only the pie chart is shown.
Enhanced the harmonized_dossier_visualize() output.
Enhanced the show_harmo_error() output.
In reports, all of the percentages are now included under “Other values (non categorical)”, which gives a single value.
The recode function now works with special characters.
Functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets in a dossier based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs.
This is still a work in progress, so please let us know if a function you used before no longer works.
Rmonize_help(): Call the help center for full documentation.
dowload_templates(): Call the help center at the template download page.
Rmonize_DEMO: Built-in material allowing the user to test the package with demo data.
as_data_proc_elem(): Validate and coerce any object as Data Processing Elements.
as_dataschema(), as_dataschema_mlstr(): Validate and coerce any object as the DataSchema.
as_harmonized_dossier(): Validate and coerce any object as a harmonized dossier.
dataschema_extract(): Extract and create the DataSchema from the Data Processing Elements.
harmo_process(): Generate harmonized dataset(s) and annotated Data Processing Elements. This function internally runs the following functions: harmo_parse_process_rule(), harmo_process_add_variable(), harmo_process_case_when(), harmo_process_direct_mapping(), harmo_process_id_creation(), harmo_process_impossible(), harmo_process_merge_variable(), harmo_process_operation(), harmo_process_other(), harmo_process_paste(), harmo_process_recode(), harmo_process_rename(), harmo_process_undetermined().
pooled_harmonized_dataset_create(): Generate the pooled dataset from harmonized datasets in a dossier.
show_harmo_error(): Generate a summary of the annotated Data Processing Elements.
data_proc_elem_evaluate(), dataschema_evaluate(), harmonized_dossier_evaluate(), harmonized_dossier_summarize(), harmonized_dossier_visualize(): Generate quality assessment reports and summary statistics of inputs and outputs.
as_data_dict(), is_data_dict(), as_data_dict_mlstr(), is_data_dict_mlstr(), as_dataset(), is_dataset(), as_dossier(), is_dossier(), as_taxonomy()
data_extract(), data_dict_extract(), data_dict_apply(), dataset_zap_data_dict(), dossier_create()
valueType_adjust()
dataset_evaluate(), data_dict_evaluate(), dossier_evaluate(), dataset_summarize(), dossier_summarize()
bookdown_template(), bookdown_render(), bookdown_open(), dataset_visualize()