dataSDA

Datasets and Basic Statistics for Symbolic Data Analysis

R License: GPL v2 Version

Overview

dataSDA collects a diverse range of symbolic data and offers a comprehensive set of functions that facilitate the conversion of traditional data into the symbolic data format. It supports reading, writing, and conversion of symbolic data in diverse formats, as well as computing descriptive statistics of symbolic variables.

Installation

From GitHub

# install.packages("devtools")
devtools::install_github("hanmingwu1103/dataSDA")

From source

Download the latest release from the Releases page, then:

install.packages("dataSDA_0.1.5.tar.gz", repos = NULL, type = "source")

Features

Descriptive Statistics

Interval-valued data (int_*)

Compute mean, variance, covariance, and correlation for interval-valued data with 8 methods: CM, VM, QM, SE, FV, EJD, GQ, SPT.

library(dataSDA)
data(mushroom.int)

int_mean(mushroom.int, var_name = "Pileus.Cap.Width")
int_var(mushroom.int, var_name = c("Stipe.Length", "Stipe.Thickness"), method = c("CM", "FV", "EJD"))

int_cov(mushroom.int, var_name1 = "Pileus.Cap.Width",
        var_name2 = c("Stipe.Length", "Stipe.Thickness"),
        method = c("CM", "VM", "EJD", "GQ", "SPT"))
int_cor(mushroom.int, var_name1 = "Pileus.Cap.Width",
        var_name2 = "Stipe.Length", method = "CM")

Histogram-valued data (hist_*)

Compute mean, variance, covariance, and correlation for histogram-valued data with methods BG and L2W (cov/cor also support BD, B).

library(HistDAWass)

hist_mean(HistDAWass::BLOOD, var_name = "Cholesterol", method = "BG")
hist_var(HistDAWass::BLOOD, var_name = "Cholesterol", method = "L2W")

hist_cov(HistDAWass::BLOOD, var_name1 = "Cholesterol",
         var_name2 = "Hemoglobin", method = "BD")
hist_cor(HistDAWass::BLOOD, var_name1 = "Cholesterol",
         var_name2 = "Hemoglobin", method = "BG")

Data Format Conversion

Function Description
RSDA_to_MM Convert RSDA / symbolic_tbl to MM (min-max) format
MM_to_iGAP Convert MM format to iGAP format
iGAP_to_MM Convert iGAP format to MM format
RSDA_to_iGAP Convert RSDA format to iGAP format
SODAS_to_MM Convert SODAS format to MM format
SODAS_to_iGAP Convert SODAS format to iGAP format
RSDA_format Convert conventional data to RSDA format
set_variable_format One-hot encode set variables for RSDA format

Utilities

Function Description
clean_colnames Clean column names of a data frame
write_csv_table Write data to CSV file

Datasets

The package includes 32 built-in datasets for symbolic data analysis:

Interval-valued datasets (symbolic_tbl class)

Abalone, Cars.int, ChinaTemp.int, age_cholesterol_weight.int, baseball.int, bird.int, blood_pressure.int, car.int, finance.int, hierarchy.int, horses.int, lackinfo.int, LoansbyPurpose.int, mushroom.int, nycflights.int, ohtemp.int, profession.int, soccer.bivar.int, veterinary.int

iGAP / data.frame datasets

Abalone.iGAP, Face.iGAP, airline_flights, airline_flights2, crime, crime2, fuel_consumption, health_insurance, health_insurance2, hierarchy, mushroom, occupations, occupations2

Dependencies

Authors

License

GPL (>= 2)