Segment Profile Extraction via Pattern Analysis: A Workflow Guide

Se-Kang Kim

2026-03-23

Note. All code chunks in this vignette are set to eval = FALSE to keep CRAN check times within limits, because the bootstrap and permutation procedures are computationally intensive. All code runs as written in an interactive R session. Precomputed results for all three pipelines are stored in inst/extdata/ and can be loaded with readRDS(system.file("extdata", "results_bin.rds", package = "SEPA")), and likewise for results_ord.rds and results_cont.rds. Full output and figures are reported in the accompanying manuscript (Kim and Grochowalski, 2019, doi:10.1007/s00357-018-9277-7).


1 Introduction

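The SEPA package implements the Segment Profile Extraction via Pattern Analysis method for row-mean-centered multivariate data. It provides three automated workflow functions, one for each data type:

  1. alsi_workflow() for binary data (Section 2)
  2. alsi_workflow_ordinal() for ordinal data (Section 3)
  3. calsi_workflow() for continuous data (Section 4)
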
All three pipelines share a common structure:

  1. Dimensionality assessment via parallel analysis
  2. Bootstrap Procrustes stability diagnostics using a simultaneous dual criterion (principal angles and Tucker congruence coefficients)
  3. Variance-weighted aggregation of stable dimensions into a person-level index

library("SEPA")

2 Example 1: Binary Data

This example illustrates the alsi_workflow() pipeline using binary diagnostic data from N = 1,261 individuals assessed for eating disorders.

2.1 Data

data("ANR2", package = "SEPA")
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
head(ANR2[, vars])

Diagnostic prevalence varies substantially: MDD is the most common diagnosis (44.3%), followed by DEP and ANX, while DYS is the least prevalent (4.7%).
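Prevalences like these can be read off directly once the diagnoses are stored as 0/1 indicators. The sketch below uses a hypothetical helper, prevalence_pct(), which is not part of SEPA, and a toy data frame standing in for ANR2; it assumes 0/1 coding and no missing values.

```r
# Hypothetical helper (not a SEPA function): percentage of 1s in each
# binary column. Assumes 0/1 coding and no missing values.
prevalence_pct <- function(df, vars) {
  round(100 * colMeans(df[, vars, drop = FALSE]), 1)
}

# Toy data standing in for ANR2[, vars]
toy  <- data.frame(A = c(1, 0, 1, 1), B = c(0, 0, 0, 1))
prev <- prevalence_pct(toy, c("A", "B"))
sort(prev, decreasing = TRUE)   # most to least prevalent
```

Under the same coding assumption, sort(prevalence_pct(ANR2, vars), decreasing = TRUE) would list the nine diagnoses from most to least prevalent.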

2.2 Full Workflow Call

The following chunk shows the exact call used to generate the precomputed results stored in inst/extdata/results_bin.rds.

results_bin <- alsi_workflow(
  data     = ANR2,
  vars     = vars,
  B_pa     = 2000,
  B_boot   = 2000,
  seed     = 20260123
)

2.3 Load and Inspect Precomputed Results

results_bin <- readRDS(system.file("extdata", "results_bin.rds",
                                    package = "SEPA"))

2.4 Parallel Analysis

print(results_bin$pa)

The first three observed eigenvalues exceed their permutation-based 95th-percentile reference values, supporting retention of a K_PA = 3-dimensional MCA subspace. These three dimensions account for approximately 48% of total inertia.
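The logic of permutation-based parallel analysis can be sketched in base R. This is a PCA-style analogue under stated assumptions, not SEPA's MCA implementation: it uses eigenvalues of the covariance matrix in place of MCA inertias, permutes each column independently to build the null distribution, and retains leading dimensions up to the first observed eigenvalue that does not exceed its q-quantile reference. The function pa_permutation() and its arguments are hypothetical.

```r
# Hypothetical sketch of permutation-based parallel analysis (PCA analogue).
pa_permutation <- function(X, B = 200, q = 0.95, seed = 1) {
  set.seed(seed)
  X   <- as.matrix(X)
  obs <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  # Null distribution: permuting each column independently destroys the
  # associations between variables but keeps every marginal distribution
  null <- replicate(B, {
    Xp <- apply(X, 2, sample)
    eigen(cov(Xp), symmetric = TRUE, only.values = TRUE)$values
  })                                          # p x B matrix
  ref <- apply(null, 1, quantile, probs = q)  # q-quantile per position
  # Retain leading dimensions up to the first one that fails to exceed ref
  exceed <- obs > ref
  K <- if (all(exceed)) length(obs) else which(!exceed)[1] - 1L
  list(observed = obs, reference = unname(ref), K = K,
       cum_prop = cumsum(obs) / sum(obs))
}

# Toy data: three indicators of one common factor plus two pure-noise columns
set.seed(42)
f <- rnorm(300)
X <- cbind(f + rnorm(300, sd = 0.5), f + rnorm(300, sd = 0.5),
           f + rnorm(300, sd = 0.5), rnorm(300), rnorm(300))
res <- pa_permutation(X, B = 200)
res$K
```

The cum_prop element is the analogue of the share of inertia quoted above. B = 200 only keeps the toy example fast; the workflow call in Section 2.2 uses B_pa = 2000.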

2.5 Bootstrap Stability Diagnostics

print(results_bin$boot)
plot_subspace_stability(results_bin$boot)

Median principal angles are 2.77°, 6.94°, and 15.46° for Dimensions 1–3, all below the 20° threshold, and Tucker congruence coefficients range from phi = 0.978 to phi = 0.992. All three dimensions therefore pass the dual criterion, yielding K* = 3.
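The two ingredients of the dual criterion can be sketched in base R: principal angles between a reference and a bootstrap subspace (from the SVD of the product of their orthonormal bases), and Tucker's congruence coefficient between matched loading columns after an orthogonal Procrustes rotation. The helper names, the toy matrices, and the pairing of the k-th angle with the k-th column's congruence are illustrative assumptions, not SEPA internals; the 20° angle threshold comes from the text, and the 0.95 congruence threshold is the one stated in Section 4.5.

```r
# Principal angles (degrees) between the column spaces of A and B
principal_angles_deg <- function(A, B) {
  Qa <- qr.Q(qr(A))
  Qb <- qr.Q(qr(B))
  s  <- svd(crossprod(Qa, Qb))$d      # cosines of the principal angles
  acos(pmin(s, 1)) * 180 / pi         # clamp rounding error above 1
}

# Orthogonal Procrustes: rotate L to best match target in least squares
procrustes_align <- function(L, target) {
  s <- svd(crossprod(L, target))
  L %*% s$u %*% t(s$v)
}

# Tucker congruence coefficient between two loading vectors
tucker_phi <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Toy example: a "bootstrap" solution that is a rotated, slightly noisy
# copy of the reference solution (6 variables, 2 dimensions)
set.seed(7)
ref  <- matrix(rnorm(12), 6, 2)
rot  <- matrix(c(cos(0.3), sin(0.3), -sin(0.3), cos(0.3)), 2, 2)
boot <- ref %*% rot + matrix(rnorm(12, sd = 0.05), 6, 2)

angles  <- principal_angles_deg(ref, boot)
aligned <- procrustes_align(boot, ref)
phi     <- sapply(1:2, function(k) tucker_phi(aligned[, k], ref[, k]))
pass    <- angles < 20 & phi >= 0.95   # dual criterion, per dimension
```

In the workflow these quantities would be computed for each of the B_boot resamples and summarised by their medians, as in the results above; that summarisation step is omitted here.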

2.6 ALSI Computation

print(results_bin$alsi)
summary(results_bin$alsi$alpha)

Variance weights are 0.4345, 0.2979, and 0.2676 for Dimensions 1–3. ALSI values range from 0.040 to 1.625 (M = 0.373, Mdn = 0.368).
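The reported weights sum to 1, consistent with normalising the retained eigenvalues by their total. The sketch below assumes that normalisation and, purely for illustration, combines each person's K dimension scores into a weighted Euclidean norm. That norm is an assumption, not SEPA's documented formula; because it is never negative, it cannot describe the ordinal index in Section 3, which takes negative values. The function names and inputs are hypothetical.

```r
# Hypothetical: weights from the K retained eigenvalues
variance_weights <- function(eigenvalues, K) {
  ev <- eigenvalues[seq_len(K)]
  ev / sum(ev)
}

# Hypothetical aggregation: weighted Euclidean norm of each row of the
# n x K score matrix (an assumed form, for illustration only)
weighted_index <- function(scores, w) {
  sqrt(as.vector(scores^2 %*% w))
}

ev     <- c(0.30, 0.20, 0.18, 0.10)   # hypothetical eigenvalues
w      <- variance_weights(ev, K = 3)
scores <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   1, 1, 1), nrow = 3, byrow = TRUE)
idx    <- weighted_index(scores, w)
```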

2.7 Category Projections

plot_category_projections(
  results_bin$fit,
  K         = results_bin$K,
  alpha_vec = results_bin$alsi$alpha_vec,
  top_n     = 10
)

ADHD_1 carries the strongest projection (|p| = 2.07), followed by DYS_1, DEP_1, and PTSD_1.


3 Example 2: Ordinal Data

This example illustrates the alsi_workflow_ordinal() pipeline using the ten Extraversion items (E1–E10) from the Big Five Inventory (BFI; N = 500).

3.1 Data

BFI            <- read.csv(system.file("extdata",
                                        "BFI_Original_Ordinal_N500.csv",
                                        package = "SEPA"))
items          <- paste0("E", 1:10)
reversed_items <- c("E2", "E4", "E6", "E8", "E10")
head(BFI[, items])
freq_table <- sapply(BFI[, items], function(x) table(factor(x, 1:5)))
round(100 * freq_table / nrow(BFI), 1)

Response frequencies are well distributed across the 1–5 scale for all ten items, with no category falling below the 2% rare-category threshold.
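The 2% rare-category check can be made explicit with a small helper that lists every item-category pair below the threshold. rare_categories() is hypothetical, not a SEPA function; it computes percentages over non-missing responses, and an empty result means no category is rare.

```r
# Hypothetical helper (not a SEPA function): list item-category pairs whose
# relative frequency (percent of non-missing responses) is below `threshold`
rare_categories <- function(df, items, levels = 1:5, threshold = 2) {
  pct <- sapply(df[, items, drop = FALSE], function(x)
    100 * table(factor(x, levels)) / sum(!is.na(x)))
  hits <- which(pct < threshold, arr.ind = TRUE)
  data.frame(item     = colnames(pct)[hits[, "col"]],
             category = rownames(pct)[hits[, "row"]],
             pct      = pct[hits])
}

# Toy data: E2 uses category 5 only once in 100 responses (1%)
toy <- data.frame(E1 = rep(1:5, times = 20),
                  E2 = c(rep(1:3, each = 25), rep(4, 24), 5))
rare_categories(toy, c("E1", "E2"))
```

Applied as rare_categories(BFI, items), the helper should return zero rows if the statement above holds.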

3.2 Full Workflow Call

results_ord <- alsi_workflow_ordinal(
  data           = BFI,
  items          = items,
  reversed_items = reversed_items,
  scale_min      = 1L,
  scale_max      = 5L,
  n_permutations = 100,
  B_boot         = 1000,
  seed           = 12345
)
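The reversed_items, scale_min, and scale_max arguments suggest the conventional reverse-keying transformation x' = scale_min + scale_max - x, which maps 1 to 5 and 5 to 1 on a 1–5 scale. The sketch shows that transformation for inspection only; that alsi_workflow_ordinal() applies exactly this formula is an assumption. If the workflow reverse-keys internally, as its arguments suggest, the data should be passed unreversed, since reversing beforehand would flip those items twice. reverse_key() is a hypothetical helper.

```r
# Hypothetical helper: conventional reverse-keying on a bounded scale.
# For inspection only; do not apply before calling the workflow if it
# already reverse-keys the items internally.
reverse_key <- function(df, items, scale_min = 1L, scale_max = 5L) {
  df[items] <- lapply(df[items], function(x) scale_min + scale_max - x)
  df
}

toy <- data.frame(E1 = c(1, 3, 5), E2 = c(1, 3, 5))
reverse_key(toy, "E2")   # E2 becomes 5, 3, 1; E1 is unchanged
```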

3.3 Load and Inspect Precomputed Results

results_ord <- readRDS(system.file("extdata", "results_ord.rds",
                                    package = "SEPA"))

3.4 Parallel Analysis

print(results_ord$pa_table)

The first four observed eigenvalues exceed their 95th-percentile reference values, supporting an initial K_PA = 4-dimensional solution.

3.5 Bootstrap Stability Diagnostics

print(results_ord$stability_table)
plot_subspace_stability(results_ord)

Dimensions 1–3 satisfy both stability thresholds simultaneously. Dimension 4 fails the angle criterion (median theta = 24.39° > 20°), yielding K* = 3. All 1,000 bootstrap resamples converged successfully (skipped = 0).
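The decision described above can be written as a small rule: a dimension passes only if its median principal angle is below 20° and its congruence is at least 0.95, and K* counts the leading dimensions that pass. Whether SEPA counts leading dimensions or all passing dimensions is an assumption here (the two coincide in this example), and apart from the 24.39° angle for Dimension 4, the input values below are hypothetical.

```r
# Hypothetical decision rule for K*: number of leading dimensions that meet
# both criteria before the first failure
k_star <- function(median_angle, phi, angle_max = 20, phi_min = 0.95) {
  pass <- median_angle < angle_max & phi >= phi_min
  if (all(pass)) length(pass) else which(!pass)[1] - 1L
}

# Dimension 4 fails the angle criterion (24.39 > 20), so K* = 3
k_star(median_angle = c(3.1, 7.5, 12.0, 24.39),
       phi          = c(0.99, 0.98, 0.97, 0.96))
```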

3.6 Ordinal ALSI Computation

print(results_ord)
cat("oALSI summary:\n")
print(summary(results_ord$ALSI_index))
cat("\noALSI (z-scored) summary:\n")
print(summary(results_ord$ALSI_z))

Variance weights for K* = 3 are 0.4815, 0.3307, and 0.1878. The ordinal ALSI distribution is slightly right-skewed (positively skewed), ranging from -0.014 to 0.025 (Mdn = -0.001, M = 0.000): the mean exceeds the median, and the upper tail is longer than the lower one.
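The direction of skew can be checked numerically rather than read from the summary. The sketch computes the moment-based sample skewness g1 in base R, where positive values indicate a longer right tail. skewness_g1() is a hypothetical helper, and applying it as skewness_g1(results_ord$ALSI_index) assumes that component is a plain numeric vector, as the summary() call above suggests.

```r
# Moment-based sample skewness g1 = m3 / m2^(3/2); missing values dropped
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness_g1(c(1, 2, 3))      # symmetric: 0
skewness_g1(c(0, 0, 0, 1))   # long right tail: positive
```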


4 Example 3: Continuous Data

This example illustrates the calsi_workflow() pipeline using N = 900 individuals assessed on p = 9 domain scores from the WAIS-IV and WMS-IV cognitive batteries.

4.1 Data

wawm4   <- read.csv(system.file("extdata", "wawm4.csv", package = "SEPA"))
domains <- c("VC", "PR", "WO", "PS", "IM", "DM", "VWM", "VM", "AM")
X       <- wawm4[, domains]
cat("N =", nrow(X), " p =", ncol(X), "\n")

Domain means ranged from approximately 99 to 101 and standard deviations from approximately 14 to 16, consistent with the standard score metric (normative M = 100, SD = 15). Row-mean-centering is applied internally by calsi_workflow().
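Row-mean-centering subtracts each person's mean across the p domains, removing overall elevation so that only the shape of the profile remains; after centering, every row sums to zero. The sketch shows the operation in base R for inspection only, since calsi_workflow() applies its own centering internally; row_center() is a hypothetical helper.

```r
# Hypothetical helper: subtract each row's mean from that row
row_center <- function(X) {
  X <- as.matrix(X)
  sweep(X, 1, rowMeans(X), FUN = "-")
}

X_toy <- rbind(c(100, 110, 90),    # row mean 100
               c(85, 85, 100))     # row mean 90
row_center(X_toy)
# row 1: 0, 10, -10   row 2: -5, -5, 10
```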

4.2 Full Workflow Call

results_cont <- calsi_workflow(
  data       = X,
  B_pa       = 2000,
  B_boot     = 2000,
  q          = 0.95,
  seed       = 20260206,
  K_override = 4
)

4.3 Load and Inspect Precomputed Results

results_cont <- readRDS(system.file("extdata", "results_cont.rds",
                                     package = "SEPA"))

4.4 Parallel Analysis

print(results_cont$pa)

Horn’s parallel analysis supported retention of four dimensions, which together account for 78.28% of the total variance in the row-mean-centered solution.
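Two facts about row-mean-centered data help in reading this output. Because every centered row sums to zero, the covariance matrix has rank at most p - 1, so its last eigenvalue is zero up to rounding and at most p - 1 dimensions can carry variance. The percentage of variance for the first K dimensions is their eigenvalue sum divided by the trace. The toy sketch below illustrates both with simulated standard-score-like data; whether SEPA defines "total variance" as this trace is an assumption.

```r
# Simulated data on a standard-score metric (mean 100, SD 15), 5 domains
set.seed(3)
X  <- matrix(rnorm(200 * 5, mean = 100, sd = 15), nrow = 200, ncol = 5)
Xc <- sweep(X, 1, rowMeans(X))          # row-mean-centering

ev <- eigen(cov(Xc), symmetric = TRUE, only.values = TRUE)$values
cum_prop <- cumsum(ev) / sum(ev)        # cumulative proportion of variance

ev[5]         # zero up to floating-point error: rank is at most p - 1
cum_prop[4]   # the first p - 1 = 4 dimensions carry all the variance
```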

4.5 Bootstrap Stability Diagnostics

print(results_cont$stability_table)
plot_subspace_stability(results_cont)

All four dimensions satisfy both stability thresholds (median principal angles 0.13°–10.37°, all < 20°; Tucker congruence 0.987–0.999, all >= 0.95), yielding K* = 4.

4.6 Continuous ALSI Computation and Domain Contributions

print(results_cont)
print(results_cont$domain_contrib)

Variance weights for K* = 4 are 0.3833, 0.2481, 0.2222, and 0.1465. cALSI values range from 1.58 to 32.53 (M = 11.81, Mdn = 10.96, SD = 5.09). Processing Speed (PS, 21.5%) contributes most to the retained profile subspace.
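One plausible way to express each domain's share of the retained subspace is to weight its squared eigenvector loadings by the variance weights and rescale to percentages. This definition is an assumption for illustration, not SEPA's documented formula for domain_contrib; domain_contrib_pct() and the toy covariance matrix are hypothetical.

```r
# Hypothetical: domain j's share = sum_k w_k * V[j, k]^2, rescaled so that
# the shares over all domains sum to 100. V holds unit-length eigenvectors.
domain_contrib_pct <- function(V, w) {
  contrib <- as.vector(V^2 %*% w)
  setNames(100 * contrib / sum(contrib), rownames(V))
}

S <- matrix(c(2.0, 0.5, 0.0,
              0.5, 1.0, 0.3,
              0.0, 0.3, 1.5), nrow = 3, byrow = TRUE)
e <- eigen(S, symmetric = TRUE)
V <- e$vectors[, 1:2]
rownames(V) <- c("A", "B", "C")
w <- e$values[1:2] / sum(e$values[1:2])
round(domain_contrib_pct(V, w), 1)
```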

4.7 Comparison with SEPA Plane-Wise Summaries

sepa_comparison <- compare_sepa_calsi(
  fit = results_cont$boot$ref,
  K   = 4
)
print(sepa_comparison)

The correlation between cALSI and the SEPA combined index was r = 0.988, indicating near-equivalent rank ordering of individuals across approaches.
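Pearson's r measures linear agreement, whereas Spearman's rho measures agreement in rank order directly, so reporting both makes a claim about rank ordering explicit. The sketch uses base R's cor(); because the component names of the object returned by compare_sepa_calsi() are not shown here, a and b are toy vectors standing in for the two person-level indices, and index_agreement() is a hypothetical helper.

```r
# Hypothetical helper: linear (Pearson) and rank (Spearman) agreement
index_agreement <- function(a, b) {
  c(pearson  = cor(a, b, method = "pearson"),
    spearman = cor(a, b, method = "spearman"))
}

a <- c(1, 2, 3, 4, 5)
b <- a^3                 # same ordering, non-linear relation
index_agreement(a, b)    # spearman equals 1; pearson is below 1
```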


5 Session Information

sessionInfo()