
TidyPanel is an industrial-grade parser designed to
extract clean, standardized data frames from heavily malformed,
human-readable Excel reports. If you have ever struggled to parse
financial statements, ERP exports, or complex tables with N-dimensional
headers, decoy rows, and embedded subtotals, TidyPanel is
built for you.
You can install the development version of TidyPanel from GitHub with:
# install.packages("devtools")
devtools::install_github("TonyL/TidyPanel")Real-world commercial data is rarely tidy. It contains multi-line merged headers, embedded subtotal rows, empty “ghost” columns, and categorical hierarchies defined purely by visual indentation (e.g., in a Profit & Loss statement).
TidyPanel uses a multi-phase heuristic engine to: 1.
Bypass Decoy Rows: Skips irrelevant metadata at the top
of the sheet. 2. N-Dimensional Header Stitching:
Intelligently identifies multiline headers and forward-fills merged
cells to create flat, readable column names. 3. Indentation
Hierarchy Extraction: Uses leading spaces to identify
parent-child categorical relationships and creates a
parent_category column. 4. Smart Data
Amputation: Automatically removes decorative page breaks,
mid-table subtotals, and ghost aggregate rows. 5. Auto
Pivot: Detects temporal columns (e.g., Q1, 2021, FY23) and
pivots them into a tidy, long format. 6. Semantic
Cleaners: Automatically detects and normalizes accounting
dashes, non-standard scientific notation, and financial multipliers
(e.g., “1.5M” -> 1500000).
Data parsing should never be a “black box”. TidyPanel features a
unique Audit Log that explicitly records every
transformation applied to your data. By setting
return_audit = TRUE, you get both the cleaned data and a
precise ledger of what was modified.
library(TidyPanel)
# Read data and get an audit trail
result <- read_messy_panel("data_raw/financial_report.xlsx", return_audit = TRUE)
# View the audit trail
print(result$audit)=== Algorithm Modification Audit Log ===
Operation Count
1 Decoy Rows Bypassed 5
2 Indentation Hierarchy Extracted 3
3 Ghost Bottom Rows Dropped 4
MIT © Tony Lu