NRMstatsML is an R package for statistical and machine learning-based analysis of long-term Natural Resource Management (NRM) datasets. It is designed for datasets spanning 10–20 years covering soil health, water, crop yield, and climate variables.
The package is organised into seven core modules:
| Module | Functions | Purpose |
|---|---|---|
trendML |
nrm_trend, nrm_mann_kendall,
nrm_sens_slope, nrm_structural_break |
Trend detection and structural change |
multiSysML |
nrm_multivariate, nrm_pls,
nrm_sem |
Multivariate system modelling |
responseML |
nrm_response_curve,
nrm_optimize_input |
Yield-response and optimisation |
tsML |
nrm_forecast, nrm_arima |
Time series forecasting |
panelML |
nrm_panel, nrm_did |
Panel data and treatment effects |
uncertaintyML |
nrm_uncertainty, nrm_bootstrap,
nrm_monte_carlo |
Uncertainty quantification |
autoML |
nrm_automl, nrm_benchmark |
Automated model selection |
# From CRAN (once published)
install.packages("NRMstatsML")
# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourorg/NRMstatsML")The package ships with nrm_example, a synthetic 20-year
dataset that mimics a long-term fertiliser experiment.
library(NRMstatsML)
data(nrm_example)
head(nrm_example)
#> year treatment crop_yield soil_OC N P K rainfall runoff biomass
#> 1 2000 control 3.37 0.867 55.6 23.2 26.3 782.5 76.0 5.32
#> 2 2001 NPK 3.30 0.836 63.1 22.8 34.6 676.4 75.1 4.70
#> 3 2002 control 3.21 0.882 68.8 23.2 35.7 712.6 60.4 4.98
#> 4 2003 NPK 3.26 0.829 79.3 23.5 29.5 657.5 77.0 6.10
#> 5 2004 control 3.75 0.870 66.6 27.5 36.8 574.1 69.3 5.85
#> 6 2005 NPK 3.87 0.783 76.4 25.0 41.2 672.5 80.4 5.89Always start by validating the data:
nrm_data_check(nrm_example)
#> === NRMstatsML: Data Check ===
#> Rows : 20
#> Columns : 10
#> Numeric : year, crop_yield, soil_OC, N, P, K, rainfall, runoff, biomass
#> Status : OKtrend_res <- nrm_trend(nrm_example,
time_var = "year",
value_var = "crop_yield",
breaks = TRUE)
print(trend_res)
#> === NRMstatsML: Trend Analysis ===
#> Variable : crop_yield
#> Time index: year
#>
#> -- Mann-Kendall Test --
#> tau = 0.4105, p-value = 0.0125
#> Trend: increasing (alpha = 0.05)
#>
#> -- Sen's Slope --
#> Slope = 0.0500 (95% CI: 0.0150, 0.0767)
#>
#> -- Structural Breaks (Bai-Perron) --
#> 1 break(s) at: 2010nrm_plot(trend_res)mk <- nrm_mann_kendall(nrm_example, time_var = "year",
value_var = "soil_OC")
print(mk)
#> -- Mann-Kendall Test --
#> tau = 0.6000, p-value = 0.0002
#> Trend: increasing (alpha = 0.05)
ss <- nrm_sens_slope(nrm_example, time_var = "year",
value_var = "soil_OC")
print(ss)
#> -- Sen's Slope --
#> Slope = 0.0082 (95% CI: 0.0052, 0.0121)mv <- nrm_multivariate(nrm_example,
formula = crop_yield ~ N + P + K + rainfall,
scale = TRUE)
print(mv)
#> === NRMstatsML: Multivariate Regression ===
#> Formula : crop_yield ~ N + P + K + rainfall
#> Adj. R^2 : 0.2663
#> Scaled : TRUE
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.75150000 0.08088321 46.38169192 1.290760e-17
#> N -0.01451394 0.23920031 -0.06067693 9.524176e-01
#> P 0.20689436 0.18209712 1.13617591 2.737102e-01
#> K 0.08989712 0.23114178 0.38892629 7.027959e-01
#> rainfall 0.01982764 0.08441227 0.23489052 8.174700e-01PLS is preferred when predictors are highly collinear, which is common in NRM data where N, P, K, and rainfall are often correlated.
pl <- nrm_pls(nrm_example,
formula = crop_yield ~ N + P + K + rainfall + soil_OC,
ncomp = 3)
print(pl)
#> === NRMstatsML: PLS Regression ===
#> Formula : crop_yield ~ N + P + K + rainfall + soil_OC
#> Components: 2
#> RMSEP : 0.3763rc <- nrm_response_curve(nrm_example,
input_var = "N",
response_var = "crop_yield",
type = "quadratic")
print(rc)
#> === NRMstatsML: Response Curve ===
#> Type : quadratic
#> Input : N
#> Response : crop_yield
#> R-squared : 0.3446
nrm_plot(rc)Find the nitrogen rate that maximises profit given a cost-to-price ratio:
opt <- nrm_optimize_input(rc, price_ratio = 0.6)
print(opt)
#> === NRMstatsML: Input Optimisation ===
#> Price ratio : 0.600
#> Biophysical optimum : NA
#> Economic optimum : NAar <- nrm_arima(nrm_example, value_var = "crop_yield")
print(ar)
#> === NRMstatsML: ARIMA Model ===
#> Variable : crop_yield
#> Order : ARIMA(0,1,0)
#> AIC = 23.98 BIC = 24.93fc <- nrm_forecast(nrm_example,
value_var = "crop_yield",
horizon = 5)
print(fc)
#> === NRMstatsML: Time Series Forecast ===
#> Variable : crop_yield
#> Method : auto_arima
#> Horizon : 5 steps
#> Accuracy (in-sample):
#> ME RMSE MAE MPE MAPE MASE ACF1
#> Training set 0.0317 0.4206 0.2887 0.2445 7.5499 0.9506 -0.3701
nrm_plot(fc)Panel and DiD analyses require a multi-site, multi-year dataset. See
?nrm_panel and ?nrm_did for worked examples
with appropriate data.
# Requires a panel dataset with site and year identifiers
pm <- nrm_panel(panel_data,
formula = crop_yield ~ N + P + K + rainfall,
index = c("site", "year"),
model = "within")
nrm_summary(pm)
did <- nrm_did(panel_data,
outcome = "crop_yield",
treat_var = "treatment_binary",
time_var = "post_period")
print(did)bs <- nrm_bootstrap(nrm_example,
stat_fn = function(d) mean(d$crop_yield),
n_iter = 500)
print(bs)
#> -- Bootstrap --
#> Mean : 3.7513 SD : 0.0916 CI : [3.5735, 3.9325]mc <- nrm_monte_carlo(nrm_example,
stat_fn = function(d) mean(d$crop_yield),
n_iter = 500,
noise_sd = 0.1)
print(mc)
#> -- Monte Carlo --
#> Mean : 3.7508 SD : 0.0094 CI : [3.7325, 3.7685]Use nrm_uncertainty() as a unified entry point:
unc <- nrm_uncertainty(nrm_example,
stat_fn = function(d) mean(d$crop_yield),
method = "bootstrap",
n_iter = 500)
print(unc)
#> === NRMstatsML: Uncertainty Analysis ===
#> Method : bootstrap
#> Mean : 3.7558
#> SD : 0.0904
#> 95% CI : [3.5866, 3.9416]# Requires caret + method-specific packages (e.g. randomForest, gbm)
am <- nrm_automl(nrm_example,
formula = crop_yield ~ N + P + K + rainfall + soil_OC,
methods = c("lm", "rf", "gbm"),
cv_folds = 5,
seed = 42)
nrm_summary(am)n <- nrow(nrm_example)
train <- nrm_example[seq_len(floor(0.8 * n)), ]
test <- nrm_example[seq(floor(0.8 * n) + 1, n), ]
m_ols <- lm(crop_yield ~ N + P + K, data = train)
bm <- nrm_benchmark(
models = list(ols = m_ols),
test_data = test,
response_var = "crop_yield"
)
print(bm)nrm_data_check() # 1. Validate data
↓
nrm_trend() # 2. Detect trends and breaks
↓
nrm_multivariate() / nrm_pls() # 3. Multivariate modelling
↓
nrm_response_curve() # 4. Fit yield-response
nrm_optimize_input() # → Identify optimal inputs
↓
nrm_forecast() # 5. Forecast future values
↓
nrm_uncertainty() # 6. Quantify uncertainty
↓
nrm_automl() # 7. Compare candidate models
nrm_benchmark() # → Select best for deployment
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 26200)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 NRMstatsML_0.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] zoo_1.8-15 tidyselect_1.2.1 xfun_0.57
#> [4] bslib_0.10.0 urca_1.3-4 lattice_0.20-45
#> [7] colorspace_2.1-2 vctrs_0.7.2 generics_0.1.4
#> [10] htmltools_0.5.9 viridisLite_0.4.3 yaml_2.3.12
#> [13] rlang_1.1.7 isoband_0.3.0 jquerylib_0.1.4
#> [16] pillar_1.11.1 glue_1.7.0 withr_3.0.2
#> [19] forecast_9.0.2 RColorBrewer_1.1-3 S7_0.2.1
#> [22] lifecycle_1.0.5 timeDate_4052.112 pls_2.9-0
#> [25] gtable_0.3.6 evaluate_1.0.5 labeling_0.4.3
#> [28] knitr_1.51 strucchange_1.5-4 fastmap_1.2.0
#> [31] parallel_4.2.1 Rcpp_1.1.1 scales_1.4.0
#> [34] cachem_1.1.0 jsonlite_2.0.0 farver_2.1.2
#> [37] otel_0.2.0 fracdiff_1.5-3 Kendall_2.2.2
#> [40] digest_0.6.39 dplyr_1.2.0 trend_1.1.6
#> [43] grid_4.2.1 cli_3.6.5 tools_4.2.1
#> [46] sandwich_3.1-1 magrittr_2.0.3 sass_0.4.10
#> [49] tibble_3.3.1 pkgconfig_2.0.3 rmarkdown_2.31
#> [52] extraDistr_1.10.0.2 rstudioapi_0.18.0 R6_2.6.1
#> [55] boot_1.3-32 nlme_3.1-168 compiler_4.2.1