---
title: "Chapter 16: Large models — GPU acceleration using OpenCL"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 16: Large models — GPU acceleration using OpenCL}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

GPU acceleration is an **optional** feature of `glmbayes`. All modeling
functions --- `glmb()`, `lmb()`, `rglmb()`, and related tools --- run fully on
the CPU regardless of whether OpenCL is available. No setup is needed for
standard use.

Where GPU acceleration pays off is with **large models**: high-dimensional
predictor sets or large posterior sample sizes. The computationally intensive
work in `glmbayes` is envelope construction and evaluation. The gradient and
log-posterior calculations performed at each point of the tangency grid become
more expensive as model dimension grows, and because each point is handled
independently the work is embarrassingly parallel. Dispatching it to a GPU with
`use_opencl = TRUE` can substantially reduce wall time in these cases. See
**Chapter A10** for a technical explanation of what is accelerated and why.

This chapter describes how to enable GPU acceleration. The process closely
resembles a source install of any compiled R package: the only extra step is
ensuring that the **'opencltools'** dependency is in an OpenCL-ready state
before the source install.

# What you see when you load glmbayes

When `glmbayes` is loaded in an interactive session it checks, silently, whether
GPU acceleration appears feasible. If `has_opencl()` is already `TRUE` --- meaning
this build was compiled with OpenCL support --- attach is completely silent.

If `has_opencl()` is `FALSE` **and** the package detects a GPU or OpenCL stack
on the host, you will see a message like:

```
Note: glmbayes provides full CPU capability in this session
(e.g. glmb(), lmb(), Prior_Setup()). GPU acceleration is recommended
for bigger models and appears available. Reinstall glmbayes from source
with OpenCL at compile time to enable it; see vignette("Chapter-16",
"glmbayes") for install instructions.
```

On a machine with no GPU and no OpenCL stack, attach is silent --- the CPU-only
install is entirely appropriate and no action is needed.

To suppress the message in scripts or automated workflows:

```r
options(glmbayes.quiet_opencl_startup = TRUE)
```
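The attach-time behavior described above amounts to a small decision rule. The
sketch below restates it as plain R so the cases are easy to see. It is a
hypothetical illustration, not `glmbayes`'s internal code: `startup_note_wanted()`
and `with_quiet_startup()` are made-up names, and the arguments
`compiled_with_opencl` and `gpu_detected` stand in for whatever checks the
package actually performs.

```r
# Hypothetical restatement of the attach-time rule; NOT glmbayes internals.
# compiled_with_opencl: what glmbayes::has_opencl() would report
# gpu_detected:         whether a GPU or OpenCL stack was found on the host
startup_note_wanted <- function(compiled_with_opencl, gpu_detected,
                                quiet = isTRUE(getOption("glmbayes.quiet_opencl_startup"))) {
  if (quiet) return(FALSE)                 # user opted out of the note
  if (compiled_with_opencl) return(FALSE)  # OpenCL build: attach is silent
  gpu_detected                             # CPU build: note only if a GPU is present
}

# Set the quiet option for the duration of one expression, then restore it.
with_quiet_startup <- function(expr) {
  old <- options(glmbayes.quiet_opencl_startup = TRUE)
  on.exit(options(old), add = TRUE)
  expr  # evaluated here, while the option is set
}
```

For example, `with_quiet_startup(library(glmbayes))` would attach the package
with the note suppressed. Because the check runs when the package is loaded,
the option has to be in place before `library()` is called.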

# Enabling GPU acceleration: three steps

Work through these steps in order. After each step you can check whether you
are done and skip the rest.

## Step 1: Check whether OpenCL is already enabled

```r
library(glmbayes)
has_opencl()
```

If this returns `TRUE`, GPU acceleration is already compiled in. Pass
`use_opencl = TRUE` to `glmb()` and you are done. Otherwise continue to
Step 2.

## Step 2: Ensure 'opencltools' is OpenCL-ready

`opencltools` is installed automatically as a dependency of `glmbayes`. It
provides the host diagnostics and runtime checks that `glmbayes` relies on. For
GPU acceleration to work in `glmbayes`, `opencltools` must itself be built with
OpenCL support.

Check:

```r
opencltools::has_opencl()
```

If this returns `FALSE`, follow
**`vignette("Chapter-01", package = "opencltools")`** to install the required
OpenCL components (GPU driver, headers, ICD loader) for your platform, then
reinstall `opencltools` from source. That vignette is where the per-OS
installation instructions are maintained and kept current.

For a host-level diagnostic that does not depend on the `glmbayes` build state:

```r
opencltools::diagnose_glmbayes()
```

Once `opencltools::has_opencl()` returns `TRUE`, proceed to Step 3.
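The checks in Steps 1 and 2 can be folded into a single call that tells you
which step applies. The helper below is a hypothetical convenience, not part of
either package. It relies only on the two `has_opencl()` functions documented
above and treats a package that is not installed as not OpenCL-ready.

```r
# Hypothetical helper: report which step of this chapter applies next.
# Uses only glmbayes::has_opencl() and opencltools::has_opencl().
opencl_next_step <- function() {
  pkg_has_opencl <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)  # not installed
    isTRUE(getExportedValue(pkg, "has_opencl")())
  }
  if (pkg_has_opencl("glmbayes")) {
    return("done: pass use_opencl = TRUE to glmb() or rglmb()")
  }
  if (!pkg_has_opencl("opencltools")) {
    return("step 2: make opencltools OpenCL-ready (see its Chapter 01 vignette)")
  }
  "step 3: reinstall glmbayes from source"
}

opencl_next_step()
```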

**What you need on your system** (brief summary; details in 'opencltools' Chapter 01):

| Component | What it provides | Needed at |
|-----------|------------------|-----------|
| GPU driver | Exposes the hardware to the OS | Runtime |
| OpenCL headers (`CL/cl.h`) | Declarations of the OpenCL API for the compiler | Compile time (source build) |
| OpenCL ICD loader (`OpenCL.dll` / `libOpenCL.so`) | Dispatches OpenCL calls to the vendor runtime | Runtime |

All three must be present. The most common failure mode is having the driver
but not the headers, or the headers but not the ICD loader.
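To see which of the compile-time and runtime pieces in the table are present on
a Linux host, you can look for the files directly. This is a rough sketch: the
directory lists are common default locations assumed for illustration
(including `/usr/local/cuda` for NVIDIA's toolkit and `/opt/rocm` for ROCm),
not an exhaustive search, and `check_opencl_files()` is a hypothetical name.
`opencltools::diagnose_glmbayes()` remains the fuller check.

```r
# Sketch (Linux only): look for the OpenCL header and the ICD loader library
# in a few assumed default locations. Distributions and vendor SDKs may
# install them elsewhere, so an empty result is a hint, not proof.
check_opencl_files <- function(
    header_dirs = c("/usr/include", "/usr/local/include",
                    "/usr/local/cuda/include", "/opt/rocm/include"),
    lib_dirs    = c("/usr/lib", "/usr/lib64", "/usr/lib/x86_64-linux-gnu",
                    "/usr/local/lib", "/usr/local/cuda/lib64", "/opt/rocm/lib")) {
  headers <- file.path(header_dirs, "CL", "cl.h")
  loaders <- unlist(lapply(lib_dirs, function(d) {
    if (dir.exists(d)) {
      list.files(d, pattern = "^libOpenCL\\.so", full.names = TRUE)
    } else {
      character(0)
    }
  }))
  list(
    header_found = headers[file.exists(headers)],  # needed for the source build
    loader_found = loaders                         # needed at runtime
  )
}

str(check_opencl_files())
```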

## Step 3: Reinstall glmbayes from source

With the OpenCL environment confirmed, reinstall `glmbayes` from source. During
the install, the package's `configure` script (`configure.win` on Windows) runs
automatically, looks for the OpenCL headers and library, and sets
`-DUSE_OPENCL` only if both are found.
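One way to confirm that `configure` actually enabled OpenCL is to keep the
installation log and search it for the flag. `keep_outputs = TRUE` is a
standard `install.packages()` argument that writes `glmbayes.out` to the
working directory. The sketch assumes the flag appears on the echoed compiler
command lines, which is the usual `R CMD INSTALL` behavior when output is not
suppressed; `log_has_opencl_flag()` is a hypothetical name.

```r
# Sketch: search a saved install log for the OpenCL compile flag.
log_has_opencl_flag <- function(log_path) {
  if (!file.exists(log_path)) stop("log not found: ", log_path)
  lines <- readLines(log_path, warn = FALSE)
  any(grepl("-DUSE_OPENCL", lines, fixed = TRUE))
}

# Usage (not run here): keep the log of a source install, then inspect it.
# install.packages("glmbayes", type = "source", keep_outputs = TRUE)
# log_has_opencl_flag("glmbayes.out")
```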

### Windows

Installing a development version from GitHub requires **`devtools`** (or
`remotes`); a CRAN source install needs only base R. If you plan to install
from GitHub and do not have `devtools`, install it first:

```r
install.packages("devtools")
```

To compile the CRAN release from source:

```r
install.packages("glmbayes", type = "source")
```

Or from GitHub if you need a development version:

```r
devtools::install_github("knygren/glmbayes")
```

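Both routes need Rtools, the toolchain for building R packages from source on
Windows. If you have not yet installed it, download the Rtools version that
matches your R release from <https://cran.r-project.org/bin/windows/Rtools/>.
With a default installation, current R versions find Rtools without manual
`PATH` changes.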
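Before starting a source install, it can help to confirm that R sees a working
toolchain. The sketch below assumes `pkgbuild` is available (it is installed
alongside `devtools`) and falls back to a weaker check otherwise;
`build_tools_ok()` is a hypothetical name.

```r
# Sketch: check that a compiler toolchain is usable before a source install.
# pkgbuild::has_build_tools() is used when pkgbuild is installed; otherwise
# fall back to looking for `make`, which only shows that Rtools' utilities
# are visible, not that the compilers themselves work.
build_tools_ok <- function() {
  if (requireNamespace("pkgbuild", quietly = TRUE)) {
    return(isTRUE(pkgbuild::has_build_tools(debug = FALSE)))
  }
  nzchar(Sys.which("make"))
}

build_tools_ok()
```

If this returns `FALSE`, install or repair Rtools before retrying the install.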

### Linux / macOS

```r
install.packages("glmbayes", type = "source")
```

On macOS, Xcode Command Line Tools and GCC (via Homebrew) are required; see
**`vignette("Chapter-01", package = "opencltools")`** for details.

### After the install

Restart R so the newly built package is loaded, then confirm the build
succeeded:

```r
library(glmbayes)
has_opencl()
#> [1] TRUE
```
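If you reinstalled from an R session that already had `glmbayes` loaded, that
session may still be using the old build. One way to sidestep this is to run
the check in a separate R process. The sketch assumes the `callr` package is
installed; `fresh_has_opencl()` is a hypothetical helper that returns `NA` when
`callr` or `glmbayes` is missing.

```r
# Sketch: check the installed build in a fresh R process, so a copy of
# glmbayes already loaded in this session cannot give a stale answer.
fresh_has_opencl <- function() {
  if (!requireNamespace("callr", quietly = TRUE)) return(NA)
  callr::r(function() {
    if (!requireNamespace("glmbayes", quietly = TRUE)) return(NA)
    isTRUE(glmbayes::has_opencl())
  })
}

fresh_has_opencl()
```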

# Verifying the setup

Once `has_opencl()` returns `TRUE`, run a full diagnostic to confirm the
complete stack:

```r
diagnose_glmbayes()
```

A clean report looks like:

```
=== glmbayes OpenCL Diagnostic Report ===
Environment: linux

GPU: NVIDIA
  [OK] Driver installed
  [OK] OpenCL headers found (CL/cl.h)
  [OK] OpenCL runtime found (OpenCL.dll / ICD)
  [OK] OpenCL fully available (headers + runtime)
  [OK] Required PATH and library dirs present
  [OK] OpenCL runtime probe succeeded (platform available)

[OK] glmbayes was compiled with OpenCL support.

=== End of Diagnostic Report ===
```

Each line reports one layer of the stack. If any line shows `[FAIL]` or
`[WARN]`, the report indicates what is missing. Common resolutions:

- **Driver not installed** → install or update your GPU vendor driver.
- **Headers not found** → install the OpenCL SDK; see 'opencltools' Chapter 01.
- **Runtime not found** → install the ICD loader (`ocl-icd-libopencl1` on Debian/Ubuntu, included with the CUDA Toolkit on Windows).
- **Runtime probe failed** (`CL_PLATFORM_NOT_FOUND_KHR`) → the ICD loader is present but no vendor platform is registered. On Linux, run `clinfo` outside R to check for visible platforms, and ensure the vendor ICD file is in `/etc/OpenCL/vendors/`.
- **glmbayes not compiled with OpenCL** → the source install did not find OpenCL at compile time; check `opencltools::has_opencl()`, repeat Step 2 if needed, then reinstall `glmbayes` (Step 3).
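For the runtime-probe failure, the first thing to inspect on Linux is whether
any vendor ICD is registered and whether the loader can see a platform. The
sketch below lists `.icd` files in the standard vendor directory and, if the
`clinfo` utility is installed, captures its `-l` (list) output.
`list_opencl_vendors()` is a hypothetical name.

```r
# Sketch (Linux): list registered OpenCL vendor ICD files and, if clinfo is
# installed, the platforms/devices the ICD loader reports. An empty ICD list
# is consistent with the CL_PLATFORM_NOT_FOUND_KHR failure described above.
list_opencl_vendors <- function(vendor_dir = "/etc/OpenCL/vendors") {
  icds <- if (dir.exists(vendor_dir)) {
    list.files(vendor_dir, pattern = "\\.icd$")
  } else {
    character(0)
  }
  platforms <- if (nzchar(Sys.which("clinfo"))) {
    system2("clinfo", "-l", stdout = TRUE, stderr = TRUE)
  } else {
    NA_character_  # clinfo not installed
  }
  list(icd_files = icds, clinfo = platforms)
}

str(list_opencl_vendors())
```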

On Windows, the Linux/WSL runtime probe step is skipped; rely on the driver
and ICD checks instead.

For PATH-related warnings on Windows (CUDA Toolkit bin directory not in PATH),
the diagnostic report lists the missing entries. Fix them via system settings or
your shell profile; advanced users may use the helpers in `opencltools` directly
(see `?opencltools::add_to_path`).
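To see which CUDA-related `PATH` entries R itself sees, you can split `PATH` in
base R. This sketch only reports and does not modify `PATH`. It assumes that
CUDA directories contain `cuda` in their names (typical of default install
locations) and that the toolkit installer set the `CUDA_PATH` environment
variable; `cuda_path_entries()` is a hypothetical name.

```r
# Sketch (Windows): show PATH entries that mention CUDA and whether each
# directory exists. Treat a missing CUDA_PATH as a hint, not proof.
cuda_path_entries <- function(path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  hits <- entries[grepl("cuda", entries, ignore.case = TRUE)]
  data.frame(entry = hits, exists = dir.exists(hits), stringsAsFactors = FALSE)
}

Sys.getenv("CUDA_PATH", unset = NA)
cuda_path_entries()
```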

# Running a GPU-accelerated model

Once set up, pass `use_opencl = TRUE` to `glmb()` or `rglmb()`. The quickest
end-to-end test is the built-in Cleveland example, which runs a CPU versus
OpenCL comparison:

```r
example(Cleveland)
```

The chunks below illustrate the same pattern (they are not executed during the
vignette build):

```{r, eval=FALSE}
library(glmbayes)
data("Cleveland")

# Default normal prior for the heart-disease logistic regression
ps <- Prior_Setup(
  hd ~ age + sex + cp + trestbps + chol +
    fbs + restecg + thalach + exang + oldpeak + slope + ca + thal,
  family = binomial(link = "logit"),
  data = Cleveland
)

# CPU baseline: 20,000 posterior draws with OpenCL disabled
t_cpu <- system.time({
  fit_cpu <- glmb(
    hd ~ age + sex + cp + trestbps + chol +
      fbs + restecg + thalach + exang + oldpeak + slope + ca + thal,
    family       = binomial(link = "logit"),
    pfamily      = dNormal(mu = ps$mu, Sigma = ps$Sigma),
    data         = Cleveland,
    n            = 20000,
    Gridtype     = 2,
    use_parallel = TRUE,
    use_opencl   = FALSE,
    verbose      = FALSE
  )
})

# GPU run: identical call except use_opencl = TRUE
t_gpu <- system.time({
  fit_gpu <- glmb(
    hd ~ age + sex + cp + trestbps + chol +
      fbs + restecg + thalach + exang + oldpeak + slope + ca + thal,
    family       = binomial(link = "logit"),
    pfamily      = dNormal(mu = ps$mu, Sigma = ps$Sigma),
    data         = Cleveland,
    n            = 20000,
    Gridtype     = 2,
    use_parallel = TRUE,
    use_opencl   = TRUE,
    verbose      = FALSE
  )
})

# Compare user, system, and elapsed (wall-clock) times
t_cpu
t_gpu
```
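To reduce the two timings to a single number, divide the elapsed (wall-clock)
times. `speedup()` is a hypothetical helper; it assumes both arguments are the
`proc_time` objects returned by `system.time()`.

```r
# Sketch: ratio of wall-clock times from two system.time() results.
# The "elapsed" element of a proc_time object is wall-clock seconds.
speedup <- function(t_baseline, t_candidate) {
  unname(t_baseline[["elapsed"]] / t_candidate[["elapsed"]])
}

# Usage after running the chunk above (values depend on your hardware):
# speedup(t_cpu, t_gpu)   # > 1 means the OpenCL run was faster
```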

```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics(
  system.file("extdata", "cleveland_non_opencl_output_01.png", package = "glmbayes")
)
```

```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics(
  system.file("extdata", "cleveland_opencl_output_01.png", package = "glmbayes")
)
```

```{r, eval=FALSE}
summary(fit_gpu)
```

```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics(
  system.file("extdata", "cleveland_summary_output_01.png", package = "glmbayes")
)
knitr::include_graphics(
  system.file("extdata", "cleveland_summary_output_02.png", package = "glmbayes")
)
```

The GPU path samples from the same posterior as the CPU path, so posterior
summaries agree up to Monte Carlo error; what changes is the run time. GPU
gains are most visible with larger models (more predictors, a larger number of
draws `n`, higher-dimensional tangency grids).
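Because the two runs produce separate Monte Carlo samples, a sensible agreement
check compares posterior means relative to their Monte Carlo standard errors
rather than expecting identical numbers. The sketch below works on two numeric
matrices of draws (rows are draws, columns are coefficients); how to extract
those matrices from a `glmb()` fit is left as an assumption (see `?glmb` for the
returned components). It also assumes independent draws; autocorrelated draws
would make the standard errors too small. `compare_draws()` is a hypothetical
name, and the example uses simulated matrices, not Cleveland output.

```r
# Sketch: standardized difference in posterior means between two sets of
# draws, assuming independent draws within each set.
compare_draws <- function(draws_a, draws_b) {
  stopifnot(ncol(draws_a) == ncol(draws_b))
  mean_diff <- colMeans(draws_a) - colMeans(draws_b)
  mc_se <- sqrt(apply(draws_a, 2, var) / nrow(draws_a) +
                apply(draws_b, 2, var) / nrow(draws_b))
  mean_diff / mc_se  # roughly N(0, 1) per column if both target one posterior
}

# Illustration with simulated draws from the same distribution
set.seed(1)
a <- matrix(rnorm(20000 * 3), ncol = 3)
b <- matrix(rnorm(20000 * 3), ncol = 3)
z <- compare_draws(a, b)
max(abs(z))  # values well above 3 or 4 would point to a real discrepancy
```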

# Appendix A: AMD GPUs on Linux (ROCm OpenCL)

AMD provides multiple OpenCL implementations on Linux, but only **ROCm OpenCL**
is fully supported and stable. If you are using an AMD GPU, install ROCm OpenCL
on **Ubuntu 22.04 or 24.04 LTS**:

```sh
sudo apt-get install rocm-opencl-runtime
```

This installs ROCm's OpenCL runtime and registers it with the ICD loader
through a vendor file (`amdocl64.icd`) in `/etc/OpenCL/vendors/`.
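After installing, a few quick checks from R can confirm the basics. The sketch
below reads the Ubuntu release from `/etc/os-release`, looks for an AMD ICD file
(matching `amdocl*.icd`, since some ROCm releases may add a version suffix to
the file name), and checks membership in the `render` and `video` groups, which
AMD's ROCm installation documentation asks users to join for GPU access.
`rocm_checks()` is a hypothetical name.

```r
# Sketch (Linux, ROCm): basic post-install checks.
rocm_checks <- function(vendor_dir = "/etc/OpenCL/vendors") {
  os <- if (file.exists("/etc/os-release")) readLines("/etc/os-release") else character(0)
  version_line <- grep("^VERSION_ID=", os, value = TRUE)
  amd_icd <- if (dir.exists(vendor_dir)) {
    list.files(vendor_dir, pattern = "^amdocl.*\\.icd$")
  } else {
    character(0)
  }
  # `id -nG` prints the current user's group names, space-separated
  groups <- if (nzchar(Sys.which("id"))) {
    unlist(strsplit(system2("id", "-nG", stdout = TRUE), " ", fixed = TRUE))
  } else {
    character(0)
  }
  list(
    ubuntu_version  = sub('^VERSION_ID="?([^"]*)"?$', "\\1", version_line),
    amd_icd_files   = amd_icd,
    in_render_video = c(render = "render" %in% groups, video = "video" %in% groups)
  )
}

str(rocm_checks())
```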

**Supported AMD GPUs** (ROCm):

- Radeon RX 7900 XTX / XT / GRE
- Radeon RX 7800 XT / 7700 XT
- Radeon Pro W7900 / W7800 / W7700
- Instinct MI200 / MI300 accelerators

Older consumer GPUs (Polaris, Vega, Navi 1x/2x) are generally **not supported**
by current ROCm releases; check AMD's ROCm compatibility list for your exact
model. Mesa Rusticl is a community alternative that may work but is not
officially supported. AMDGPU-PRO OpenCL is legacy and not recommended.

For full per-distribution instructions and verification steps, see
**`vignette("Chapter-01", package = "opencltools")`**.