The goal of vsp is to enable fast, spectral estimation
of latent factors in random dot product graphs. Under mild assumptions,
the vsp estimator is consistent for (degree-corrected)
stochastic blockmodels, (degree-corrected) mixed-membership stochastic
blockmodels, and degree-corrected overlapping stochastic
blockmodels.
More generally, the vsp estimator is consistent for
random dot product graphs that can be written in the form
E(A) = Z B Y^T
where Z and Y satisfy the varimax
assumptions of [1]. vsp works on directed and undirected
graphs, and on weighted and unweighted graphs. Note that
vsp is a semi-parametric estimator.
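To make the model concrete, here is a small hypothetical example in base R. Z (5 x 2) holds the row (sender) factors, Y (4 x 2) holds the column (receiver) factors, and B (2 x 2) mixes them, so E(A) is a 5 x 4 matrix and the graph is directed. Node 5 belongs half to each factor, as in a mixed-membership model. All numbers are made up for illustration, and drawing Poisson edge weights is just one assumption that yields a weighted graph.

```r
set.seed(42)
n <- 5  # rows (senders)
d <- 4  # columns (receivers)
k <- 2  # number of latent factors

# Hypothetical nonnegative factor matrices; node 5 is a 50/50 mixture.
Z <- cbind(c(1, 1, 0, 0, 0.5),
           c(0, 0, 1, 1, 0.5))
Y <- cbind(c(1, 0, 1, 0),
           c(0, 1, 0, 1))

# Hypothetical mixing matrix: B[a, b] scales edges from factor a to factor b.
B <- matrix(c(3,   0.1,
              0.2, 2), nrow = k, ncol = k, byrow = TRUE)

# Expected adjacency matrix E(A) = Z B Y^T (n x d, not symmetric).
EA <- Z %*% B %*% t(Y)

# One possible random graph: Poisson edge weights with mean E(A).
A <- matrix(rpois(n * d, lambda = EA), nrow = n, ncol = d)
```

Reading off one entry: E(A)[5, 2] = 0.5 * B[1, 2] + 0.5 * B[2, 2] = 0.5 * 0.1 + 0.5 * 2 = 1.05, because node 5 splits its membership evenly and column 2 loads only on factor 2.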
You can install the released version of vsp from CRAN with:
install.packages("vsp")
You can install the development version of vsp with:
install.packages("devtools")
devtools::install_github("RoheLab/vsp")
Obtaining estimates from vsp is straightforward. We
recommend representing networks as igraph objects or sparse
adjacency matrices using the Matrix
package. Once you have your network in one of these formats, you can get
estimates by calling the vsp() function. The result is a
vsp_fa S3 object.
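If your network starts out as an edge list, one way to build a sparse adjacency matrix is Matrix::sparseMatrix(). The five-node edge list below is hypothetical; with symmetric = TRUE each undirected edge is listed once (here in the upper triangle) and the lower triangle is implied.

```r
library(Matrix)

# Hypothetical undirected edge list (node ids are 1-based).
edges <- data.frame(
  from = c(1, 1, 2, 3, 4),
  to   = c(2, 3, 3, 4, 5)
)

# Each edge gets weight 1; only the upper triangle (from < to) is given
# and symmetric = TRUE fills in the mirror image.
A <- sparseMatrix(
  i = edges$from,
  j = edges$to,
  x = 1,
  dims = c(5, 5),
  symmetric = TRUE
)
A
```

The resulting sparse matrix can then be passed to vsp() in place of an igraph object.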
Here we demonstrate vsp on an igraph
object, using the enron network from the
igraphdata package. First
we peek at the graph:
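library(igraph)
# enron: an e-mail network shipped with the igraphdata package
data(enron, package = "igraphdata")
# Dense adjacency matrix; sign() turns repeated-edge counts into 0/1
image(sign(as_adjacency_matrix(enron, sparse = FALSE)))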
Now we estimate:
library(vsp)
fa <- vsp(enron, rank = 30)
fa
#> Vintage Sparse PCA Factor Analysis
#>
#> Rows (n): 184
#> Cols (d): 184
#> Factors (rank): 30
#> Lambda[rank]: 0.2077
#> Components
#>
#> Z: 184 x 30 [matrix]
#> B: 30 x 30 [matrix]
#> Y: 184 x 30 [matrix]
#> u: 184 x 30 [matrix]
#> d: 30 [numeric]
#> v: 184 x 30 [matrix]
The varimax-rotated factors Z can be extracted as a tibble with one row per node:
get_varimax_z(fa)
#> # A tibble: 184 × 31
#> id z01 z02 z03 z04 z05 z06 z07 z08
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 row0… 2.42e-4 -0.00245 -2.99e-2 3.37e-4 9.96e-5 -0.0114 -0.00849 0.502
#> 2 row0… -2.52e-3 0.00135 6.70e-4 -1.63e-1 -1.47e-2 0.0471 0.190 0.00181
#> 3 row0… 2.98e-4 -0.100 1.17e-4 -3.62e-3 -2.06e-2 0.187 -0.158 0.00303
#> 4 row0… -7.75e-5 -0.0183 1.17e-4 5.42e-2 -5.58e-3 0.00165 -0.0367 -0.00106
#> 5 row0… -2.31e-3 0.00150 2.57e-1 -1.42e-2 -4.38e-2 0.00629 1.18 -0.0179
#> 6 row0… -3.46e-2 -0.0527 -2.61e-2 -1.26e-2 -1.83e-2 0.0282 0.408 -0.0286
#> 7 row0… -1.08e-3 -0.327 -6.01e-1 -6.98e-2 -9.85e-2 -0.0709 0.509 0.0511
#> 8 row0… 1.58e-2 -0.0518 -1.34e-2 -1.03e-2 -4.12e-3 -0.0139 0.225 -0.0244
#> 9 row0… 2.22e-3 0.0752 3.30e-2 -6.50e-4 -5.00e-1 -0.0278 -0.0740 -0.00556
#> 10 row0… 7.13e-4 -0.0119 1.95e-2 -5.06e-3 -7.08e-3 0.00341 -0.00369 13.4
#> # ℹ 174 more rows
#> # ℹ 22 more variables: z09 <dbl>, z10 <dbl>, z11 <dbl>, z12 <dbl>, z13 <dbl>,
#> # z14 <dbl>, z15 <dbl>, z16 <dbl>, z17 <dbl>, z18 <dbl>, z19 <dbl>,
#> # z20 <dbl>, z21 <dbl>, z22 <dbl>, z23 <dbl>, z24 <dbl>, z25 <dbl>,
#> #   z26 <dbl>, z27 <dbl>, z28 <dbl>, z29 <dbl>, z30 <dbl>
To visualize a screeplot of the singular values, use:
screeplot(fa)
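The singular values in the screeplot come from a truncated SVD of the adjacency matrix, and the printed components (u, d, v and Z, B, Y) hint at how the rest of the estimate is assembled. Below is a minimal base-R sketch of that idea, assuming the construction in [1]: rotate the leading singular vectors with varimax and rescale, so that Z = sqrt(n) U R_U, Y = sqrt(d) V R_V and B = R_U^T D R_V / sqrt(n d). It is not the package's implementation, which uses a sparse SVD and may include further pre- and post-processing; the toy two-block graph and the function name vintage_sparse_pca_sketch are hypothetical.

```r
set.seed(1)

# Hypothetical two-block undirected graph with 60 nodes.
n <- 60
k <- 2
block <- rep(1:k, each = n / k)
Z_true <- diag(k)[block, ]                  # n x k block-membership matrix
B_true <- matrix(c(0.5,  0.05,
                   0.05, 0.4), k, k, byrow = TRUE)
P <- Z_true %*% B_true %*% t(Z_true)        # edge probabilities E(A)
A <- matrix(rbinom(n * n, 1, P), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]       # mirror the upper triangle
diag(A) <- 0                                # no self-loops

# Sketch of vintage sparse PCA (stats::varimax needs k >= 2 to rotate).
vintage_sparse_pca_sketch <- function(A, k) {
  n <- nrow(A)
  d <- ncol(A)
  s <- svd(A, nu = k, nv = k)               # dense SVD; fine for small A
  U <- s$u
  V <- s$v
  D <- diag(s$d[1:k], nrow = k)
  # Orthogonal varimax rotations of the left and right singular vectors.
  R_U <- stats::varimax(U, normalize = FALSE)$rotmat
  R_V <- stats::varimax(V, normalize = FALSE)$rotmat
  list(
    Z = sqrt(n) * U %*% R_U,
    Y = sqrt(d) * V %*% R_V,
    B = t(R_U) %*% D %*% R_V / sqrt(n * d),
    u = U, d = s$d[1:k], v = V
  )
}

fit <- vintage_sparse_pca_sketch(A, k)
fit$d            # leading singular values (what a screeplot displays)
round(fit$B, 3)  # estimated mixing matrix
```

Because R_U and R_V are orthogonal, fit$Z %*% fit$B %*% t(fit$Y) reproduces the rank-k approximation U D V^T exactly; varimax only picks a rotation of that subspace, one that tends to make the loadings sparse and easier to interpret.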
We also find pairs plots of the factors useful as a diagnostic:
plot_varimax_z_pairs(fa, 1:5)
plot_varimax_y_pairs(fa, 1:5)
Similarly, an IPR (inverse participation ratio) pairs plot is a good way to check the singular vectors for localization (and thus overfitting!):
plot_ipr_pairs(fa)
plot_mixing_matrix(fa)
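A localized singular vector puts most of its mass on a handful of nodes, and the inverse participation ratio mentioned above measures exactly that. Assuming the usual definition, IPR(u) = sum_i u_i^4 for a unit-length vector u (plot_ipr_pairs() may scale it differently), a base-R sketch with hypothetical helper names is:

```r
# Inverse participation ratio of a vector, after scaling it to unit length.
# Values near 1/length(x) mean the mass is spread out (delocalized);
# values near 1 mean it sits on very few entries (localized).
ipr <- function(x) {
  x <- x / sqrt(sum(x^2))
  sum(x^4)
}

# IPR of every column of a matrix of singular vectors, such as u or v.
column_ipr <- function(U) apply(U, 2, ipr)

spread_out <- rep(1, 100)            # every entry equal
localized  <- c(1, rep(0, 99))       # all mass on one entry
c(ipr(spread_out), ipr(localized))   # 0.01 and 1
```

For a graph with n nodes, columns whose IPR sits far above 1/n are the ones worth a closer look.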
[1] Rohe, Karl, and Muzhe Zeng. “Vintage Factor Analysis with Varimax Performs Statistical Inference.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85, no. 4 (September 29, 2023): 1037–60. https://doi.org/10.1093/jrsssb/qkad029.
Code to reproduce the results from the paper is available here.