Download a copy of the vignette to follow along here: manhattan_plots.Rmd
Manhattan plots can be used to quickly visualize the relationships between features and cluster solutions.
There are three main Manhattan plot variations provided in metasnf.
esm_manhattan_plot: Visualize how a set of cluster solutions separate over input/out-of-model features.
mc_manhattan_plot: Visualize how representative solutions from defined meta clusters separate over input/out-of-model features.
var_manhattan_plot: Visualize how one raw feature associates with other raw features (similar to assoc_pval_heatmap).
The example below is taken from the “complete example” vignette.
library(metasnf)
# Start by making a data list containing all our data frames to more easily
# identify observations without missing data
full_dl <- data_list(
    list(subc_v, "subcortical_volume", "neuroimaging", "continuous"),
    list(income, "household_income", "demographics", "continuous"),
    list(pubertal, "pubertal_status", "demographics", "continuous"),
    list(anxiety, "anxiety", "behaviour", "ordinal"),
    list(depress, "depressed", "behaviour", "ordinal"),
    uid = "unique_id"
)
# Partition into a data and target list (optional)
dl <- full_dl[1:3]
target_dl <- full_dl[4:5]
# Build space of settings to cluster over
set.seed(42)
sc <- snf_config(
    dl = dl,
    n_solutions = 20,
    min_k = 20,
    max_k = 50
)
# Clustering
sol_df <- batch_snf(dl, sc)
# Calculate p-values between cluster solutions and features
ext_sol_df <- extend_solutions(
    sol_df,
    dl = dl,
    target = target_dl,
    min_pval = 1e-10 # p-values below 1e-10 will be thresholded to 1e-10
)
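To see what the min_pval argument does to the values that later end up on a Manhattan plot, here is a minimal base R sketch. It assumes the thresholding behaves like an element-wise maximum, as the comment above suggests. The p-values are made up for illustration, and this is not metasnf's internal code.

```r
# Hypothetical p-values for three features (illustrative only)
pvals <- c(feature_a = 0.2, feature_b = 3e-4, feature_c = 1e-15)

# Floor every p-value at min_pval, mirroring the comment on extend_solutions
min_pval <- 1e-10
floored <- pmax(pvals, min_pval)

# Manhattan plots show -log10(p), so the floor becomes a ceiling of 10
neg_log <- -log10(floored)
print(round(neg_log, 2))
```

Flooring keeps extremely small p-values from stretching the y-axis and hiding the differences between more moderate associations.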
esm_manhattan_plot
esm_manhattan <- esm_manhattan_plot(
    ext_sol_df[1:5, ],
    neg_log_pval_thresh = 5,
    threshold = 0.05,
    point_size = 3,
    jitter_width = 0.1,
    jitter_height = 0.1,
    plot_title = "Feature-Solution Associations",
    text_size = 14,
    bonferroni_line = TRUE
)
This plot becomes unwieldy if you look at too many solutions at once, but it is handy when you only intend to examine a few cluster solutions.
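The threshold, bonferroni_line, and neg_log_pval_thresh arguments all operate on the -log10 scale of the plot. The base R sketch below shows where such reference lines would fall, assuming the conventional Bonferroni correction (dividing the threshold by the number of tests) and assuming that neg_log_pval_thresh caps the plotted values. The number of tests, n_tests, is a hypothetical value; how metasnf counts tests internally is not covered here.

```r
threshold <- 0.05
n_tests <- 5  # hypothetical number of features being tested

# Height of the nominal significance line on the -log10 scale
threshold_line <- -log10(threshold)

# Height of a conventional Bonferroni line: threshold divided by test count
bonferroni_height <- -log10(threshold / n_tests)

# Assumed capping behaviour: values above neg_log_pval_thresh are pulled down
neg_log_pval_thresh <- 5
neg_log <- c(0.7, 2.1, 10)  # made-up -log10 p-values
capped <- pmin(neg_log, neg_log_pval_thresh)
```

Points that rise above the Bonferroni line remain significant after correcting for multiple comparisons under these assumptions.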
mc_manhattan_plot
The mc_manhattan_plot function can be used after meta clustering to more efficiently examine the entire space of generated cluster solutions.
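Meta clustering groups cluster solutions by how similar they are to each other, and the code that follows measures that similarity with the adjusted Rand index (ARI). As a rough illustration of what a single pairwise ARI represents, here is a base R sketch of the standard formula. The function name adjusted_rand_index is hypothetical and is not part of metasnf; calc_aris handles this for you across every pair of solutions.

```r
# Adjusted Rand index between two cluster assignments of the same observations
# (standard contingency-table formula; illustrative helper, not metasnf code)
adjusted_rand_index <- function(a, b) {
    tab <- table(a, b)
    n <- sum(tab)
    sum_cells <- sum(choose(tab, 2))
    sum_rows <- sum(choose(rowSums(tab), 2))
    sum_cols <- sum(choose(colSums(tab), 2))
    expected <- sum_rows * sum_cols / choose(n, 2)
    max_index <- (sum_rows + sum_cols) / 2
    (sum_cells - expected) / (max_index - expected)
}

# Same partition with swapped labels: perfect agreement
same <- adjusted_rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))

# Partitions that cut across each other: agreement below chance
crossed <- adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

An ARI of 1 means two solutions partition the observations identically regardless of cluster labels, while values near 0 indicate chance-level agreement.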
# Calculate pairwise similarities between cluster solutions
sol_aris <- calc_aris(sol_df)
# Extract hierarchical clustering order of the cluster solutions
meta_cluster_order <- get_matrix_order(sol_aris)
# Create a base heatmap for visual meta clustering
ari_hm <- meta_cluster_heatmap(
    sol_aris,
    order = meta_cluster_order
)
# Identify meta cluster boundaries
# This can also be done by trial and error if you do not wish to use the shiny app.
shiny_annotator(ari_hm)
# Result of meta cluster examination
split_vec <- c(2, 5, 12, 16)
# Recreate the heatmap with the meta cluster boundaries drawn in
ari_hm <- meta_cluster_heatmap(
    sol_aris,
    order = meta_cluster_order,
    split_vector = split_vec
)
ari_hm
# Label meta clusters based on the split vector
mc_sol_df <- label_meta_clusters(
    sol_df = ext_sol_df,
    split_vector = split_vec,
    order = meta_cluster_order
)
# Extracting representative solutions from each defined meta cluster
rep_solutions <- get_representative_solutions(sol_aris, mc_sol_df)
mc_manhattan <- mc_manhattan_plot(
    rep_solutions,
    dl = dl,
    target_dl = target_dl,
    point_size = 3,
    text_size = 12,
    plot_title = "Feature-Meta Cluster Associations",
    threshold = 0.05,
    neg_log_pval_thresh = 5
)
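The plot above relies on one representative solution per meta cluster. One plausible way to choose such a representative, sketched below in base R, is to pick the solution with the highest mean ARI to the other members of its meta cluster. This is an assumption about the selection rule rather than a description of get_representative_solutions internals. The ARI matrix, the labels, and the pick_representative helper are all made up for illustration.

```r
# Toy ARI matrix for four solutions (values are illustrative only)
solution_names <- paste0("s", 1:4)
aris <- matrix(
    c(1.0, 0.9, 0.8, 0.1,
      0.9, 1.0, 0.7, 0.2,
      0.8, 0.7, 1.0, 0.1,
      0.1, 0.2, 0.1, 1.0),
    nrow = 4, byrow = TRUE,
    dimnames = list(solution_names, solution_names)
)

# Hypothetical meta cluster labels for each solution
meta_cluster <- c(s1 = "A", s2 = "A", s3 = "A", s4 = "B")

# Pick the member with the highest mean ARI to its fellow members
pick_representative <- function(aris, members) {
    if (length(members) == 1) {
        return(members)
    }
    sub <- aris[members, members, drop = FALSE]
    diag(sub) <- NA  # ignore each solution's similarity to itself
    names(which.max(rowMeans(sub, na.rm = TRUE)))
}

representatives <- vapply(
    split(names(meta_cluster), meta_cluster),
    function(members) pick_representative(aris, members),
    character(1)
)
```

Plotting only these representatives keeps the Manhattan plot readable while still covering each distinct group of solutions.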
You can also visualize associations with a specific feature of interest rather than cluster solutions.
The only input needed for this plot is a data list; no clustering is necessary.
var_manhattan <- var_manhattan_plot(
    dl,
    key_var = "household_income",
    plot_title = "Correlation of Features with Household Income",
    text_size = 16,
    neg_log_pval_thresh = 3,
    threshold = 0.05
)
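To get a feel for the values behind a plot like this, the base R sketch below computes -log10 p-values between one key feature and every other feature in a simulated data frame. It uses a Pearson correlation test throughout, which is a simplification; the tests metasnf actually applies to each feature type are not shown here. All feature names other than household_income are hypothetical.

```r
set.seed(42)
n <- 100
income <- rnorm(n)

# Simulated data: one feature tied to income, one independent of it
toy <- data.frame(
    household_income = income,
    related_feature = income + rnorm(n, sd = 0.5),
    unrelated_feature = rnorm(n)
)

key_var <- "household_income"
other_vars <- setdiff(names(toy), key_var)

# p-value of the association between the key feature and each other feature
pvals <- vapply(
    other_vars,
    function(v) cor.test(toy[[key_var]], toy[[v]])$p.value,
    numeric(1)
)

# Values that a Manhattan-style plot would place on its y-axis
neg_log_pvals <- -log10(pvals)
```

Features that track the key feature closely produce much taller points than unrelated ones.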