This article aims to replicate the case study developed by Araujo et al. (in review). We will also present the workflow executed within the study for both a single passerine phylogeny and multiple passerine phylogenies, in order to control the results for phylogenetic uncertainty Rangel et al. (2015).
Before starting our analysis, we’ll need to load the
ape
, ggplot2
, and ggpubr
packages. Additionally, if required, install the devtools
to invoke the treesliceR
package:
# Loading the packages that we'll use
library(ape)
library(ggplot2)
library(ggpubr)
Now, we’ll need to load (and install if necessary) the
treesliceR
package:
# Loading it
library(treesliceR)
Herein, we’ll compare the passerines tip diversification rates (DR) before and after the Australian aridification (~33 Mya). Our used passerines phylogenies are subsets of the global birds phylogenies made available by Jetz et al. (2012).
Thus, primarily, we can load the passerines phylogeny available
within the package by calling the pass_trees
object:
<- pass_trees[[1]] tree
Now, we need to split the phylogeny into two distinct portions, one
representing the last 33 millions of years ago (Mya) and another
representing the cladogenesis events before this period. To capture the
root slice before 33 Mya, we can slice our phylogeny tipwardly (i.e.,
from tips to root) using the squeeze_tips()
function.
Alternatively, the phylogeny slice of the last 33 Mya can be obtained
through the squeeze_root()
function, which slices a
phylogeny rootwardly (i.e, from root to tips).
Since DR calculations are dependent on the node’s path to a given
tip, we set the argument dropNodes = TRUE
in our functions
to remove those “void nodes” (nodes with zero branch length) to avoid
biases in our DR estimates:
<- squeeze_root(tree = tree, time = 33, dropNodes = T)
recent <- squeeze_tips(tree = tree, time = 33, dropNodes = T) old
Let’s compare our original phylogeny with those cut ones:
<- par(mfrow = c(1, 3)) # Setting an 1x3 graphical display
oldpar plot(tree, main = "Complete tree", show.tip.label = F); axisPhylo()
plot(old, main = "Old tree", show.tip.label = F); axisPhylo()
plot(recent, main = "Recent tree", show.tip.label = F); axisPhylo()
par(oldpar) # Returning to the original display
Then, we need to calculate the DR separately for each phylogeny slice
while finding the difference between them. To calculate the DR, we can
use the DR()
function available within the
treesliceR
package:
<- DR(tree = recent)[,2] - DR(tree = old)[,2] DR_diff
Now we’ll assign these DR differences to a data frame inside our tree object containing passerines information:
$Species_info$DR_diff <- DR_diff tree
To calculate the mean and standard deviation differences per family,
we use the tapply()
function to aggregate the calculated
DRs by passerine families:
# tapply() for means
<- tapply(tree$Species_info$DR_diff, tree$Species_info$Family, mean)
fam_DR # tapply() for standard deviations
<- tapply(tree$Species_info$DR_diff, tree$Species_info$Family, sd) fam_DR_sd
To visualize DR differences among families, we’ll create a new data frame with our outputs:
# Creating the families DR data frame
<- data.frame(Family = names(fam_DR), DR_diff = fam_DR, DR_sd = fam_DR_sd)
fam_df # Sorting them based on DR's value
<- fam_df[order(fam_df$DR_diff),]
fam_df # Turning this order into a factor to plot it
$Family <- factor(fam_df$Family, levels = fam_df$Family) fam_df
Now, we can create a graph similar to the one displayed in Araujo et
al. (in review) using the ggplot2
package:
ggplot(fam_df, aes(x = Family, y = DR_diff,
ymin = DR_diff - DR_sd,
ymax = DR_diff + DR_sd)) +
geom_pointrange(color = "#d90429") +
geom_hline(yintercept = 0, linetype="dashed", color = "black") +
coord_flip() + theme_minimal() +
theme(axis.title = element_text(size = 13),
axis.text = element_text(size = 10),
axis.line = element_line(colour = "black"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
ylab(expression(paste(DR["recent"]-DR["past"]))) + xlab(NULL)
Notice that the final output is slightly different from what was observed in Araujo et al. (in review). However, remember that here we executed the analysis only for a single passerine phylogeny. A more comprehensive assessment using all phylogenies could be carried out in the sections below (for example in the section “2. Passerines study for multiple phylogenies”).
Herein, we’ll calculate the CpB-rate of Australian passerines for
both turnover and nestedness components. Firstly,
we’ll need to load the assemblages containing the species matrix stored
within the package by calling the internal object pass_mat
.
Additionally, to calculate the beta-diversity metrics, it is necessary
to provide the adjacency matrix containing the focal cells and their
respective neighborhoods, which can be accessed through the
AU_adj
object. Let’s examine the header of the assemblage
matrix containing the species, focusing on the first four columns (or
species):
head(pass_mat[, 1:4])
#> Heteromyias_albispecularis Myzomela_obscura Taeniopygia_guttata
#> 1 0 1 0
#> 2 0 1 0
#> 3 0 1 0
#> 4 0 1 0
#> 5 0 1 0
#> 6 0 1 0
#> Dicaeum_hirundinaceum
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
We can run some sensitivity analysis to find the most parsimonious number of slices to assess the CpB-rate patterns. But first, we need to create a vector containing our desired number of slices for assessment:
<- c(250, 500, 750, 1000, 1250, 1500, 1750, 2000) vec
Let’s run the sensitivity analysis for both turnover and nestedness components:
<- CpR_sensitivity(tree = tree, vec = vec, samp = 100,
sens_turn mat = pass_mat, adj = AU_adj, rate = "CpB", comp = "turnover")
<- CpR_sensitivity(tree = tree, vec = vec, samp = 100,
sens_nest mat = pass_mat, adj = AU_adj, rate = "CpB", comp = "nestedness")
So, we can visualize our sensitivity analysis using the
CpR_sensitivity_plot
function. We use the
ggplot2
syntax to add a vertical line showing our selected
number of slices to run our subsequent analysis:
# Store each graph within a respective object
<- CpR_sensitivity_plot(sens_turn, rate = "CpB", stc = "mean") +
turn_sens_plot geom_vline(xintercept = 1000, linetype="dashed", color = "black")
<- CpR_sensitivity_plot(sens_nest, rate = "CpB", stc = "mean") +
nest_sens_plot geom_vline(xintercept = 1000, linetype="dashed", color = "black")
# To plot them together
ggarrange(turn_sens_plot, nest_sens_plot,
labels = c("a)", "b)"), ncol = 2, nrow = 1)
Now, we can finally calculate the CpB-rates for turnover and nestedness components, in this case, under a multisite approach (PS: this may take a few minutes):
# For turnover component
<- CpB(tree = tree, n = 1000, mat = pass_mat, adj = AU_adj, comp = "turnover")
turn # For nestedness component
<- CpB(tree = tree, n = 1000, mat = pass_mat, adj = AU_adj, comp = "nestedness") nest
Finally, we can plot these CpB over time and map them. To map them,
we’ll use an Australian grid map stored within our package in the object
AU_grid
. Let’s plot these patterns for the turnover
component:
<- CpR_graph(data = turn, rate = "CpB", qtl = TRUE)
turn_1 <- CpR_graph(data = turn, rate = "CpB", qtl = TRUE, map = AU_grid)
turn_2 # To plot them together
ggarrange(turn_1, turn_2,
labels = c("a)", "b)"), ncol = 2, nrow = 1)
And we can do the same for the nestedness component:
<- CpR_graph(data = nest, rate = "CpB", qtl = TRUE)
nest1 <- CpR_graph(data = nest, rate = "CpB", qtl = TRUE, map = AU_grid)
nest2 # To plot them together
ggarrange(nest1, nest2,
labels = c("a)", "b)"), ncol = 2, nrow = 1)
Herein, we’ll do the same as we have done in the previous section but using multiple (100) phylogenies. We’ll use all these phylogenies to control for phylogenetic uncertainty present in the position of inputted tips.
Thus, let’s firstly cut our 100 phylogenies to compare their tip-DR
before and after Australia aridification (~33Mya). This can be done
easily for our list of trees using the lapply()
function:
<- lapply(tree, function(x){return(squeeze_root(x, 33, dropNodes = T))})
recent <- lapply(tree, function(x){return(squeeze_tips(x, 33, dropNodes = T))}) old
In the same way, we can calculate their species DR separately using the same function:
<- lapply(recent, function(x){DR(x)})
DR_rec <- lapply(old, function(x){DR(x)}) DR_old
Once the DRs are calculated, we can capture their mean values per species. However, since species positions are generally different among the phylogenies, we’ll need to merge the lists based on the species column:
# Recent
<- DR_rec[[1]]
f_DRrec colnames(f_DRrec)[2] <- 1
# Old
<- DR_old[[1]]
f_DRold colnames(f_DRold)[2] <- 1
# Looping
for (i in 2:length(DR_rec)) {
# Recent
<- merge(f_DRrec, DR_rec[[i]], by = "Species", sort = FALSE)
f_DRrec colnames(f_DRrec)[i + 1] <- i
# Old
<- merge(f_DRold, DR_old[[i]], by = "Species", sort = FALSE)
f_DRold colnames(f_DRold)[i + 1] <- i
}
Now, we can calculate the mean and standard deviation of DRs per passerine species for each phylogeny time slice. Then, calculate their mean differences as done in the first workflow for one phylogeny:
# Recent
<- apply(f_DRrec[, -1], 1, mean)
DR_rec_mean <- apply(f_DRrec[, -1], 1, sd)
DR_rec_sd
# Old
<- apply(f_DRold[, -1], 1, mean)
DR_old_mean <- apply(f_DRold[, -1], 1, sd)
DR_old_sd
# Capturing their mean difference
<- DR_rec_mean - DR_old_mean DR_diff
Again, we can merge these DR differences into a data frame inside our
tree object containing passerine information, and then obtain this
information per passerine family using the tapply()
function:
<- data.frame(Specie = f_DRrec[, 1], DR_diff = DR_diff)
df # Capturing families information
<- merge(df, tree[[1]]$Species_info, by = "Specie", sort = FALSE)
df
## Displaying it graphically for families
<- tapply(df$DR_diff, df$Family, mean)
fam_DR <- tapply(df$DR_diff, df$Family, sd) fam_DR_sd
Before plotting, we’ll create a new data frame containing our DRs per family and sort it based on the highest DR values:
<- data.frame(Family = names(fam_DR), DR_diff = fam_DR, DR_sd = fam_DR_sd)
fam_df # Ordering to plot
<- fam_df[order(fam_df$DR_diff),]
fam_df $Family <- factor(fam_df$Family, levels = fam_df$Family) fam_df
Plotting it!
ggplot(fam_df, aes(x = Family, y = DR_diff,
ymin = DR_diff - DR_sd,
ymax = DR_diff + DR_sd)) +
geom_pointrange(color = "#d90429") +
geom_hline(yintercept = 0, linetype="dashed", color = "black") +
coord_flip() + theme_minimal() +
theme(axis.title = element_text(size = 13),
axis.text = element_text(size = 10),
axis.line = element_line(colour = "black"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
ylab(expression(paste(DR["recent"]-DR["past"]))) + xlab(NULL)
Once we have already decided on the number of slices in the previous sections (in section “1.2.”), we’ll skip the sensitivity analysis step to calculate the CpB-rates. Thus, here we’ll calculate the CpB-rate for the 100 phylogenies using our previously defined criterion of 1000 slices.
First, we’ll run the CpB-rate analysis for the
turnover component. To obtain our CpB-rates faster for
our 100 phylogenies, pay attention that we set the CpB()
function to run under parallel programming (using the argument
ncor
). Before run it, you must confirm if
the number of cores set (5) is supported by your machine. Anyway, the
following function can may take some minutes to finish, but it can run
faster if you are able to set a higher number of cores:
<- lapply(pass_trees, function(x){
CpB_turn return(CpB(tree = x, n = 1000, mat = pass_mat, adj = AU_adj, comp = "turnover", ncor = 5))
})
Now, we capture can capture the mean of each parameter outputted in
on our list of CpB()
outputs using the
sapply()
function. We’ll store them within a new data frame
containing the format that our CpR_graph()
is able to read
(which is the same as the CpB()
output):
<- sapply(CpB_turn, function(x){return(x[,1])})
CpB_val <- sapply(CpB_turn, function(x){return(x[,2])})
pB_val <- sapply(CpB_turn, function(x){return(x[,3])})
pBO_val # Creating the new data frame
<- data.frame(CpB = apply(CpB_val, 1, mean),
turn_100trees pB = apply(pB_val, 1, mean),
pBO = apply(pBO_val, 1, mean))
Plotting the CpB outputs for the turnover component:
<- CpR_graph(data = turn_100trees, rate = "CpB", qtl = TRUE)
turn_1 <- CpR_graph(data = turn_100trees, rate = "CpB", qtl = TRUE, map = AU_grid)
turn_2 # To plot them together
ggarrange(turn_1, turn_2,
labels = c("a)", "b)"), ncol = 2, nrow = 1)
Now, we’ll do the same for the nestedness component:
<- lapply(pass_trees, function(x){
CpB_nest return(CpB(tree = x, n = 1000, mat = pass_mat, adj = AU_adj, comp = "nestedness", ncor = 5))
})
Separating and summarizing each parameter:
<- sapply(CpB_nest, function(x){return(x[,1])})
CpB_val <- sapply(CpB_nest, function(x){return(x[,2])})
pB_val <- sapply(CpB_nest, function(x){return(x[,3])})
pBO_val # DF
<- data.frame(CpB = apply(CpB_val, 1, mean),
nest_100trees pB = apply(pB_val, 1, mean),
pBO = apply(pBO_val, 1, mean))
Plotting the CpB outputs for the nestedness component:
<- CpR_graph(data = nest_100trees, rate = "CpB", qtl = TRUE)
nest_1 <- CpR_graph(data = nest_100trees, rate = "CpB", qtl = TRUE, map = AU_grid)
nest_2 # To plot them together
ggarrange(nest_1, nest_2,
labels = c("a)", "b)"), ncol = 2, nrow = 1)
That’s all folks!
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K. and Mooers, A. O. 2012. The global diversity of birds in space and time. - Nature 491(7424): 444–448. https://doi.org/10.1038/nature11631
Rangel, T. F., Colwell, R. K., Graves, G. R., Fučíková, K., Rahbek, C. and Diniz-Filho, J. A. F. 2015. Phylogenetic uncertainty revisited: Implications for ecological analyses. - Evolution 69(5): 1301–1312. https://doi.org/10.1111/evo.12644