Faster (and exact) phylogenetic diversity computations in R

Submitted by editor on 9 October 2015.

By Constantinos Tsirogiannis and Brody Sandel

Some species assemblages represent a narrow section of the tree of life, while others include a broader swath. This difference can be quantified with phylogenetic diversity measures, which describe the diversity of an assemblage according to the distances among its members on a phylogenetic tree. These measures are useful in conservation prioritization and in basic ecology.

Often, it is useful to standardize a phylogenetic diversity value against the species richness of an assemblage. For example, the widely used Net Relatedness Index (NRI) standardizes the Mean Pairwise Distance (MPD) by the mean and standard deviation of MPD expected for a random assemblage with the same species richness, drawn from the same tree. Traditionally, these expected values are estimated by Monte Carlo randomization: many random assemblages are drawn, their MPD values are computed, and the observed value is compared against the resulting distribution.
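To make the traditional procedure concrete, here is a minimal Python sketch of a Monte Carlo NRI (Python is used only for illustration; the names `mpd`, `nri_monte_carlo` and `DIST` are hypothetical and are not PhyloMeasures functions). It assumes a symmetric matrix of patristic distances between tips and a null model in which every assemblage with the observed richness is equally likely.

```python
import itertools
import random
import statistics

def mpd(dist, sample):
    """Mean pairwise distance (MPD) among the tips indexed by `sample`."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def nri_monte_carlo(dist, sample, reps=999, seed=1):
    """Estimate NRI by comparing observed MPD with `reps` random assemblages.

    Null model (an assumption of this sketch): every assemblage with the same
    richness as `sample` is equally likely to be drawn from the tree's tips.
    """
    rng = random.Random(seed)
    tips = range(len(dist))
    richness = len(sample)
    null = [mpd(dist, rng.sample(tips, richness)) for _ in range(reps)]
    expected = statistics.fmean(null)
    sd = statistics.stdev(null)
    # Usual sign convention: positive NRI means the assemblage is more
    # closely related (clustered) than the random expectation.
    return -(mpd(dist, sample) - expected) / sd

# Patristic distances for the 4-tip tree ((A:1,B:1):1,(C:1,D:1):1).
DIST = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# A random estimate; the exact NRI for this sample is sqrt(2), about 1.414.
print(round(nri_monte_carlo(DIST, [0, 1]), 2))
```

Each call draws `reps` assemblages and recomputes MPD for each, so the cost grows with both the number of replicates and the richness, and the answer still carries sampling error.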

Unfortunately, on large phylogenetic trees (say, with more than 5,000 species), these randomizations are slow, and their results are only approximate. We therefore designed efficient algorithms that perform this standardization both exactly and quickly (Tsirogiannis et al. 2012, 2014). These algorithms are now available in PhyloMeasures, as both C++ and R packages. With PhyloMeasures, we can calculate richness-standardized phylogenetic diversity measures in a few seconds, even for trees with more than 100,000 tips.
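Why an exact answer is possible can be seen even without the tree-based algorithms. The Python sketch below is not the PhyloMeasures algorithm (which works on the tree itself and avoids an O(n²) distance matrix); it is a simpler closed form, assuming a symmetric distance matrix with zero diagonal and a uniform null model over all richness-r assemblages. Writing MPD = T / C(r, 2), where T is the sum of distances over sampled pairs, the mean of MPD is just the average distance over all pairs of tips, and the variance follows from the probabilities that 2, 3 or 4 given tips all fall in the sample. All function names are hypothetical, and a brute-force enumeration is included to check the formula on small trees.

```python
import itertools
import math
import statistics

def mpd(dist, sample):
    """Mean pairwise distance (MPD) among the tips indexed by `sample`."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def mpd_moments(dist, r):
    """Exact mean and SD of MPD when all richness-r assemblages are equally likely."""
    n = len(dist)
    if not 2 <= r <= n:
        raise ValueError("richness must satisfy 2 <= r <= number of tips")
    row = [sum(d) for d in dist]                  # s_i: row sums
    total = sum(row) / 2                          # D: sum over unordered pairs
    sq = sum(x * x for d in dist for x in d) / 2  # Q: sum of squared distances
    row_sq = sum(s * s for s in row)              # sum of s_i^2

    def p(k):
        # Probability that k given distinct tips all fall in the sample.
        return 0.0 if k > r else math.perm(r, k) / math.perm(n, k)

    # E[T^2]: ordered pairs of pairs are identical (2 tips), share one tip
    # (3 tips) or are disjoint (4 tips).
    e_t2 = (p(2) * sq
            + p(3) * (row_sq - 2 * sq)
            + p(4) * (total * total + sq - row_sq))
    e_t = p(2) * total
    var_t = max(e_t2 - e_t * e_t, 0.0)            # guard against rounding
    pairs = math.comb(r, 2)
    return e_t / pairs, math.sqrt(var_t) / pairs

def mpd_moments_brute(dist, r):
    """Same moments by enumerating every assemblage (small trees only)."""
    values = [mpd(dist, s) for s in itertools.combinations(range(len(dist)), r)]
    return statistics.fmean(values), statistics.pstdev(values)

def nri_exact(dist, sample):
    """NRI using the exact null mean and SD instead of randomizations."""
    mean, sd = mpd_moments(dist, len(sample))
    return -(mpd(dist, sample) - mean) / sd

# Patristic distances for the 4-tip tree ((A:1,B:1):1,(C:1,D:1):1).
DIST = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
print(round(nri_exact(DIST, [0, 1]), 4))  # 1.4142
```

The closed form needs only row sums, the total and the sum of squares, so its cost is one pass over the matrix regardless of richness, with no random replicates.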

PhyloMeasures contains functions to calculate both the basic and the richness-standardized versions of phylogenetic diversity (PD), MPD, the Mean Nearest Taxon Distance (MNTD) and the Core Ancestor Cost (CAC), as well as two-sample versions of most of these, such as the Common Branch Length (CBL) and the Community Distance (CD). Future developments will include support for unequal species frequencies in the standardization procedures.
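For readers less familiar with some of these measures, here is a small Python sketch of the unstandardized PD, MNTD, CBL and CD on a toy tree (CAC is left out). Several conventions are assumptions of this sketch, not statements about PhyloMeasures: the tree is stored as hypothetical child-to-(parent, branch length) pairs; PD counts every branch on a path from a sampled tip to the root (Faith's original convention, though implementations differ); CBL is the total length of branches shared by two samples' root-including subtrees; and CD is taken as the mean distance between a tip of one sample and a tip of the other.

```python
# Tree ((A:1,B:1):1,(C:1,D:1):1) stored as child -> (parent, branch length).
# "root" has no entry; this representation is a choice made for the sketch.
TREE = {
    "A": ("AB", 1.0), "B": ("AB", 1.0),
    "C": ("CD", 1.0), "D": ("CD", 1.0),
    "AB": ("root", 1.0), "CD": ("root", 1.0),
}

def edges_to_root(tree, tips):
    """Nodes whose parent branch lies on a path from some tip to the root."""
    edges = set()
    for node in tips:
        # Stop early once we reach a branch already collected: its
        # ancestors are already in the set.
        while node in tree and node not in edges:
            edges.add(node)
            node = tree[node][0]
    return edges

def pd(tree, tips):
    """Total branch length of the subtree joining `tips` and the root."""
    return sum(tree[e][1] for e in edges_to_root(tree, tips))

def cbl(tree, tips_a, tips_b):
    """Total length of branches shared by the two samples' subtrees."""
    common = edges_to_root(tree, tips_a) & edges_to_root(tree, tips_b)
    return sum(tree[e][1] for e in common)

def patristic(tree, u, v):
    """Path length between two tips: branches above one tip but not both."""
    path = edges_to_root(tree, [u]) ^ edges_to_root(tree, [v])
    return sum(tree[e][1] for e in path)

def mntd(tree, tips):
    """Mean over tips of the distance to the nearest other sampled tip."""
    nearest = [min(patristic(tree, u, v) for v in tips if v != u) for u in tips]
    return sum(nearest) / len(nearest)

def cd(tree, tips_a, tips_b):
    """Mean distance between a tip of one sample and a tip of the other."""
    dists = [patristic(tree, u, v) for u in tips_a for v in tips_b]
    return sum(dists) / len(dists)

print(pd(TREE, ["A", "B"]))                    # 3.0
print(cbl(TREE, ["A", "B"], ["A", "C"]))       # 2.0
print(cd(TREE, ["A", "B"], ["C", "D"]))        # 4.0
```

These direct definitions are fine for small trees; the point of the standardized versions is that their null means and variances can be computed without enumerating or sampling assemblages.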

We expect that these tools will make research more efficient. More than that, we hope that they inspire ecologists to ask new kinds of questions. For example, because these computations are so fast on a single tree, it is now feasible to repeat them across a large set of trees and so better understand the influence of phylogenetic uncertainty (e.g., Barnagaud et al. 2014).



Barnagaud, J.-Y., W. D. Kissling, B. Sandel, W. L. Eiserhardt, Ç. H. Şekercioğlu, B. J. Enquist, C. Tsirogiannis and J.-C. Svenning. 2014. Ecological traits influence the phylogenetic structure of bird species co-occurrences worldwide. Ecology Letters 17: 811-820.

Tsirogiannis, C., B. Sandel and D. Cheliotis. 2012. Efficient computation of popular phylogenetic tree measures. Lecture Notes in Computer Science 7534: 30-43.

Tsirogiannis, C., B. Sandel and A. Kalvisa. 2014. New algorithms for computing phylogenetic biodiversity. Lecture Notes in Computer Science 8701: 187-203.