Clustering or network methods? Comparing different methods for bioregionalisation
Submitted by editor on 15 February 2018.
Figure 1. Example of a co-occurrence network based on presence data. a) Co-occurrence network with locations (green filled circles) and taxa (blue filled circles) and b) Binary matrix of species co-occurrences.
By Francisco Encinas-Viso
Understanding species distribution patterns across the globe is a continual quest for biogeographers. Its importance in conservation science is even more critical now in a rapidly changing world. The application of novel methods casts new light on biogeographical patterns, providing improvements on previous approaches to classification.
Uncovering biodiversity hotspots or detecting biogeographical regions (bioregionalisation) is a key component of such studies. Traditionally, the methods used are based on presence data of species occurrences and on clustering techniques applied to these data. For example, the software ‘Biodiverse’ (Laffan et al. 2010) uses the WPGMA (weighted pair-group method using arithmetic averages) clustering method, in which a distance or turnover metric compares the composition of taxa between cells to create a similarity tree or dendrogram.
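As a rough illustration of the clustering step, the following Python sketch applies the WPGMA update rule to a small, made-up dissimilarity table. It is not how Biodiverse is implemented; the cell names, the dissimilarity values and the function name `wpgma` are assumptions chosen only for illustration.

```python
from itertools import combinations

def wpgma(labels, dissimilarity):
    """Agglomerate clusters with the WPGMA rule.

    `dissimilarity` maps frozenset({x, y}) to a value for every pair of labels.
    Returns the merges as (left, right, height) tuples, in merge order.
    """
    clusters = list(labels)
    dist = dict(dissimilarity)
    merges = []
    while len(clusters) > 1:
        # Closest pair; min() silently keeps the first pair it meets when tied.
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: dist[frozenset(pair)])
        height = dist[frozenset((left, right))]
        merged = (left, right)
        rest = [c for c in clusters if c not in (left, right)]
        for other in rest:
            # WPGMA: the new distance is the plain average of the two old ones,
            # regardless of how many cells each merged cluster contains.
            dist[frozenset((merged, other))] = (
                dist[frozenset((left, other))] + dist[frozenset((right, other))]
            ) / 2
        clusters = rest + [merged]
        merges.append((left, right, height))
    return merges

# Hypothetical dissimilarities between four grid cells.
cells = ["cell_a", "cell_b", "cell_c", "cell_d"]
toy = {
    frozenset(("cell_a", "cell_b")): 0.2,
    frozenset(("cell_a", "cell_c")): 0.6,
    frozenset(("cell_a", "cell_d")): 0.8,
    frozenset(("cell_b", "cell_c")): 0.4,
    frozenset(("cell_b", "cell_d")): 0.6,
    frozenset(("cell_c", "cell_d")): 0.3,
}
merges = wpgma(cells, toy)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The list of merges and their heights is the information a dendrogram displays; bioregions are then obtained by cutting the tree.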
These traditional clustering methods are not without problems, however. A common issue arises when two cells are tied in their similarity value according to a chosen metric (e.g. Jaccard, Simpson). Implementing a tiebreaker can solve this (Gonzalez-Orozco et al. 2013), but doing so reduces the robustness of the result.
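To make the tie problem concrete, here is a minimal Python sketch on toy presence data. The cell and species names are hypothetical, and the two functions use the usual set-based definitions of Jaccard and Simpson (βsim) dissimilarity.

```python
import math
from itertools import combinations

# Hypothetical toy data: each grid cell mapped to the taxa recorded in it.
cells = {
    "cell_1": {"sp_a", "sp_b", "sp_c"},
    "cell_2": {"sp_a", "sp_b", "sp_d"},
    "cell_3": {"sp_a", "sp_b", "sp_e"},
}

def jaccard_dissimilarity(x, y):
    # 1 - shared taxa / all taxa found in either cell
    return 1 - len(x & y) / len(x | y)

def simpson_dissimilarity(x, y):
    # 1 - shared taxa / richness of the poorer cell
    return 1 - len(x & y) / min(len(x), len(y))

def pairwise(metric):
    return {(i, j): metric(cells[i], cells[j])
            for i, j in combinations(sorted(cells), 2)}

def tied_pairs(distances):
    # All cell pairs that share the smallest dissimilarity.
    smallest = min(distances.values())
    return sorted(pair for pair, d in distances.items()
                  if math.isclose(d, smallest))

jaccard = pairwise(jaccard_dissimilarity)
simpson = pairwise(simpson_dissimilarity)
print(tied_pairs(jaccard))
print(tied_pairs(simpson))
```

In this toy case every pair of cells shares exactly two of three species, so all three pairs are tied under both metrics, and a clustering algorithm has no principled reason to merge one pair before another.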
Recently, novel methods using network approaches have provided another way to detect bioregions. In the network approach, a bipartite network is built with two types of nodes, locations and taxa, and each taxon is linked to the locations in which it is present (Fig. 1). This type of network is also known as a co-occurrence network. Different studies have used this approach to detect biogeographical regions (Carstensen and Olesen 2009, Carstensen et al. 2012), to study small spatial scale networks (Encinas-Viso et al. 2016), and to try to infer species interactions (Berry and Widder 2014).
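The construction described above can be sketched in a few lines of Python. The presence records below are invented for illustration, and the variable names are assumptions rather than anything taken from the paper.

```python
# Hypothetical presence records: (location, taxon) pairs.
records = [
    ("site_1", "taxon_a"),
    ("site_1", "taxon_b"),
    ("site_2", "taxon_b"),
    ("site_2", "taxon_c"),
    ("site_3", "taxon_c"),
]

locations = sorted({loc for loc, _ in records})
taxa = sorted({tax for _, tax in records})

# Bipartite adjacency: every link joins one location node to one taxon node,
# so locations are never linked directly to locations, nor taxa to taxa.
adjacency = {node: set() for node in locations + taxa}
for loc, tax in records:
    adjacency[loc].add(tax)
    adjacency[tax].add(loc)

# The same information as a binary presence matrix
# (rows = locations, columns = taxa).
matrix = [[1 if tax in adjacency[loc] else 0 for tax in taxa]
          for loc in locations]

for loc, row in zip(locations, matrix):
    print(loc, row)
```

The network and the matrix are two views of the same data: a 1 in the matrix corresponds to one link in the bipartite network.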
The network approach has two advantages over the classic clustering method: 1) it solves the tiebreaker problem and, more importantly, 2) it objectively identifies and quantifies the bioregions without relying on similarity metrics (e.g. Jaccard).
There are different methods to detect “communities” or modules (i.e. bioregions) in complex networks. Traditionally, biogeographical studies have used the Modularity Simulated Annealing (modularity SA) method (Guimera and Amaral 2005). More recently, however, another community detection method, the Map Equation (ME) method, has been used to detect biogeographical regions (Vilhena and Antonelli 2015). Using a dataset of all American plants, Vilhena and Antonelli found that it outperformed the clustering methods in identifying the generally recognized biomes of the United States. ME is based upon the ‘flow’ of information through the network (Rosvall and Bergstrom 2008); in contrast, the modularity SA method is a purely topological (and unsupervised) method that focuses on the structure of the network (Sales-Pardo et al. 2007).
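To give a feel for what the modularity approach optimises, the sketch below computes the modularity score of one fixed partition, using the form M = sum over modules s of [l_s/L - (d_s/2L)^2], where L is the total number of links, l_s the number of links inside module s and d_s the summed degree of its nodes. It only evaluates the score and does not perform the simulated annealing search itself. The toy network and module labels are hypothetical, and treating the bipartite network as an ordinary undirected graph is an assumption made for simplicity.

```python
def modularity(edges, module_of):
    """Modularity of a partition of an undirected, unweighted network."""
    total_links = len(edges)
    links_within = {}   # l_s: links with both ends in module s
    degree_sum = {}     # d_s: summed degree of the nodes in module s
    for u, v in edges:
        for node in (u, v):
            m = module_of[node]
            degree_sum[m] = degree_sum.get(m, 0) + 1
        if module_of[u] == module_of[v]:
            m = module_of[u]
            links_within[m] = links_within.get(m, 0) + 1
    return sum(
        links_within.get(m, 0) / total_links - (d / (2 * total_links)) ** 2
        for m, d in degree_sum.items()
    )

# Hypothetical location-taxon links: two groups joined by one bridging link.
edges = [
    ("site_1", "taxon_a"), ("site_1", "taxon_b"),
    ("site_2", "taxon_a"), ("site_2", "taxon_b"),
    ("site_3", "taxon_c"), ("site_3", "taxon_d"),
    ("site_4", "taxon_c"), ("site_4", "taxon_d"),
    ("site_2", "taxon_c"),  # bridge between the two groups
]
two_regions = {
    "site_1": "west", "site_2": "west", "taxon_a": "west", "taxon_b": "west",
    "site_3": "east", "site_4": "east", "taxon_c": "east", "taxon_d": "east",
}
one_region = {node: "all" for node in two_regions}

score_two = modularity(edges, two_regions)
score_one = modularity(edges, one_region)
print(round(score_two, 4), round(score_one, 4))
```

Splitting the toy network into two regions scores higher than lumping everything into one, which is the kind of comparison a modularity-maximising search makes repeatedly over many candidate partitions.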
Given these recent developments in methods to detect bioregions, we ask in our paper: Are network methods better than the clustering method? If so, which network method is better? And what are the relative advantages of each method? In our paper (Bloomfield et al. 2018) we evaluated the performance of the clustering method against the two network community detection methods using continental-scale data sets of two iconic Australian plant groups, eucalypts and Acacias.
Figure 2. Analysis of the Acacia (top panel) and eucalypt (bottom panel) data sets comparing the three different bioregionalisation methods. (A, X) show the result of the S2 (βsim) clustering and the dendrogram from which the bioregions were obtained. (B, Y) show the results of the map equation analysis and (C, Z) of the modularity analysis. The networks are colored according to the bioregions, and shown using the Force Atlas 2 layout.
We found that network methods outperformed the clustering method in different ways. For example, network methods provided more robust results and new insights about the relationship between bioregions and their boundaries. We also found differences between the two network methods (Map Equation and Modularity SA). For both data sets (Acacia and eucalypts), Modularity SA consistently found fewer and larger bioregions than the Map Equation method (Fig. 2).
An interesting finding of our analysis is the importance of a network metric, the participation coefficient, for detecting transition zones between bioregions. This is of great importance for continental-scale data, where areas of high species turnover can be identified and quantified without needing to calculate dissimilarity matrices.
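As a minimal sketch of how such a metric flags transition zones, the code below computes the participation coefficient, assuming the standard definition of Guimera and Amaral (2005): P_i = 1 - sum over modules s of (k_is/k_i)^2, where k_i is the degree of node i and k_is the number of its links that lead into module s. The toy network and the bioregion labels are hypothetical.

```python
def participation_coefficient(node, adjacency, module_of):
    """0 when all links stay in one module; larger when links are spread out."""
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree == 0:
        return 0.0
    links_per_module = {}
    for neighbour in neighbours:
        m = module_of[neighbour]
        links_per_module[m] = links_per_module.get(m, 0) + 1
    return 1 - sum((k / degree) ** 2 for k in links_per_module.values())

# Hypothetical bipartite network: locations linked to the taxa present in them.
adjacency = {
    "site_1": {"taxon_a", "taxon_b"},
    "site_2": {"taxon_a", "taxon_b", "taxon_c"},  # holds taxa from both regions
    "site_3": {"taxon_c", "taxon_d"},
    "taxon_a": {"site_1", "site_2"},
    "taxon_b": {"site_1", "site_2"},
    "taxon_c": {"site_2", "site_3"},
    "taxon_d": {"site_3"},
}
module_of = {
    "site_1": "west", "site_2": "west", "taxon_a": "west", "taxon_b": "west",
    "site_3": "east", "taxon_c": "east", "taxon_d": "east",
}

scores = {site: participation_coefficient(site, adjacency, module_of)
          for site in ("site_1", "site_2", "site_3")}
for site, score in scores.items():
    print(site, round(score, 3))
```

Here only site_2 has links into both bioregions, so it is the only location with a non-zero score, which marks it as a candidate transition zone.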