Discovering biogeographic and ecological clusters with a graph theoretic spin on factor analysis

13 April 2019

Alroy, John

Factor analysis (FA) has the advantage of highlighting each semi-distinct cluster of samples in a data set one axis at a time, as opposed to simply arranging samples across axes to represent gradients. However, when applied to presence-absence data it is confounded by absences when gradients are long. No statistical model can cope with this problem because the raw data simply do not present underlying information about the length of such gradients. Here I propose a way to tease out this information: an emendation of FA called stepping down, which involves giving an absence a negative value when the missing species nowhere co-occurs with the species found in the relevant sample. Specifically, a binary co-occurrence graph is created, and the magnitude of each negative value is made a function of how far the graph must be traversed in order to link the missing species with each species that is present. Simulations show that FA based on stepped-down matrices is better than standard FA at mapping clusters onto axes one-by-one. Standard FA is also uninformative when applied to a global bat inventory data set, whereas step-down FA (SDFA) easily flags the main biogeographic groupings. Methods like correspondence analysis, non-metric multidimensional scaling, and Bayesian latent variable modelling are not commensurate with SDFA because they do not seek to find a one-to-one mapping of axes and clusters. Stepping down seems promising as a means of illustrating clusters of samples, especially when there are subtle or complex discontinuities in gradients.
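
The stepping-down idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact function relating graph distance to the negative value, so the rule used here (one minus the shortest-path distance to the nearest species present in the sample) is an assumption, as are the function names `cooccurrence_graph`, `graph_distances`, and `step_down` and the treatment of species that cannot be linked at all. The stepped-down matrix would then be passed to a factor analysis routine, which is not shown.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(matrix):
    """Link two species (columns) if both are present in at least one sample (row)."""
    neighbours = [set() for _ in range(len(matrix[0]))]
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        for a, b in combinations(present, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def graph_distances(neighbours):
    """Breadth-first search from every species; None marks pairs that cannot be linked."""
    n = len(neighbours)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def step_down(matrix):
    """Replace absences with 0 or a negative value based on graph distance.

    Assumed rule (not taken from the source): value = 1 - d, where d is the
    shortest-path distance from the missing species to the nearest species
    present in the sample. Unlinked species get d = number of species, which
    exceeds any possible finite path length (also an assumption).
    """
    dist = graph_distances(cooccurrence_graph(matrix))
    far = len(matrix[0])
    stepped = []
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        new_row = []
        for j, v in enumerate(row):
            if v or not present:
                new_row.append(int(v))  # presences (and empty samples) are left alone
                continue
            nearest = min(far if dist[j][k] is None else dist[j][k] for k in present)
            new_row.append(1 - nearest)  # 0 if it co-occurs with a present species
        stepped.append(new_row)
    return stepped


# Toy gradient: three samples, four species, with turnover along the gradient.
samples = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
for row in step_down(samples):
    print(row)
# [1, 1, 0, -1]
# [0, 1, 1, 0]
# [-1, 0, 1, 1]
```

In the toy example, the fourth species never co-occurs with either species found in the first sample, so its absence there is stepped down to -1, while the absence of the third species stays at 0 because it co-occurs elsewhere with the second species.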