How should spatial or phylogenetic eigenvectors be selected? A ten-year review and computer simulation study
Submitted by editor on 22 January 2018.
By David Bauman
In what context was our study undertaken? Eigenvector mapping techniques are widely used by ecologists and evolutionary biologists to describe and control for spatial and/or phylogenetic patterns in their data. The selection of an appropriate subset of eigenvectors is a critical step (misspecification can lead to highly biased results and interpretations), and there is no consensus yet on how to proceed.
Our contribution was the following: we reviewed ten years of eigenvector selection practices and identified three main procedures: selecting the subset of descriptors that minimises the Akaike information criterion (AIC), using a forward selection with a double stopping criterion after testing the significance of the global model (FWD), and selecting the subset that minimises the autocorrelation in the model residuals (MIR).
Schematic illustration of the three selection procedures for a univariate response vector y. Q: quadrat; V: spatial, temporal or phylogenetic eigenvector; sp: species; RDA: redundancy analysis. See Fig. 1 of the article for details.
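To make the FWD procedure more concrete, here is a minimal Python sketch of a forward selection with a double stopping criterion. It is an illustration under stated assumptions, not the code used in the study: the eigenvectors are assumed to be orthonormal and centred (so their R² contributions simply add up), the significance of each candidate is tested by permuting the residuals of the current model (a simplification), and all function names, including the cosine basis used as a stand-in for real eigenvectors, are hypothetical. Selection stops as soon as a candidate is not significant at the alpha level, or as soon as adding it would push the adjusted R² above that of the global model.

```python
import math
import random

def centre(y):
    m = sum(y) / len(y)
    return [v - m for v in y]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def r2_contributions(y, V):
    # R² explained by each column of V. Summing these values is only valid
    # because the columns are ASSUMED orthonormal and centred.
    yc = centre(y)
    ss_tot = dot(yc, yc)
    return [dot(col, yc) ** 2 / ss_tot for col in V]

def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def perm_pvalue(y, cols, n_perm, rng):
    # Permutation test of the R² of y on cols (y is shuffled).
    observed = sum(r2_contributions(y, cols))
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if sum(r2_contributions(shuffled, cols)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def forward_select(y, V, alpha=0.05, n_perm=199, seed=0):
    """Return the indices of the selected columns of V (a list of columns)."""
    rng = random.Random(seed)
    n = len(y)
    # Step 1: global test; no selection at all if the full model is not significant.
    if perm_pvalue(y, V, n_perm, rng) > alpha:
        return []
    contrib = r2_contributions(y, V)
    ceiling = adjusted_r2(sum(contrib), n, len(V))  # second stopping criterion
    # With orthogonal columns, the best next candidate is always the one
    # with the largest remaining R² contribution.
    order = sorted(range(len(V)), key=lambda j: contrib[j], reverse=True)
    selected, r2_cum, resid = [], 0.0, centre(y)
    for j in order:
        r2_new = r2_cum + contrib[j]
        p = perm_pvalue(resid, [V[j]], n_perm, rng)
        if p > alpha or adjusted_r2(r2_new, n, len(selected) + 1) > ceiling:
            break  # double stopping criterion: the candidate is not added
        selected.append(j)
        r2_cum = r2_new
        coef = dot(V[j], resid)
        resid = [r - coef * v for r, v in zip(resid, V[j])]
    return selected

def cosine_basis(n, k_max):
    # Hypothetical stand-in for spatial eigenvectors: orthonormal, centred cosine waves.
    return [[math.sqrt(2 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
             for i in range(n)] for k in range(1, k_max + 1)]

if __name__ == "__main__":
    n = 60
    V = cosine_basis(n, 8)
    noise = random.Random(1)
    y = [5 + 3 * V[0][i] + 2 * V[1][i] + V[2][i] + noise.gauss(0, 0.05) for i in range(n)]
    print(forward_select(y, V))
```

Note that the adjusted R² ceiling can stop the selection just before the last truly relevant eigenvector is added, because the adjusted R² of the true model and that of the global model are expected to be very close.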
We compared the type I error rates, statistical power, and R² estimation accuracy of these methods using simulated data. Finally, we analysed a real dataset (the tree community of a miombo dry woodland in Upper Katanga, DRC) using variation partitioning to illustrate the extent to which the different selection approaches affected the ecological interpretation of the results.
Tropical dry woodland, also known as miombo woodland, in the natural reserve of Mikembo (Upper Katanga, DRC; see Muledi et al. 2017 – Journal of Plant Ecology for details). Picture taken at the end of the dry season.
What did we show? The FWD and MIR approaches had correct type I error rates and estimated the R² accurately, whereas the AIC approach displayed extreme type I error rates (100%) and strongly overestimated the R². Moreover, the AIC approach led to wrong ecological interpretations, as it overestimated the pure spatial fraction (and, to a lesser extent, the joint spatial-environmental fraction) of the variation partitioning. Both the FWD and MIR methods performed well at broad and medium scales but had very low power to detect fine-scale patterns. The FWD approach selected more eigenvectors than the MIR approach but also returned more accurate R² estimates.
We therefore conclude that the AIC approach should be abandoned, and advocate choosing between the MIR and FWD approaches depending on the objective of the study: controlling for spatial or phylogenetic autocorrelation (MIR) or describing the patterns as accurately as possible (FWD).
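The inflated type I error rate of AIC-based selection can be illustrated with a small simulation. The sketch below is not the simulation design of the study: it uses plain AIC rather than a corrected variant, a hypothetical cosine basis as a stand-in for spatial eigenvectors, and it counts a false positive whenever at least one eigenvector is retained for a response made of pure noise. Because the predictors are assumed orthonormal and centred, the best subset of each size is simply the set of eigenvectors with the largest R² contributions, so the AIC-minimising subset can be found without an exhaustive search.

```python
import math
import random

def cosine_basis(n, k_max):
    # Hypothetical stand-in for spatial eigenvectors: orthonormal, centred cosine waves.
    return [[math.sqrt(2 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
             for i in range(n)] for k in range(1, k_max + 1)]

def aic_best_size(y, V):
    """Number of columns of V in the subset that minimises the AIC."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]
    ss_tot = sum(v * v for v in yc)
    # R² contribution of each column, largest first (valid for orthonormal columns).
    contrib = sorted(
        (sum(a * b for a, b in zip(col, yc)) ** 2 / ss_tot for col in V),
        reverse=True,
    )
    # Gaussian AIC up to a constant shared by all models: n * ln(RSS / n) + 2 * m.
    best_m, best_aic, r2 = 0, n * math.log(ss_tot / n), 0.0
    for m, c in enumerate(contrib, start=1):
        r2 += c
        aic = n * math.log(ss_tot * (1.0 - r2) / n) + 2 * m
        if aic < best_aic:
            best_m, best_aic = m, aic
    return best_m

def false_positive_rate(n=60, k_max=20, n_sim=500, seed=0):
    # The response is pure noise, so any retained eigenvector is a false positive.
    rng = random.Random(seed)
    V = cosine_basis(n, k_max)
    hits = sum(
        aic_best_size([rng.gauss(0, 1) for _ in range(n)], V) >= 1
        for _ in range(n_sim)
    )
    return hits / n_sim

if __name__ == "__main__":
    print(false_positive_rate())
```

Under these assumptions, adding a single eigenvector lowers the AIC whenever its R² contribution exceeds 1 - exp(-2/n), a much more lenient threshold than a 5% significance test, and with many candidate eigenvectors at least one of them almost always passes it.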