Accessibility maps as a tool to predict sampling bias in historical biodiversity records

Submitted by editor on 3 September 2018.
The King’s Map, a unique map of South Africa produced in 1790 by M. de Laborde for King Louis XVI of France, based on the observations of the French ornithologist and explorer François Levaillant. By publishing journals of his voyages, Levaillant provided us with invaluable descriptions of South African ecosystems in the early historical period. (Public domain - Levaillant 1790).

Not all datasets are created equal, but some methods are here to help.

By Sophie Monsarrat

Can we afford to discard valuable ecological data when the ideal of systematic data collection is not achieved? Opportunistically collected biodiversity data contain strong sampling biases, but they can also help us gain a deeper understanding of biodiversity patterns and may support, among other things, conservation initiatives. It is thus critical that we develop tools to characterize and address the limitations of these data.


Sampling bias in historical biodiversity datasets

The collection process for biodiversity records is often spatially biased towards regions more frequented by observers. This results in observed distribution patterns that are a reflection of the intensity of sampling rather than of the actual distribution of species. As one goes back in time and starts considering historical biodiversity data collected over the last centuries, this issue of sampling bias quickly becomes the norm.


Collected before any scientific protocol existed, through opportunistic and unstandardized sampling and reporting, historical biodiversity records are a statistician’s nightmare. Existing methods are poorly suited to correcting sampling bias in single-species datasets with small sample sizes, a common feature of historical biodiversity data. For that reason, historical data are often perceived as untrustworthy and excluded from quantitative analyses. Yet they provide unique insights into biodiversity trends over long periods and are key to detecting and quantifying long-term human impacts on biodiversity, so we need to find ways to integrate them into modern spatial analyses.


Accessibility maps as a tool to predict sampling bias

In our paper recently published in Ecography, we propose that accessibility maps be used as a tool to predict sampling bias in historical written occurrence records, with the aim of improving the use of these data in spatial analyses. We present a method for creating accessibility maps that does not require empirical data on the location of observers, and we test how well it predicts sampling bias in a dataset of large mammal occurrences collected in the early post-colonial period (16th to 19th century) in South Africa.


The historical literature (journals and diaries written by European settlers, missionaries, naturalists and explorers) contains countless pieces of information on the lifestyle and habits of early travellers. We took advantage of these to understand the factors constraining the movement of observers. Based on this information, we mapped the accessibility of the landscape as the combination of two spatial components: proximity to European settlements and proximity to freshwater.
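To make the idea of combining two proximity components concrete, here is a minimal sketch. It is not the model from the paper: the exponential decay, the multiplicative combination, the scale parameters (`water_scale_km`, `settlement_scale_km`) and the toy distance grids are all assumptions chosen for illustration.

```python
import math


def decay(distance_km: float, scale_km: float) -> float:
    """Turn a distance into a 0-1 proximity score (1 at the feature itself)."""
    return math.exp(-distance_km / scale_km)


def accessibility(dist_water_km: float, dist_settlement_km: float,
                  water_scale_km: float = 10.0,
                  settlement_scale_km: float = 100.0) -> float:
    """Combine the two proximity scores into one index.

    Assumption: the components are multiplied, so a cell must be close to
    both freshwater and a settlement to score highly. The scale values are
    hypothetical, not estimates from the paper.
    """
    return (decay(dist_water_km, water_scale_km)
            * decay(dist_settlement_km, settlement_scale_km))


# Hypothetical 2 x 3 grid of distances (km) to the nearest freshwater
# and to the nearest European settlement.
dist_water = [[0.0, 5.0, 40.0],
              [2.0, 20.0, 80.0]]
dist_settlement = [[10.0, 50.0, 300.0],
                   [0.0, 150.0, 400.0]]

# Accessibility index for every cell of the grid.
access_map = [
    [accessibility(w, s) for w, s in zip(water_row, settlement_row)]
    for water_row, settlement_row in zip(dist_water, dist_settlement)
]
```

With real data, the two distance grids would come from rasters of rivers and settlement locations, and the shape of the decay would need to be justified from the historical sources.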

“During the whole day we had nothing but a dry and burning desert to traverse. After dinner, two of my oxen, exhausted by thirst and fatigue, dropped down, and I was under the necessity of leaving them behind” (Levaillant, 1796, p. 213)

What drove the accessibility of the South African landscape for early travellers and explorers? Our (short) answer: freshwater and European settlements. View of the Karoo, a semidesert region of South Africa. Photo: S Monsarrat.


We found that accessibility maps based on simple statistical rules and only two spatial features could predict the geographical and environmental bias found in the South African dataset. These results suggest that sampling effort can be modelled accurately without empirical data on observers, provided that we know the processes driving the bias in data collection.

Accessibility map built from a model based on two spatial features: proximity to freshwater and proximity to European settlements. Shades of red indicate progressively higher accessibility. Black dots correspond to historical written records of large mammal occurrence for the period 1497-1920. There is a strong linear correlation between the frequency of observed occurrences and the predicted accessibility index (Pearson’s correlation coefficient, ρ=0.93).
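A check of this kind can be sketched in a few lines. The example below groups grid cells into equal-width accessibility bins, averages the number of records per cell in each bin, and computes Pearson's correlation between the two. The binning scheme, the function names and the toy numbers are assumptions for illustration and do not reproduce the analysis or the data in the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_records_per_bin(cell_access, cell_records, n_bins=5):
    """Average record count per cell within equal-width accessibility bins.

    Assumes accessibility values lie in [0, 1]. Empty bins are skipped.
    Returns (bin midpoints, mean records per cell).
    """
    totals = [0] * n_bins
    cells = [0] * n_bins
    for a, r in zip(cell_access, cell_records):
        idx = min(int(a * n_bins), n_bins - 1)  # a == 1.0 goes in the last bin
        totals[idx] += r
        cells[idx] += 1
    midpoints, means = [], []
    for i in range(n_bins):
        if cells[i]:
            midpoints.append((i + 0.5) / n_bins)
            means.append(totals[i] / cells[i])
    return midpoints, means


# Hypothetical data: accessibility index and number of records for 8 cells.
cell_access = [0.05, 0.15, 0.25, 0.45, 0.55, 0.65, 0.85, 0.95]
cell_records = [0, 1, 1, 2, 3, 3, 5, 6]

midpoints, means = mean_records_per_bin(cell_access, cell_records)
r = pearson_r(midpoints, means)
```

A high value of `r` on real data would indicate that records are concentrated where the map predicts observers could travel most easily.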


Implications for spatial analyses and species distribution modelling

We suggest that, rather than discarding historical occurrence datasets a priori because of the biases they may contain, accessibility maps could be used to explore sampling bias and improve the use of these data in modern quantitative analyses. More specifically, they could be used to manipulate background data in species distribution modelling, generating pseudo-absence data with a geographical sampling bias similar to that of the presence data. They could also serve to adjust model estimates by down-weighting sample points from locations with higher accessibility, or to build a biased prior for the distribution of sampling effort, to be used as a bias file in the widely used species distribution modelling method MaxEnt.
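Two of these uses can be illustrated with a short sketch: drawing background (pseudo-absence) points with probability proportional to accessibility, and down-weighting presence records from highly accessible cells. The function names, the inverse-accessibility weighting, the `floor` parameter and the toy grid are assumptions for illustration, not the procedure of any particular modelling package.

```python
import random
from collections import Counter


def sample_background(cells, access, n_points, seed=0):
    """Draw background cells with probability proportional to accessibility.

    Sampling is with replacement, so the background shares the geographic
    bias assumed for the presence records.
    """
    rng = random.Random(seed)
    return rng.choices(cells, weights=access, k=n_points)


def presence_weights(access_at_presences, floor=0.05):
    """Down-weight presences from highly accessible cells.

    Assumption: weights are inversely proportional to accessibility, with a
    hypothetical floor to avoid huge weights in remote cells, then rescaled
    to a mean of 1.
    """
    raw = [1.0 / max(a, floor) for a in access_at_presences]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical grid cells (row, column) and their accessibility index.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
access = [0.9, 0.5, 0.1, 0.0]

background = sample_background(cells, access, n_points=200)
background_counts = Counter(background)

# Weights for three presence records located in the first three cells.
weights = presence_weights([0.9, 0.5, 0.1])
```

The same accessibility grid, exported as a raster, is the kind of layer that could serve as the bias file mentioned above.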


Future directions

While this study focused on historical written records of occurrence, a parallel can be drawn with contemporary datasets that are subject to the same type of biases. For this reason, we encourage further testing of this approach in different spatio-temporal contexts. If accessibility maps prove to be robust predictors of sampling bias in different contexts and are shown to improve the performance of species distribution models, this will provide strong support for their relevance in addressing sampling bias in analyses of small occurrence datasets.


The code and data from our paper are available from Figshare Digital Repository (



Levaillant, F., 1790. Partie Méridionale de l'Afrique depuis le Tropique du Capricorne jusqu'au Cap de Bonne Espérance contenant les Pays des Hottentots, des Cafres et de quelques autres Nations / dressée pour le Roi sur les observations de M. Le Vaillant par M. de Laborde, ancien premier valet de chambre du Roi, gouverneur du Louvre, l'un des Fermiers généraux de Sa Majesté. Paris: Bibliothèque nationale de France.


Levaillant, F., 1796. New travels into the interior parts of Africa: by the way of the Cape of Good Hope, in the years 1783, 84 and 85. London: Printed for G.G. and J. Robinson.


Monsarrat, S., Boshoff, A.F., Kerley, G.I.H., 2018. Accessibility maps as a tool to predict sampling bias in historical biodiversity occurrence records. Ecography 41:1-12. doi: 10.1111/ecog.03944.


Twitter: @MonsarratS