Biodiversity Data Journal :
Research Article
Corresponding author: Cristina Ronquillo (cristinaronquillo@mncn.csic.es)
Academic editor: Yasen Mutafchiev
Received: 21 Apr 2020 | Accepted: 05 Aug 2020 | Published: 15 Sep 2020
© 2020 Cristina Ronquillo, Fernanda Alves-Martins, Vicente Mazimpaka, Thadeu Sobral-Souza, Bruno Vilela-Silva, Nagore G. Medina, Joaquín Hortal
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ronquillo C, Alves-Martins F, Mazimpaka V, Sobral-Souza T, Vilela-Silva B, G. Medina N, Hortal J (2020) Assessing spatial and temporal biases and gaps in the publicly available distributional information of Iberian mosses. Biodiversity Data Journal 8: e53474. https://doi.org/10.3897/BDJ.8.e53474
One of the most valuable initiatives for making biodiversity data massively available is the Global Biodiversity Information Facility (GBIF), which is creating new opportunities to develop and test macroecological knowledge. However, the potential uses of these data are limited by the gaps and biases associated with large-scale distributional databases (the so-called Wallacean shortfall). Describing and quantifying these limitations is essential to improve knowledge on biodiversity, especially in poorly-studied groups, such as mosses. Here we assess the coverage of the publicly-available distributional information on Iberian mosses, identifying its potential biases and gaps. For this purpose, we compiled IberBryo v1.0, a database that comprises 82,582 records after processing and checking the geospatial and taxonomical information. Our results show the limitations of the data and metadata of the publicly-available information. In particular, ca. 42% of the records lacked collecting date information, which limits the usefulness of the data for time coverage analyses and enlarges the existing knowledge gaps. We then evaluated the overall coverage of several aspects of the spatial, temporal and environmental variability of the Iberian Peninsula. Through this assessment, we demonstrate that the publicly-available information on Iberian mosses presents significant biases. Inventory completeness is strongly conditioned by the recorders' survey bias, particularly in northern Portugal and eastern Spain, and the spatial pattern of surveys is also biased towards mountains. In addition, survey effort intensifies from 1970 onwards, accompanied by a progressive increase in the geographic coverage of the Iberian Peninsula. Although only 5% of the cells at 30’ resolution were well surveyed over the 1970-2018 period, they cover about a fifth of the main climatic gradients of the Iberian Peninsula, which provides a fair – though limited – coverage.
Yet, the well-surveyed cells are biased towards anthropised areas and some of them are located in areas under intense land-use changes, mainly due to the wildfires of the last decade. Despite the overall increase, we found a noticeable gap of information in the south-west of Iberia, the Ebro river basin and the inner plateaus. All these gaps and biases call for a careful use of the available distributional data of Iberian mosses for biogeographical and ecological modelling analysis. Further, our results highlight the necessity of incorporating several good practices to increase the coverage of high-quality information. These good practices include the digitalisation of specimens and metadata information, the improvement of the protocols for obtaining accurate data and metadata, and the revision of vouchers and recorders' field notebooks. These procedures are essential to improve the quality and coverage of the data. Finally, we also encourage Iberian bryologists to establish a series of re-surveys of classical localities that would allow updating the information on the group, as well as to design their future surveys considering the most important information gaps in IberBryo.
Biodiversity data, Bryophyta, Global Biodiversity Information Facility, IberBryo, Iberian Peninsula, Inventory completeness, Wallacean Shortfall
The current massive availability of biodiversity data is creating new opportunities to develop and test macroecological knowledge (
Advances in big biodiversity data tools and computational power are continually increasing the potential offered by this information (
Once these issues are handled, the subsequent task would be to assess the quality of data as a whole. In the particular case of macroecology and biogeography, this means addressing the gaps and biases associated with large-scale databases (
The extent of the Wallacean shortfall varies considerably amongst taxonomic groups (
Here we aim to assess and quantify the coverage of the publicly-available distributional information on Iberian mosses, identifying its potential biases and gaps. To do this, we compile an extensive Iberian moss database, process its records to filter those with adequate quality and then analyse their coverage. Specifically, we aim to: (i) assess the overall quality of moss records in the Iberian Peninsula; (ii) evaluate their substrate, altitudinal, temporal and spatial coverage; (iii) analyse their inventory completeness; and (iv) assess the adequacy of well-surveyed areas to recover the responses of biodiversity to climatic and land-use changes.
We downloaded 97,597 records of mosses (keyword Phylum: Bryophyta) for the Iberian Peninsula – defined as mainland Portugal and Spain, plus the Balearic Islands, Andorra and Gibraltar – from GBIF (
Geospatial validation. We checked the coordinates of all records following their available geographic location through ‘point-in-polygon’ test at province/district level with
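As an illustration of the kind of ‘point-in-polygon’ test described above, the following Python sketch applies the even-odd (ray casting) rule to a single record. It is not the tool used to build IberBryo (the original analyses were run in R); the function name, the square example polygon and the planar treatment of longitude/latitude are assumptions made only for this example.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd (ray casting) test.

    polygon: list of (lon, lat) vertices of a simple ring; the ring does not
    need to repeat its first vertex at the end. Coordinates are treated as
    planar, which is an assumption that is only reasonable for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical "province" outline used only for the example.
square_province = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

A record whose coordinates fall outside the polygon of the province or district stated in its locality field would be flagged for review rather than silently kept.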
Taxonomic validation and standardisation. We checked all species names (extracted from GBIF fields “scientific_name” and “genus” + “species”) to remove fossil specimens, misidentifications, wrong country locations or insufficient taxon rank identification. Records were reviewed following the checklists in
Year validation. To perform the climatic and land-use coverage analyses (see below), we built IberBryo v1.1 by excluding all occurrences that lacked collecting date information at year level, although we kept them in IberBryo v1.0.
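The year filter that separates the two database versions can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the record layout and the field name `year` are hypothetical. The two counts in the usage note are the ones reported for the supplementary material (47,730 dated and 34,852 undated occurrences).

```python
def split_by_year(records, field="year"):
    """Split records into (dated, undated) lists.

    A record counts as dated when its `field` value can be read as an
    integer year. The field name is a hypothetical placeholder.
    """
    dated, undated = [], []
    for rec in records:
        try:
            year = int(rec.get(field))
        except (TypeError, ValueError):
            # Missing, empty or non-numeric year: the record stays undated.
            undated.append(rec)
            continue
        dated.append({**rec, field: year})
    return dated, undated


def undated_share(n_dated, n_undated):
    """Percentage of records without a collecting year."""
    return 100.0 * n_undated / (n_dated + n_undated)
```

With the reported counts, `undated_share(47730, 34852)` gives about 42.2, in line with the ca. 42% of records lacking a collecting date.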
Once all records had been pre-processed, we assessed the overall coverage of the spatial, temporal and environmental variability of the Iberian Peninsula provided by the inventories contained in IberBryo. All analyses were performed in R (
Substrate coverage. Due to the absence of habitat-type information in most of the records, we were only able to assess the coverage of ecological substrates by checking in specialised references all the taxa that thrive in each type of substrate. First, we made a simplified reclassification based on BRYOATT (
Altitudinal coverage. We applied a Kolmogorov-Smirnov test to assess whether the altitudinal range, covered by moss occurrences, represented the altitudinal patterns of the study area. We attributed altitudinal data to each occurrence using a digital elevation model (DEM) of the study area at a spatial resolution of 30 arc-seconds, extracted from GMTED2010 (
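For readers unfamiliar with the test, the Kolmogorov-Smirnov statistic is the largest vertical distance between two empirical cumulative distribution functions. The sketch below computes the two-sample statistic in plain Python; it assumes that the comparison is between the altitudes of the occurrences and the altitudes of the DEM cells, which the text suggests but does not spell out, and it does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap is reached
    # at one of the observed values.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d
```

A value of D close to 0 would indicate that the altitudes sampled by the records follow the altitudinal distribution of the study area, whereas a value close to 1 would indicate that they sample a very different altitudinal range.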
Temporal coverage. We represented the historical accumulation of new species (excluding infraspecific taxa) recorded in IberBryo and the number of records gathered by calendar years. Then we evaluated the relationship between number of records and newly-observed species per year through Spearman correlations. We defined different periods of data collection for the following analyses, based on the information provided by the curve and the main historical periods happening in the Iberian countries.
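The two steps described above (yearly counts with an accumulation curve, followed by a rank correlation) can be sketched in Python as shown below. This is an illustrative reimplementation under stated assumptions, not the scripts used in the study: records are assumed to be a list of hypothetical `(year, species)` pairs, and ties are handled with average ranks.

```python
from collections import Counter


def yearly_summary(records):
    """records: list of (year, species) pairs.

    Returns a list of (year, n_records, n_new_species, cumulative_species),
    ordered by year.
    """
    per_year = Counter(year for year, _ in records)
    first_seen = {}
    for year, species in records:
        if species not in first_seen or year < first_seen[species]:
            first_seen[species] = year
    new_per_year = Counter(first_seen.values())
    summary, total = [], 0
    for year in sorted(per_year):
        total += new_per_year[year]
        summary.append((year, per_year[year], new_per_year[year], total))
    return summary


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Passing the second and third columns of `yearly_summary` to `spearman` gives the correlation between the number of records and the number of newly-observed species per year.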
Spatial coverage and survey completeness. We calculated basic metrics of spatial coverage (number of records, observed richness and completeness) for all Iberian grid cells at two different resolutions, 5’ (~65 km²) and 30’ (~2500 km²), using the R package ‘KnowBR’ v 2.0 (
We also obtained the location of the main bryology centres of Spain and Portugal. This selection was based on the most frequent author affiliations in Scopus publications retrieved with the keywords “Bryophyte”, “moss”, “musgo” or “briofito”. We also extracted the location of recently-published PhD theses on bryophytes from
Climatic coverage. We assessed the coverage of the climatic variability of the Iberian Peninsula provided by the set of well-surveyed grid cells. To do this, we characterised the climatic environmental space of the Iberian Peninsula, based on the 19 bioclimatic variables from WorldClim 2.0 (
Land-use change coverage. We assessed the adequacy of moss data for representing changes in moss assemblages driven by recent land-use modifications in the Iberian Peninsula, following the method used for climatic coverage. We characterised recent land-use variations using information from Corine Land Cover Changes (Corine Land Cover seamless vector database- CLC v. 20;
Version 1.0 of IberBryo database (Suppl. material
The taxonomic validation led to the deletion of 1,717 occurrences because of taxonomic issues (Fig.
The historical pattern of moss surveys shows a steady increase in the number of records and new species gathered through time. Because this evaluation is based on IberBryo v1.1 (only records with collecting date), the observed number of species (excluding infraspecific taxa) accumulated until 2018 was reduced to 745. The highest survey rates take place after 2000, and the accumulated number of observed species increased especially in the period 1970-1999 (Fig.
Geographic distribution of inventory completeness in the 1970-2018 period at 30’ resolution, according to the IberBryo v1.1 database. Values close to red represent higher percentages of completeness. Black squares correspond to well-surveyed cells (completeness ≥ 80% and number of records ≥ 10), white X-crosses to PhD theses – from left to right: Helena Hespanhol (NW Portugal), Katia Cezón (Castilla-La Mancha) and Susana Rams (Sierra Nevada) and black asterisks to major Iberian bryologist groups. These main research centres on bryophytes correspond to: Universidad Autónoma de Barcelona, Universidad Autónoma de Madrid, Universidad Complutense de Madrid, Universidade de Lisboa, Universidad de Murcia, Universidad Rey Juan Carlos, Universidade de Santiago de Compostela, Universitat de València, Museo Nacional de Ciencias Naturales (MNCN-CSIC) and Real Jardín Botánico (RJB-CSIC).
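The selection rule stated in this caption (completeness ≥ 80% and number of records ≥ 10) can be written as a simple filter. The Python sketch below assumes that the number of records and the completeness of each cell have already been computed (in the study they come from the KnowBR package); the grid indexing by integer division of the coordinates and the dictionary layout are hypothetical choices made for illustration.

```python
import math


def cell_id(lon, lat, size_deg=0.5):
    """Index of the grid cell containing a point; 0.5 degrees equals 30'."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def well_surveyed(cells, min_completeness=80.0, min_records=10):
    """cells: dict mapping cell id -> (n_records, completeness in %).

    Returns the sorted ids of cells meeting both thresholds.
    """
    return sorted(
        cid
        for cid, (n_records, completeness) in cells.items()
        if completeness >= min_completeness and n_records >= min_records
    )
```

Both thresholds must be met at once: a cell with very few records can reach a high completeness value by chance, so the minimum number of records guards against such cases.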
The higher numbers of moss records, observed species richness and inventory completeness are mainly located in mountainous areas of the north and eastern Spanish coasts between 1970 and 1999 and in northern Portugal, central Spain and the mountainous area of Sierra Nevada between 2000 and 2018 (Fig.
Geographical coverage of moss surveys as number of records, observed richness and inventory completeness included in IberBryo v1.1 database (with information on collecting date at year level; 1783-2018) and in IberBryo v1.0 database (including records without information on collecting date) at 30’ resolution.
The PCA identified the two main gradients that characterise the climate of the Iberian Peninsula: one axis mainly related to seasonality, which separates the Mediterranean from the Atlantic zones; and another axis related to temperature and (to a lesser extent) precipitation variations, which describes a gradient from cold (northern-mountainous) to warm-dry zones (central-south-eastern Iberia) (Suppl. material
Climatic coverage of Iberian moss surveys. (A) Frequency of climate types in the Iberian environmental space (values indicate the number of 30’ cells of each climate type). (B) Frequency of climate types covered by well-surveyed cells (values indicate the number of 30’ cells of each climate type). (C) Geographic distribution of climatic rarity index in the study area (rarest climate types = 1), red squares indicate the location of well-surveyed moss cells. (D) Density comparison of the climatic rarity covered by Iberian cells (black line) and well-surveyed moss cells (green line).
(A) Geographical distribution of frequency in land-use changes in 1990-2018 at 30’ resolution cells. (B) Proportion of land-use changed area in 1990-2018 at 30’ resolution cells. (C) Geographical distribution of ‘anthropised change ratio’ as artificial surfaces [A] or natural surfaces [N] changes. Dark brown cells ‘Anthropised only’ N to A; Light brown cells ‘Mostly anthropised’ N to A > A to N; Grey cells ‘Equally changed’ N to A = A to N; Green cells ‘Naturalised’ N to A < A to N. Red squares indicate the location of well-surveyed moss cells.
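The four classes of panel (C) follow directly from comparing the two directions of change in each cell. The Python sketch below encodes that rule; whether the comparison is made on changed area or on number of change events is not stated in the caption, so the inputs are treated as generic non-negative amounts, and the extra "No change" label for cells without any change is an assumption added to make the function total.

```python
def classify_change(n_to_a, a_to_n):
    """Classify a cell from its natural->artificial (N to A) and
    artificial->natural (A to N) change amounts."""
    if n_to_a < 0 or a_to_n < 0:
        raise ValueError("change amounts must be non-negative")
    if n_to_a == 0 and a_to_n == 0:
        return "No change"  # assumption: this class is not in the caption
    if a_to_n == 0:
        return "Anthropised only"    # N to A only
    if n_to_a > a_to_n:
        return "Mostly anthropised"  # N to A > A to N
    if n_to_a == a_to_n:
        return "Equally changed"     # N to A = A to N
    return "Naturalised"             # N to A < A to N
```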
Our analysis of the publicly-available data on Iberian mosses evidences the large extent of the shortfalls of the distributional information for this group. Besides, our study proves the crucial importance of data (and metadata) quality for evaluating the Wallacean shortfall for mosses, in the same way as has been established previously for other groups (
The different biases identified in moss biodiversity information could compromise the reliability of any macroecological analyses carried out with the publicly-available data. Indeed, the main geographical pattern of observed species richness of Iberian mosses can be easily attributed to the recorders' home range (sensu
It is remarkable how much the absence of basic information aggravates the general limitations of our database. This evidences the necessity of gathering good quality data, as well as documenting metadata information properly. Through an in-depth process of record verification and data-cleaning, we were able to improve the first versions of IberBryo, increasing the amount of data useful for the analysis. Despite these improvements, we found an important problem in the records' metadata. The absence of information on collecting dates, which affected ca. 42% of the occurrences and prevented us from detecting duplicate records, significantly limited our assessment of inventory completeness (see
Publicly-available Iberian moss records presented other common problems of biodiversity data related to georeferencing (
The spatial coverage of Iberian moss surveys through time shows two distinct periods. On the one hand, records follow a patchily distributed pattern until 1970. The surveys showed a marked halt in the acquisition of new records between 1935 and 1969 – a setback attributable to the Spanish Civil War and the dictatorships suffered during this period in Spain and Portugal that has been previously described in other groups of organisms (
Interestingly, our findings on spatial coverage at two different cell resolutions allowed us to show that local surveys of mosses are not reflected at regional scale, so well-surveyed areas coincide only partially amongst resolutions (see Suppl. materials
Despite all the gaps and biases identified by our study, we find that Iberian climatic gradients, including the rarest climates, are fairly represented by the limited number of well-surveyed 30' cells, which represent just 5% of Iberia. That said, enlarging the climatic coverage is highly desirable to improve the reliability of any species distribution model or similar approach conducted with these data to assess the effects of climate change, invasions or other aspects of global change (
We show that the publicly-available information on Iberian mosses presents significant biases, related to the Wallacean shortfall, but also to basic knowledge on their ecology. This calls for a careful use of this information for biogeographical, ecological modelling and macroecological analysis. It could be argued that the over-representation of certain areas or environments caused by the spatial biases in the data is a relatively minor problem, if overall coverage of climatic and land-use gradients were good. However, in contrast to the most intensely-sampled areas, we find noticeable spatial gaps in the information, particularly in the south-west of Iberia and the inner plateaus. The lack of information from these regions compromises any assessment of the processes behind species diversity patterns, as well as the implementation of conservation biogeography approaches (
We thank Priscila Lemes and Joaquín Calatayud for their constructive comments on the development of methods. CR was funded by the Comunidad de Madrid and the European Social Fund co-financed through the Youth Employment Operational Program and the Youth Employment Initiative (YEI) grant PEJ-2017-AI/AMB/6655. This work is part of the project UNITED Unifying niches, interactions and distributions: A common theoretical framework for geographic range dynamics and local coexistence (CGL2016-78070-P, funded by AEI/FEDER, UE).
CR and JH designed research, with FAM and NGM. CR gathered and processed all data, with VM and NGM. TS-S and BV provided novel R scripts. CR and FAM analysed the data, with aid from TS-S, BV, VM, NGM and JH. CR wrote the paper, with NGM and JH. All authors discussed results and approved the last version of the manuscript.
IberBryo database (.txt format; UTF-8 encoding)
Also available in: Ronquillo, Cristina; Hortal, Joaquín; 2020; "IberBryo - iberian mosses occurrences dataset"; DIGITAL-CSIC; Version 1.0; http://dx.doi.org/10.20350/digitalCSIC/12494 (This Excel version includes field descriptions).
(A) IberBryo v1.1 occurrences (47,730), (B) Preprocessed occurrences without collecting date (34,852) (C) Occurrences from GBIF before data-cleaning and validation process (33,382).
Frequency of substrate use: [1] rare substrate; [2] occasional substrate; [3] normal substrate.
(A) Distribution of Worldclim 2.0 biovariables at 30’ resolution along the space described by the two climatic axes identified by a PCA. (B) Distribution of Schoener’s D of climatic variability in our study area (grey bars). The dashed red line indicates the Schoener’s D overlap value of well-sampled mosses sites. (C) Geographical distribution of PCA axes scores in the Iberian Peninsula. Colour gradients represent the values of each cell in the corresponding axis, ranging from the most negative (white) to the most positive (green) scores (see the corresponding scale bars). (D) Comparison between the density of PCA scores of the Iberian Peninsula (black line) and the well-surveyed bryophyte cells (red line) for each PCA axis.
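Schoener's D, used in panel (B), measures the overlap between two distributions over the same set of bins, ranging from 0 (no overlap) to 1 (identical distributions): D = 1 - 0.5 Σ|p1_i - p2_i|, where p1_i and p2_i are the proportions of each distribution in bin i. The Python sketch below implements this standard formula; how the climatic space was binned and how the reference distribution (grey bars) was generated are not described in this caption, so the inputs are simply assumed to be two lists of non-negative densities over the same bins.

```python
def schoener_d(density_a, density_b):
    """Schoener's D between two non-negative density vectors.

    Each vector is rescaled to sum to 1 before comparison, so raw
    counts per bin can be passed directly.
    """
    if len(density_a) != len(density_b):
        raise ValueError("both vectors must cover the same bins")
    total_a, total_b = sum(density_a), sum(density_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each vector must have a positive sum")
    diff = sum(
        abs(a / total_a - b / total_b)
        for a, b in zip(density_a, density_b)
    )
    return 1.0 - 0.5 * diff
```

For instance, two distributions that each occupy two of four bins and share only one of them have D = 0.5.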
Reclassification 1 corresponds to aggregated classes of CORINE according to the importance of bryophyte natural history. Reclassification 2 corresponds to whether each type of land-use is (arguably) of artificial or natural origin.
Detailed process of IberBryo creation
The folder contains the 3 R scripts used in this work: 'Climatic coverage analysis', 'Land use coverage analysis' and 'Temporal and Spatial coverage analysis'.