Biodiversity Data Journal :
Methods
Corresponding author: Petr Keil (keil@fzp.czu.cz)
Academic editor: Vishwas Chavan
Received: 28 Jun 2023 | Accepted: 09 Nov 2023 | Published: 23 Nov 2023
© 2023 Kateřina Tschernosterová, Eva Trávníčková, Florencia Grattarola, Clara Rosse, Petr Keil
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Tschernosterová K, Trávníčková E, Grattarola F, Rosse C, Keil P (2023) SPARSE 1.0: a template for databases of species inventories, with an open example of Czech birds. Biodiversity Data Journal 11: e108731. https://doi.org/10.3897/BDJ.11.e108731
Here, we introduce SPARSE (acronym for "SPecies AcRoss ScalEs"), a simple and portable template for databases that can store data on species composition derived from ecological inventories, surveys and checklists, with emphasis on metadata describing sampling effort and methods. SPARSE can accommodate resurveys, time series and data from different spatial scales, as well as complex sampling designs. SPARSE focuses on inventories that report multiple species for a given site, together with sampling methods and effort, which can be used in statistical models of the true probability of occurrence of species. SPARSE is spatially explicit and can accommodate nested spatial structures from multiple spatial scales, including sampling designs where multiple sites within a larger area have been surveyed and the larger area can again be nested in an even larger region. Each site in SPARSE is represented either by a point, line (for transects) or polygon, stored in an ESRI shapefile. SPARSE implements a new combination of our own field definitions with the Darwin Core biodiversity data standard and its Humboldt Core extension. The use of Humboldt Core also makes SPARSE suitable for biodiversity data with temporal replication.
We provide an example use of the SPARSE framework by digitising data on birds from the Czech Republic, from 348 sites and 524 sampling events, with 15,969 unique species-per-event observations of presence, abundance or population density. To facilitate use without the need for high-level database expertise, the Czech bird example is implemented as an MS Access .accdb file, but can be ported to other database engines. The example of Czech birds complements other bird datasets from the Czech Republic, specifically the four gridded national atlases and the breeding bird survey, which cover a similar temporal extent, but different locations and spatial scales.
aves, biodiversity informatics, open data, nature reserve, re-survey, sample area, time series, checklist, survey
The SPARSE database is designed to store species inventory data, with special emphasis on documenting sampling effort and methods. It can accommodate considerable variation in the spatial configuration of inventories, as well as multiple repeated inventories done at the same site at different times (a.k.a. time series). The database has a simple structure, so that others can use it or copy it without detailed knowledge of advanced database environments. This is also the reason for implementing it in MS Access, which is widespread and user-friendly, albeit commercial, software. Each site in the Access database is represented by a point, line or polygon in ESRI shapefiles that are provided separately from the Access file; these two data types (Access tables and ESRI shapefiles) are linked using the objectID identifier unique to each site.
When designing SPARSE, we used a combination of Darwin core (DWC,
SPARSE is designed to be modular and customisable, but we also needed it to maintain integrity and quality of the data. For this purpose, the current version of SPARSE comes with a set of controlled vocabularies in 15 codebooks. However, their use in any future derivatives of SPARSE is completely optional and they can easily be removed or modified.
The SPARSE structure should work for groups of organisms other than birds and should be applicable to different regions of the world, as well as to various sampling methods and measurements. The structure can be readily converted to other database engines, such as PostgreSQL.
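As a rough illustration of such a port, the sketch below rebuilds a heavily simplified version of the four SPARSE tables in SQLite using Python's standard library. Only the table names (DATASET, SITE, EVENT, MEASUREMENT) and the fields objectID, siteShapeID and measurementUnitID come from the text; all other column names, the assumed nesting of tables and the demo rows are hypothetical and do not reproduce the real field definitions in SPARSE_definitions.xlsx.

```python
import sqlite3

# Hypothetical, simplified schema. Only the four table names and the fields
# objectID, siteShapeID and measurementUnitID are taken from the text; the
# remaining columns and the dataset > site > event > measurement nesting
# are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE DATASET (
    datasetID INTEGER PRIMARY KEY,
    bibtexKey TEXT
);
CREATE TABLE SITE (
    siteID      INTEGER PRIMARY KEY,
    datasetID   INTEGER NOT NULL REFERENCES DATASET(datasetID),
    siteShapeID TEXT,
    objectID    INTEGER
);
CREATE TABLE EVENT (
    eventID   INTEGER PRIMARY KEY,
    siteID    INTEGER NOT NULL REFERENCES SITE(siteID),
    startYear INTEGER,
    endYear   INTEGER
);
CREATE TABLE MEASUREMENT (
    measurementID     INTEGER PRIMARY KEY,
    eventID           INTEGER NOT NULL REFERENCES EVENT(eventID),
    scientificName    TEXT NOT NULL,
    measurementValue  REAL,
    measurementUnitID TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DATASET VALUES (1, 'demo1999')")
    conn.execute("INSERT INTO SITE VALUES (1, 1, 'polygon', 10)")
    # Two sampling events at the same site, i.e. a resurvey.
    conn.executemany(
        "INSERT INTO EVENT VALUES (?, ?, ?, ?)",
        [(1, 1, 1950, 1950), (2, 1, 2000, 2001)],
    )
    conn.executemany(
        "INSERT INTO MEASUREMENT VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "Parus major", 1, "presence"),
            (2, 1, "Turdus merula", 1, "presence"),
            (3, 2, "Parus major", 4, "abundance"),
        ],
    )
    conn.commit()
    return conn


def species_per_event(conn):
    """Return (eventID, startYear, number of species) for every event."""
    return conn.execute(
        """
        SELECT e.eventID, e.startYear, COUNT(DISTINCT m.scientificName)
        FROM EVENT AS e
        JOIN MEASUREMENT AS m ON m.eventID = e.eventID
        GROUP BY e.eventID, e.startYear
        ORDER BY e.eventID
        """
    ).fetchall()
```

The same `CREATE TABLE` statements should need only minor changes to data types to run on PostgreSQL, which is the main point of keeping the structure to a handful of plain relational tables.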
Thanks to initiatives such as GBIF, eBird or iNaturalist, the volume of biodiversity data has been growing (https://www.gbif.org/analytics/global), particularly the volume of presence-only incidental observations. However, these data have important drawbacks: (1) Although GBIF and eBird have the option to record where a species was not recorded, this is still not a common practice. At the time of writing of this manuscript, absences made up ca. 1% of GBIF records (there were 27 million absences vs. 2,582 million presences). This lack of absences limits the use of the data in probabilistic species distribution models (
Unlike presence-only point records, inventories can potentially be used in statistical models assessing probability of occurrence (
The challenge is how to mobilise and store such heterogeneous data as inventories (
To provide an example of how SPARSE can work, we used it to store species inventory data on Czech birds. These mostly consist of inventories of nature reserves and faunistic surveys of various patches of habitat (e.g. a specific forest), published in local journals or as white papers (Fig.
Examples of valuable raw contents of Czech bird inventories that were published during the 20th century in local journals and white papers. a Some published inventories come with detailed maps of surveyed areas, sometimes with complex sampling designs, such as the points located along a transect within a polygon (
All of this is readily available and can be downloaded and edited by users as they please. Thus, SPARSE is not a centralised database where contributors upload their data. Instead, its purpose is to serve as a simple template that is stored on users' personal computers and which users can modify for their own projects, for example, by adding new fields or vocabularies.
By focusing on Czech birds, we hope to provide an additional source of data that complements other existing bird databases. The Czech Republic has some of the best presence-absence and abundance bird data in the world. This includes four periods of gridded atlas data at resolutions of ca. 10 × 10 km (
SPARSE consists of the main MS Access file (SPARSE.accdb), an MS Excel spreadsheet with detailed field definitions (SPARSE_definitions.xlsx), an .xlsx template for data input (INPUT_data-template.xlsx), a BibTeX file with the complete bibliography of all studies in the database (SPARSE_bibliography.bib), a SPARSE_shapefile folder that contains all the points, lines and polygons corresponding to sites, and code for processing and plotting the data.
MS Access file. The main body of SPARSE in the SPARSE.accdb file is implemented in four tables (Fig.
Each table comes with several codebooks (CB) which list predefined controlled vocabulary for selected fields. These can be re-defined, or removed, by users according to their specific needs.
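One common way to make a codebook enforce a controlled vocabulary is a foreign-key constraint from the data table to the codebook table. The following is a minimal sketch of that idea in SQLite, not the actual SPARSE implementation: the codebook name `CB_measurementUnit`, its three terms and the MEASUREMENT columns other than measurementUnitID are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Build an in-memory database with one hypothetical codebook."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical codebook: one row per permitted term.
        CREATE TABLE CB_measurementUnit (
            measurementUnitID TEXT PRIMARY KEY
        );
        CREATE TABLE MEASUREMENT (
            measurementID     INTEGER PRIMARY KEY,
            scientificName    TEXT NOT NULL,
            measurementValue  REAL,
            measurementUnitID TEXT NOT NULL
                REFERENCES CB_measurementUnit(measurementUnitID)
        );
        INSERT INTO CB_measurementUnit VALUES
            ('presence'), ('abundance'), ('density');
        """
    )
    return conn


def try_insert(conn, name, value, unit):
    """Insert one measurement; return False if the unit is not in the codebook."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO MEASUREMENT "
                "(scientificName, measurementValue, measurementUnitID) "
                "VALUES (?, ?, ?)",
                (name, value, unit),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Under this design, re-defining a vocabulary means editing rows in the codebook table, and removing it means dropping the `REFERENCES` clause, so the data table itself does not change.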
Field definitions. Detailed description of each field in these four tables is provided in the DEFINITIONS table in the SPARSE_definitions.xlsx file. The most important columns in the table are:
Shapefiles. Each site in the SITE table is associated with a point, line or polygon geometry (through a combination of siteShapeID and objectID); the geometries are stored in three shapefiles in the SPARSE_shapefiles folder.
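The link between the SITE table and the shapefiles can be pictured as a lookup on a composite key. The sketch below uses plain Python dictionaries as stand-ins for the three shapefiles, so it runs without any GIS library. It assumes that siteShapeID names the geometry type (and hence the shapefile) and that objectID identifies a feature within that shapefile; the siteShapeID values, the siteID field and all coordinates are made up.

```python
# Stand-ins for the three shapefiles, keyed by objectID. Note that the same
# objectID can occur in more than one layer, which is why this sketch uses
# the combination (siteShapeID, objectID) to identify a geometry.
SHAPES = {
    "point": {1: (14.42, 50.09)},
    "line": {1: [(15.0, 49.5), (15.1, 49.6)]},
    "polygon": {1: [(16.0, 49.0), (16.1, 49.0), (16.1, 49.1), (16.0, 49.0)]},
}

# Hypothetical rows of the SITE table.
SITES = [
    {"siteID": 1, "siteShapeID": "point", "objectID": 1},
    {"siteID": 2, "siteShapeID": "line", "objectID": 1},
    {"siteID": 3, "siteShapeID": "polygon", "objectID": 1},
    {"siteID": 4, "siteShapeID": "polygon", "objectID": 2},  # no geometry
]


def geometry_for(site, shapes=SHAPES):
    """Return the geometry of one site, looked up by layer and objectID."""
    return shapes[site["siteShapeID"]][site["objectID"]]


def unmatched_sites(sites=SITES, shapes=SHAPES):
    """List siteIDs whose (siteShapeID, objectID) pair has no geometry."""
    return [
        s["siteID"]
        for s in sites
        if s["objectID"] not in shapes.get(s["siteShapeID"], {})
    ]
```

A check such as `unmatched_sites()` is a cheap way to confirm that every site record still has a geometry after the tables or the shapefiles have been edited separately.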
Bibliographic information. Each dataset in the DATASET table comes with a detailed bibliographic reference stored in a SPARSE_bibliography.bib BibTeX file. This is to facilitate citations of the original publications from which we extracted the data. See the Re-use potential and licensing section below for details.
We provide an example use of the SPARSE framework by digitising data on birds from the Czech Republic, from 348 sites and 524 sampling events, with 15,969 unique species-per-event observations of presence, abundance or population density.
The data extraction and input procedure, step-by-step, was as follows:
These are the references from which we have extracted data to SPARSE 1.0:
Together, these data have the following taxonomic, temporal and geographic coverage:
Taxonomic coverage: The dataset covers birds (Vertebrata, Aves), using the IOC World Bird List (v. 11.1) taxonomy of
Temporal coverage: The dataset covers the period between 1890 and 2020. Some inventories use very old data, mostly haphazard/non-systematic observations, sometimes from local people. The oldest start year of "event" survey is 1890 (e.g.
(a) The temporal extent of the dataset; the histogram summarises starting sampling years of all events (log10 y axis). Blue rectangles represent the four temporal periods covered by the Czech Breeding Bird Atlases (
Geographic coverage: The bird dataset covers the area of the Czech Republic, Europe (Fig.
(a) Map of sites at which inventories were done. These represent a mix of point, line and polygon objects; (b) Histogram of areas of the sites (log10 x axis); (c) Histogram of the number of species detected at each site (log10 x axis); (d) Relationship between the area of each site and the number of species detected at the site (log10 x and y axes), with a linear regression fitted through the data. Grey band indicates standard errors.
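A straight line fitted on log10 axes, as in panel (d), corresponds to a power-law species-area relationship of the form log10(S) = log10(c) + z · log10(A), where S is the number of species and A is the area. The following minimal sketch fits such a line by ordinary least squares. It is not the code used for the figure, and the four data points are synthetic values chosen to follow S = 10 · A^0.25 exactly, not values from the database.

```python
import math


def fit_loglog(areas, richness):
    """Least-squares fit of log10(richness) on log10(area).

    Returns (intercept, slope), i.e. (log10(c), z) of the power law
    S = c * A**z.
    """
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic example: areas in arbitrary units and species counts that lie
# exactly on S = 10 * A**0.25.
demo_areas = [1, 16, 81, 256]
demo_richness = [10, 20, 30, 40]
demo_intercept, demo_slope = fit_loglog(demo_areas, demo_richness)
```

Point and line sites have no area, so in practice only polygon sites (or sites with a separately reported sampled area) could enter such a fit.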
A dynamic GitHub repository of SPARSE, which may undergo development and updates in the future, is at https://github.com/petrkeil/SPARSE1.
A static fixed version of the database in a .zip file, which contains all the files as on the date of submission (25 May 2023), is provided as Suppl. material
Applicability to other taxa. We envision that SPARSE can be re-used for taxa other than birds, as well as for other geographic areas. The main properties that facilitate this cross-taxon use are: (1) the flexible nature of the measurementUnitID field (in the MEASUREMENT table), which can accommodate presences/absences and abundances, but also densities, estimates of percentage cover (e.g. in botanical plots), biomass etc.; (2) flexible taxonomic metadata (in the MEASUREMENT and EVENT tables); (3) flexible metadata on methods (in the EVENT table), which can cover a range of methods, from ornithological point counts to entomological sweep-net surveys or botanical relevés.
Interoperability and openness. SPARSE currently uses .xlsx and .accdb file formats, which may reduce interoperability when compared to, for example, .csv files, but they can still be opened using free and open software. Specifically, .xlsx is an Office Open XML format (developed by Microsoft); it is fully open (see https://en.wikipedia.org/wiki/Office_Open_XML) and users can open any .xlsx file in common free and open programmes, such as LibreOffice Calc. The .accdb file can be opened using free and open tools, such as MDB Tools, Jackcess or LibreOffice Base. As a result, we consider the data to be de facto open. However, the decision to use the .accdb format in particular was not taken lightly; the initial versions of SPARSE were implemented in LibreOffice Base. The decision to migrate to .accdb was made because, at the time of implementation, MS Access was more user-friendly, the authors were familiar with it and we had access to specific training that was unavailable for LibreOffice Base. In addition, SPARSE also uses ESRI shapefiles. Similarly to .xlsx, this format has been developed by a commercial company (ESRI), but the format itself is fully open and can be imported to free and open software, such as R or QGIS.
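As one possible route to the data without MS Access, the sketch below wraps the `mdb-export` command of MDB Tools, which writes a table to standard output as CSV. It assumes that MDB Tools is installed and on the PATH and that the installed version can read the .accdb format; the helper function names are our own, and the file and table names in the usage note are taken from the text above.

```python
import csv
import io
import shutil
import subprocess


def mdb_export_command(accdb_path, table):
    """Build the mdb-export call: mdb-export <database> <table>."""
    return ["mdb-export", accdb_path, table]


def parse_csv(text):
    """Turn CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_table(accdb_path, table):
    """Export one table from an Access file and return its rows."""
    if shutil.which("mdb-export") is None:
        raise RuntimeError("mdb-export (MDB Tools) was not found on PATH")
    result = subprocess.run(
        mdb_export_command(accdb_path, table),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_csv(result.stdout)
```

With MDB Tools installed, `read_table("SPARSE.accdb", "SITE")` would be expected to return the SITE table as a list of dictionaries with all values as strings; type conversion is left to the user.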
Licensing and attribution. We have put the SPARSE database structure under the Creative Commons Attribution (CC-BY) license 4.0 (https://creativecommons.org/licenses/by/4.0). Users of the SPARSE framework, or those who modify it, should cite this publication. Users of the Czech bird data that accompany this publication should cite the original publications (studies) in which the inventories were first published. If they take the data from SPARSE (as opposed to directly extracting it from the original publications), we would like to ask them to also cite this publication. A BibTeX file with all the references accompanies the raw data and can be loaded to common reference managers, such as Zotero or JabRef.
Our work on SPARSE was preceded by a similar effort during summer 2020, at iDiv, Leipzig, by Petr Keil & Clara Rosse, with the backing of Jonathan M. Chase; neither the data nor the specific database structure made it here, but the ideas and experiences were invaluable for SPARSE. We are grateful to David Storch for provision of the raw data from Storch & Kotecký (1999). We took inspiration and ideas from discussions with Karel Chobot (AOPK ČR), Zdeněk Kučera (AOPK), Petr Kovařík (UPOL) and Petr Balej (FZP CZU).
This research was funded by the European Union (ERC, BEAST, 101044740). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Czech University of Life Sciences Prague
ET, KT, FG, CR and PK participated in the development of the core structure of the database. ET found, mobilised and input the inventories to the database. KT implemented the database in MS Access and helped with data import. FG provided expertise and ideas on biodiversity standards. KT and ET standardised the taxonomy. PK conceived the idea and supervised the project. PK led the writing, with input from all co-authors. Overall, ET and KT contributed equally to this study and their order of authorship was decided by a coin flip.
This .zip archive contains the SPARSE 1.0 database as on the date of submission of the manuscript (22 June 2023).