Biodiversity Data Journal
Taxonomic paper
Streamlining the use of BOLD specimen data to record species distributions: a case study with ten Nearctic species of Microgastrinae (Hymenoptera: Braconidae)
Academic editor: Dominique Zimmermann
Received: 11 Oct 2014 | Accepted: 24 Oct 2014 | Published: 29 Oct 2014
© 2014 Jose Fernandez-Triana, Lyubomir Penev, Sujeevan Ratnasingham, M. Alex Smith, Jayme Sones, Angela Telfer, Jeremy deWaard, Paul Hebert
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Fernandez-Triana J, Penev L, Ratnasingham S, Smith M, Sones J, Telfer A, deWaard J, Hebert P (2014) Streamlining the use of BOLD specimen data to record species distributions: a case study with ten Nearctic species of Microgastrinae (Hymenoptera: Braconidae). Biodiversity Data Journal 2: e4153. https://doi.org/10.3897/BDJ.2.e4153
The Barcode of Life Data Systems (BOLD) is designed to support the generation and application of DNA barcode data, but it also provides a unique source of data with potential for many research uses. This paper explores the streamlining of BOLD specimen data to record species distributions – and its fast publication using the Biodiversity Data Journal (BDJ), and its authoring platform, the Pensoft Writing Tool (PWT). We selected a sample of 630 specimens and 10 species of a highly diverse group of parasitoid wasps (Hymenoptera: Braconidae, Microgastrinae) from the Nearctic region and used the information in BOLD to uncover a significant number of new records (of locality, provinces, territories and states). By converting specimen information (such as locality, collection date, collector, voucher depository) from the BOLD platform to the Excel template provided by the PWT, it is possible to quickly upload and generate long lists of "Material Examined" for papers discussing taxonomy, ecology and/or new distribution records of species. For the vast majority of publications including DNA barcodes, the generation and publication of ancillary data associated with the barcoded material is seldom highlighted and often disregarded, and the analysis of those data sets to uncover new distribution patterns of species has rarely been explored, even though many BOLD records represent new and/or significant discoveries. The introduction of journals specializing in – and streamlining – the release of these datasets, such as the BDJ, should facilitate thorough analysis of these records, as shown in this paper.
Species distribution records, streamlining data, Barcode of Life Data Systems, Pensoft Writing Tool, Microgastrinae, Nearctic
The Barcode of Life Data Systems (BOLD) is designed to support the generation and application of DNA barcode data. The platform consists of four main modules: a data portal, a database of barcode clusters, an educational portal, and a data collection workbench. Currently almost 4 million sequences (over 3.4 million of them DNA barcodes) are stored in BOLD, including coverage for more than 143K animal species, 53K plant species, and 16K fungi and other species (http://www.boldsystems.org/, accessed on October 09, 2014). This immense platform provides a unique source of data with potential for many research uses. For example, the use of sequences in BOLD has been greatly expanded in the past few years, and dozens of publications using BOLD exist.
A lesser-known and less explored avenue is the use of specimen data (also stored in BOLD), which likewise has great potential to reveal new biodiversity information. This paper explores the streamlining of BOLD specimen data to record species distributions. For that we use the Biodiversity Data Journal (BDJ), a community peer-reviewed, open-access, comprehensive online platform designed to accelerate publishing, dissemination and sharing of biodiversity-related data of any kind (http://biodiversitydatajournal.com/about#FocusandScope), together with its authoring platform, the Pensoft Writing Tool (PWT).
A highly diverse group of insects (parasitoid wasps of the family Braconidae, subfamily Microgastrinae) was used as a case study. Microgastrine wasps are among the best represented groups of Hymenoptera in BOLD, and over 20K sequences have already been released and are publicly available (e.g.,
All specimens used for this study are deposited in the collections of the Biodiversity Institute of Ontario, University of Guelph (BIO), or the Canadian National Collection of Insects, Ottawa (CNC). They represent dozens of collecting events and different research projects carried out by those institutions. Most of the specimens rendered partial or full DNA barcodes (information can be retrieved using the following public DOI: https://doi.org/10.5883/DS-NEAMICRO), and many have associated sequences, images and/or data from their labels in collections (Suppl. materials
We selected a sample of 630 specimens of parasitoid wasps, representing ten species and five genera of Microgastrinae (Hymenoptera: Braconidae) from the Nearctic region (Canada and the United States). The identity of the species was confirmed by JFT against authenticated material deposited in the CNC.
We downloaded the “Data Spreadsheet” associated with those records, including the worksheets “Collection Data” and “Specimens Details” (for a detailed explanation consult the “BOLD Print Handbook for BOLD v3”, freely available online at http://www.boldsystems.org/libhtml_v3/static/BOLD_Handbook_Oct2013.pdf).
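As a minimal sketch of how the two worksheets might be combined once exported (for example, as CSV), the following Python joins their rows on a shared specimen identifier. The column names (`Sample ID`, `State/Province`, `Institution Storing`) and all sample values are assumptions made for illustration. They are not taken from the dataset and should be checked against the headers of the actual spreadsheet.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text (one exported worksheet) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left_rows, right_rows, key="Sample ID"):
    """Merge rows from two worksheets that share the same key value.

    Rows in left_rows without a partner in right_rows are kept unchanged.
    """
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        combined = dict(row)
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Hypothetical exports of the two worksheets (illustrative values only).
collection_csv = (
    "Sample ID,Country,State/Province\n"
    "EX-001,Canada,Ontario\n"
    "EX-002,Canada,Manitoba\n"
)
specimen_csv = (
    "Sample ID,Institution Storing\n"
    "EX-001,Example Collection\n"
)

records = join_on_key(read_rows(collection_csv), read_rows(specimen_csv))
```

Keeping unmatched rows means that specimens lacking an entry in one worksheet still carry their locality data forward.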
The Excel file downloaded from BOLD (Suppl. material
We compared the known distributions of the species (using mostly
The ten species analyzed here represented 13 Barcode Index Numbers (BINS), sensu
A few specimens discussed in this paper did not render any DNA and thus do not have any associated sequence in BOLD. However, their ancillary data (e.g., collecting date and locality) were still available and are presented below.
When discussing the distribution of species, states of the United States and provinces/territories of Canada are abbreviated with acronyms consisting of two capital letters, following Canada Post standards (http://www.canadapost.ca/tools/pg/manual/PGaddress-e.asp).
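The comparison between previously known distributions and the jurisdictions found in specimen records amounts to a set difference over two-letter codes. The sketch below illustrates the idea under stated assumptions: the function name, the partial name-to-code table and the sample values are hypothetical and do not reproduce the records reported in this paper.

```python
# Partial lookup from full names to two-letter postal codes
# (only the entries needed for this illustration).
POSTAL_CODES = {
    "Manitoba": "MB",
    "Nova Scotia": "NS",
    "Ontario": "ON",
    "Quebec": "QC",
}


def new_jurisdictions(known_codes, observed_names):
    """Return sorted codes seen in specimen records but absent from known_codes."""
    observed_codes = {POSTAL_CODES[name] for name in observed_names}
    return sorted(observed_codes - set(known_codes))


# Hypothetical input: a published distribution and the provinces
# recorded on specimen labels for the same species.
known = {"NS", "ON"}
observed = ["Ontario", "Manitoba", "Ontario", "Quebec", "Nova Scotia"]

new_records = new_jurisdictions(known, observed)
```

Any code returned this way is only a candidate new record; the identification of the underlying specimens still needs to be confirmed before it is reported.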
A cosmopolitan species (
This species is widely distributed in central and eastern North America. It was recorded from two provinces in Canada (NS, ON) by
The specimens of A. conanchetorum that rendered DNA barcodes comprise two BINS: BOLD:AAC5506 (eastern North America) and BOLD:AAC5507 (principally Western Canada, but some records from ON, PE) (Suppl. material
This species is widely distributed in central and eastern North America. It was recorded from four provinces in Canada (MB, NS, ON, QC) by
The specimens of A. ensiger that rendered DNA barcodes comprise two BINS, BOLD:ACE6783 (ON, MB) and BOLD:AAA3764 (AB, ON, SK and some localities of southern US) (Suppl. material
This species is widely distributed in the Holarctic and it was recorded from three provinces in Canada (BC, NB, NL) by
The specimens of A. sodalis that rendered DNA barcodes comprise two BINS: BOLD:AAM7223 (BC, NL) and BOLD:AAN1859 (BC) (Suppl. material
Previously, this species was known to be widely distributed in the Palaearctic, with one record from tropical Africa (
This species has been recorded mainly from northeast and central United States (
This species is widely distributed in the Palaearctic (
This species is relatively widely distributed in northeast North America, extending up to central Canada (
This species is widely distributed within the Nearctic (e.g.,
This species is widely distributed in the Holarctic (e.g.,
In addition to being a diverse platform for the generation, analysis and permanent storage of DNA barcodes, the Barcode of Life Data Systems (BOLD) houses an immense amount of collateral data that are only starting to be explored and exploited. Validated species occurrence records are one such data element, critical for various fields and applications from environmental niche modeling in ecology to risk assessments in conservation biology. This paper demonstrates the potential of using BOLD occurrence data for these purposes – to define and refine the distributions of species – and presents a model for their rapid dissemination in the primary literature.
The key to this model is the seamless and dynamic transition of data between source and journal. By converting specimen information (such as locality, collection date, collector, voucher depository) from the BOLD platform to the Excel template provided by the Pensoft Writing Tool (PWT), the authoring platform of the Biodiversity Data Journal (BDJ), it is possible to quickly upload and generate long lists of "Material Examined" for papers discussing taxonomy, ecology and/or new distribution records of species. Additional functions in BOLD allow verification of this material, for instance the ability to generate high-res image libraries (e.g., Suppl. material
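The conversion step described above can be sketched as a column mapping followed by a tabular export. In this sketch the source column names on the left of `FIELD_MAP` are assumed BOLD spreadsheet headers, and the target names are Darwin Core terms used as a stand-in for the template columns. Neither is confirmed by this paper, and the actual template headers should be used in practice.

```python
import csv
import io

# Assumed source header -> assumed template column (Darwin Core terms).
FIELD_MAP = {
    "Identification": "scientificName",
    "Country": "country",
    "State/Province": "stateProvince",
    "Exact Site": "locality",
    "Collection Date": "eventDate",
    "Collectors": "recordedBy",
    "Institution Storing": "institutionCode",
}


def to_template_row(bold_record):
    """Rename the fields of one specimen record; missing fields become empty strings."""
    return {target: bold_record.get(source, "") for source, target in FIELD_MAP.items()}


def write_template(records):
    """Return CSV text with one header line and one line per specimen record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(to_template_row(record))
    return buffer.getvalue()


# Hypothetical specimen record (illustrative values only).
sample = [
    {
        "Identification": "Example species",
        "Country": "Canada",
        "State/Province": "Ontario",
        "Collection Date": "2010-07-01",
    }
]

template_csv = write_template(sample)
```

The resulting text can be saved and pasted or imported into a spreadsheet. Writing to the Excel format directly would require a third-party library, which is deliberately left out of this sketch.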
The logical next step would be to create a simple interface and module at the source database (BOLD) that will submit data selected by the author, "at the click of a button", into the PWT, using the Application Programming Interface (API) of the latter. Within the PWT environment, the submitting author can complete the manuscript by adding additional text, figures, images, citations, references and so on. The author can also invite co-authors and/or "contributors" (e.g., linguistic editors, mentors, colleagues who are not formal co-authors) to collaboratively work on the manuscript. After completion, the manuscript is submitted to BDJ, again "at the click of a button", where it undergoes a community peer-review process and subsequent publication upon acceptance (e.g.,
The work of Pensoft on this project was partly supported by the EC-FP7 EU BON project (grant agreement №308454). Barcode analysis was facilitated by funding from the government of Canada through Genome Canada and the Ontario Genomics Institute in support of the International Barcode of Life project. The development of the Barcode of Life Data Systems (BOLD) was enabled by funding from the Ontario Ministry of Research and Innovation. JFT is grateful for the technical support of Teodor Georgiev and Pavel Stoev (Pensoft).