The InBIO Barcoding Initiative Database: contribution to the knowledge on DNA barcodes of Iberian Plecoptera

Abstract
Background
The use of DNA barcoding allows unprecedented advances in biodiversity assessments and monitoring schemes of freshwater ecosystems; nevertheless, it requires the construction of comprehensive reference collections of DNA sequences that represent the existing biodiversity. Plecoptera are considered particularly good ecological indicators and one of the most endangered groups of insects, but very limited information on their DNA barcodes is available in public databases. Currently, fewer than 50% of the Iberian species are represented in BOLD.
New information
The dataset "The InBIO Barcoding Initiative Database: contribution to the knowledge on DNA barcodes of Iberian Plecoptera" contains records of 71 specimens of Plecoptera. All specimens have been morphologically identified to species level and belong to 29 species in total. This dataset contributes to the knowledge on the DNA barcodes and distribution of Plecoptera from the Iberian Peninsula and is one of the IBI database public releases that make available genetic and distribution data for a series of taxa. The species represented in this dataset correspond to an addition to public databases of 17 species and 21 BINs. Fifty-eight specimens were collected in Portugal and 18 in Spain during the period of 2004 to 2018. All specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources, and their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD) online database. The distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Introduction
In freshwater ecosystems, biodiversity assessments and monitoring schemes often require the identification of aquatic insect species (e.g. Pawlowski et al. 2018). This step is often challenging, particularly when only first instars are available in the sample or when studies are carried out in regions that are poorly known from a faunistic perspective. In such cases, DNA barcoding provides a powerful tool to overcome these challenges by using a fragment of DNA to assign organisms to a species in a rapid and automated way (Hebert et al. 2003). Furthermore, environmental DNA (eDNA) is an emerging tool with great potential in conservation for monitoring past and present biodiversity, both in terrestrial and aquatic ecosystems (Thomsen and Willerslev 2015), especially when DNA barcode reference collections are used to link the obtained sequences to reliably identified organisms. The use of DNA barcoding requires the construction of comprehensive reference collections of DNA sequences that represent the existing biodiversity (Ferreira et al. 2018, Kress et al. 2005, Baird et al. 2011). In Europe, initiatives like the DNA barcoding projects overseen by the Bavarian State Collection of Zoology in Munich (SNSB-ZSM, www.barcoding.zsm.de) through the "Barcoding Fauna Bavarica project" (BFB, www.faunabavarica.de; Haszprunar 2009), launched in 2009, and by the "German Barcode of Life project" (GBOL, www.bolgermany.de), launched in 2012 (Geiger et al. 2016), have led to the public release of DNA barcode sequence data for over 300 species of Ephemeroptera, Plecoptera and Trichoptera (Morinière et al. 2017). As part of the Mediterranean Basin Biodiversity Hotspot, the Iberian Peninsula not only presents high numbers of species, but also harbours species with limited distribution ranges, many of which are absent from central and northern Europe.
The InBIO Barcoding Initiative (IBI) was established to overcome the striking scarcity of genetic data associated with the high biodiversity found in Portugal, focusing mainly on invertebrate taxa. Within the project, a special focus was given to aquatic insects, given their role as indicators in biodiversity assessments and monitoring schemes (e.g. Weisser and Siemann 2004, Weigand et al. 2019) and their relevance to food webs and ecosystem functioning. Furthermore, many insect species occurring in the Iberian Peninsula are not represented in public barcode databases (Ferreira et al. 2018, Weigand et al. 2019) and those that are often show high evolutionary distances to the sequences obtained in this region, which may indicate cryptic diversity (Corley et al. 2019b, Corley et al. 2019a, Ferreira et al. 2018). DNA barcoding can therefore be used as a first step in new species discovery and, as such, as a tool to help address the taxonomic impediment problem (e.g. Kekkonen and Hebert 2014).
Plecoptera is a neopteran exopterygote insect order characterised by a combination of mainly primitive characters, whose phylogenetic relationships with other insect orders are not completely resolved (Zwick 2000). Except in a few cases, they are amphibiotic animals, with eggs and nymphs occurring in freshwaters and adults inhabiting the terrestrial environment. Commonly called stoneflies, they are distributed worldwide, except in Antarctica and on many islands, and are usually associated with unpolluted and well-preserved waters, mainly rivers and streams, where they play important roles as part of the biota (Fochetti and Tierno de Figueroa 2008, Stewart 2009), contributing to important ecosystem services (DeWalt and Ower 2019). Their high vulnerability to environmental changes has made stoneflies one of the most endangered groups of insects (Fochetti and Tierno de Figueroa 2008, Tierno de Figueroa et al. 2010).
A total of 3718 Plecoptera species have been described worldwide and 489 of them have been reported in Europe (DeWalt and Ower 2019). The European stonefly fauna, included in seven of the 16 existing families, is one of the best studied worldwide, but the degree of knowledge differs between countries. Of the Western European countries, Portugal is one of the least studied from a taxonomic and faunistic point of view. Furthermore, fewer than 50% of the Iberian Plecoptera have had their DNA barcode sequenced. Although the first reports of stonefly species in Portugal date from the mid-19th century (Pictet 1841), only a few new records were added for this country during the following hundred years by authors such as Pictet A.E., Albarda, Kempny or Navás (in: Sánchez-Ortega et al. 2002). It was not until 1963 that the first exhaustive work on the faunistics and chorology of stoneflies from Portugal, particularly those of Serra da Estrela, was published as part of a wider study on the Iberian Peninsula (Aubert 1963). Afterwards, the main contributions to the knowledge of the taxonomy and/or faunistics of Plecoptera from Portugal were those of Zwick (1972)
The dataset "The InBIO Barcoding Initiative Database: contribution to the knowledge on DNA barcodes of Iberian Plecoptera" contains records of 71 specimens of Plecoptera collected in the Iberian Peninsula, all of which were morphologically identified to species level, for a total of 29 species. This is the first IBI dataset on freshwater insects available in the Global Biodiversity Information Facility (GBIF). All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD). Overall, this paper increases the available information on Iberian freshwater insects by sharing and publicly disseminating the distribution records and DNA barcodes of specimens from our reference collection.

General description
Purpose: This dataset aims to provide a first contribution to an authoritative DNA barcode sequence library for Iberian Plecoptera. Such a library should facilitate DNA-based identification of species for both traditional molecular studies and DNA-metabarcoding studies, as well as freshwater biomonitoring programmes, and constitute a valuable resource for taxonomic research on Iberian Plecoptera and their distribution.

Additional information:
A total of 71 specimens of Plecoptera were collected and DNA barcoded (Suppl. materials 1, 2, 3). Sequences of cytochrome c oxidase I (COI) DNA barcodes are 658 bp long (Folmer region), with the exception of Leuctra cazorlana, from which a fragment of 325 bp was obtained. Of the 29 species barcoded, 18 (62%) from seven families were new to the DNA barcode database BOLD at the time of release (marked with a quotation mark ('') in the Species field of Table 1). Six additional BINs from these datasets are new to BOLD (marked with an asterisk (*) in the BOLD BIN field of Table 1). Therefore, this dataset represents a significant contribution to the species and genetic diversity of the Iberian Plecoptera fauna represented in public libraries.

Design description:
Plecoptera specimens were collected in the field, morphologically identified and DNA barcoded.

Sampling methods
Study extent: Iberian Peninsula.

Sampling description:
The studied material was collected in 40 different localities from the Iberian Peninsula (Suppl. materials 1, 2). Sampling was conducted between 2004 and 2018 in a wide range of habitats, using mainly hand-held sweep-nets or direct search for specimens. Collected specimens were examined in ethanol using a binocular stereoscopic microscope and were stored in 96% ethanol for downstream molecular analysis. Morphological identification was performed using keys and descriptions from the literature (mainly Tierno de Figueroa et al. 2003). DNA extraction and sequencing followed the general pipeline used in the InBIO Barcoding Initiative (Ferreira et al. 2018). Briefly, genomic DNA was extracted from leg tissue using the EasySpin Genomic DNA Tissue Kit (Citomed) following the manufacturer's protocol. The cytochrome c oxidase I (COI) barcoding fragment (Folmer region) was amplified as two overlapping fragments (LC and BH), using two sets of primers: LCO1490 (Folmer et al. 1994) + Ill_C_R (Shokralla et al. 2015) and Ill_B_F (Shokralla et al. 2015) + HCO2198 (Folmer et al. 1994), respectively. The amplified fragments were then sequenced on a MiSeq benchtop system. OBITools (https://git.metabarcoding.org/obitools/obitools) was used to process the initial sequences, which were then assembled into a single 658 bp fragment using Geneious 9.1.8 (https://www.geneious.com).
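The assembly of the two overlapping amplicons (LC and BH) into one barcode can be illustrated with a minimal sketch. It is not the procedure implemented in Geneious: it assumes error-free reads and an exact suffix/prefix overlap, and the function name `merge_overlapping`, the `min_overlap` parameter and the toy sequences are all hypothetical.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 10) -> str:
    """Merge two fragments where a suffix of `left` equals a prefix of `right`.

    Assumption: the overlap is exact (no sequencing errors); real assemblers
    tolerate mismatches and use quality scores.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bp found")


# Toy sequences invented for illustration; they are not real COI data.
lc_fragment = "ATGGCATTACCGGTTAGC"
bh_fragment = "CCGGTTAGCTTAAGGCAT"
merged = merge_overlapping(lc_fragment, bh_fragment, min_overlap=5)
print(merged, len(merged))
```

With real data, the merged length would be checked against the expected 658 bp of the Folmer region before the sequence is accepted.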
Quality control: All DNA barcode sequences were compared against the BOLD database and the 99 top hits were inspected in order to detect possible issues due to contamination or misidentifications. Prior to submission to GBIF, data were checked for errors and inconsistencies with OpenRefine 3.2 (http://openrefine.org).
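The inspection of top hits can be sketched as a simple screen that lists close matches carrying a different species name from the morphological identification. This is only an illustration under stated assumptions: the hit records are plain dictionaries with hypothetical `species` and `similarity` keys (not the BOLD output format), the 97% similarity cut-off is an arbitrary example value not taken from the paper, and the names and similarity values are invented.

```python
from collections import Counter


def flag_conflicts(morph_id, hits, min_similarity=97.0):
    """Count close hits whose species name differs from the morphological ID.

    `hits` is a list of dicts with hypothetical keys 'species' and
    'similarity' (percent). An empty result means no conflict was found.
    """
    close = [h for h in hits if h["similarity"] >= min_similarity]
    return Counter(h["species"] for h in close if h["species"] != morph_id)


# Invented hit list for illustration only.
hits = [
    {"species": "Leuctra cazorlana", "similarity": 99.8},
    {"species": "Leuctra cazorlana", "similarity": 99.1},
    {"species": "Species B", "similarity": 98.4},
    {"species": "Species C", "similarity": 91.0},
]
conflicts = flag_conflicts("Leuctra cazorlana", hits)
print(dict(conflicts))
```

A non-empty result does not by itself show an error; it marks a specimen whose identification, or the identification of the matching public records, deserves a second look.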
Step description: Specimens were collected in 40 different localities of the Iberian Peninsula. Sampling was conducted from 2004 to 2018 and consisted of direct searches for specimens on rocks and vegetation along streams and river margins and the use of entomological nets to intercept specimens in flight. Collected specimens were stored in 96% ethanol. A tissue sample (leg) was removed, from which DNA was extracted and the COI DNA barcode fragment was sequenced. The data generated were submitted to BOLD, GenBank and GBIF.

Taxonomic coverage
Description: This dataset is composed of data relating to 71 Plecoptera specimens. All specimens were determined to species level. Overall, 29 species are represented in the dataset. These species belong to 16 genera and seven families.