The InBIO Barcoding Initiative Database: DNA barcodes of Orthoptera from Portugal

Abstract
Background: The InBIO Barcoding Initiative (IBI) Orthoptera dataset contains records of 420 specimens covering all eleven Orthoptera families occurring in Portugal. Specimens were collected in continental Portugal from 2005 to 2021 and were morphologically identified to species level by taxonomists. A total of 119 species were identified, corresponding to about 77% of all the orthopteran species known from continental Portugal.
New information: DNA barcodes of 54 taxa were made public for the first time at the Barcode of Life Data System (BOLD). Furthermore, the submitted sequences were found to cluster in 129 BINs (Barcode Index Numbers), 35 of which were new additions to BOLD. All specimens have their DNA barcodes publicly accessible through the BOLD online database. Stenobothrus lineatus is recorded for the first time for continental Portugal. This dataset greatly increases the knowledge on the DNA barcodes and distribution of Orthoptera from Portugal. All DNA extractions and most specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources.


Introduction
Insects are very challenging to study due to their astonishing biological diversity, the lack of taxonomic expertise for several groups and the fact that traditional morphology-based species identification is often logistically or financially unsustainable. DNA metabarcoding is rapidly emerging to overcome these challenges (van Klink et al. 2022), but its application relies on the existence of comprehensive reference collections of DNA barcodes. Metabarcoding identifies multiple species from a mixed sample, based on DNA barcoding using high-throughput sequencing (HTS) of a specific DNA marker, usually the mitochondrial cytochrome c oxidase I (COI) gene. DNA barcoding is a molecular biology method for species identification that relies on the comparison of a short mitochondrial DNA sequence of interest to a library of sequences with known species identity (Hebert et al. 2003). Hence, for these comparisons, it is important to guarantee that a good sequence library is being used, i.e. that all DNA barcodes come from specimens identified by taxonomists (Chimeno et al. 2023). The Barcode of Life Data Systems (BOLD) is an international barcode reference database made available in 2007 (Ratnasingham and Hebert 2007) and large national barcoding initiatives worldwide have contributed to increasing this library (Janzen and Hallwachs 2011, Geiger et al. 2016, deWaard et al. 2019). In Portugal, the InBIO Barcoding Initiative (IBI) was the first project aimed at developing a reference collection of DNA barcoding sequences, having largely focused on Portuguese invertebrate taxa, particularly insects (Ferreira et al. 2018).
Orthoptera are a diverse group of herbivorous insects that play an important role in ecosystem functioning, both as primary consumers in different habitats and as significant food sources for higher trophic levels (e.g. birds, mammals) (Catry et al. 2018, Valdez and Cryan 2013). Due to their strong connection with plants, orthopterans are good indicators of habitat changes and are, therefore, often used in environmental monitoring and assessment (Bieringer et al. 2013, Vasconcelos et al. 2019). However, despite their functional importance, they are still poorly studied in areas with high levels of endemic species, such as the Iberian Peninsula (Hochkirch et al. 2016). Studies on the Portuguese Orthoptera fauna have been published in a very scattered manner over time, generally lacking comprehensive inventories or focusing on certain Orthoptera taxa (Pina et al. 2017). Although some of the more recent studies report new findings for the country (e.g. Lemos et al. (2016), Monteiro et al. (2016)), basic information, such as which species occur in Portugal and their distributions, remains very incomplete.
In Europe, large DNA barcoding initiatives have been established mainly in the northern and central countries (Gaytán et al. 2020). In the study conducted by Hawlitschek et al. (2017), DNA barcodes of Orthoptera from four barcoding initiatives (Barcoding Fauna Bavarica (Germany), German Barcode of Life, Austrian Barcode of Life and Swiss Barcode of Life) comprising three central European countries (Austria, Germany and Switzerland) were made available. A total of 748 COI sequences were obtained for several central European Orthoptera taxa that also occur in other countries, including Portugal. This study also showed that barcoding can be successfully applied to Orthoptera, revealing an overall congruence of 76.2% of the 127 Orthoptera taxa in the study. However, more work is necessary to create a library of barcodes of the European Orthoptera because the representation of certain groups (e.g. Tetrigidae, Gryllidae and Bradyporinae) (Kasalo et al. 2023) and regional areas, such as the southern European peninsulas, remains insufficient. The Iberian Peninsula is a known hotspot of endemic Orthoptera species, but many of its species remain unrepresented in the DNA barcoding reference collections.
The IBI Orthoptera dataset contains records of 420 specimens of Orthoptera collected in continental Portugal, all morphologically identified to species level, for a total of 119 species. Our results constitute a first step in the construction of a DNA barcode database of a curated reference collection of Iberian Orthoptera species.

General description
Purpose: This dataset aims to contribute to the knowledge on DNA barcodes of Portuguese Orthoptera. Such a library should facilitate DNA-based identification of species for both traditional molecular studies and DNA metabarcoding studies. Furthermore, it constitutes a valuable resource for taxonomic research on Iberian Orthoptera and their distribution.
Additional information: A total of 420 specimens of orthopterans were collected and DNA barcoded (Table 1, Suppl. material 2), corresponding to 119 species, about 77% of all the orthopteran species known from continental Portugal (Aires and Menano 1915, Pina et al. 2017, GBIF 2023, IUCN 2023). Additionally, the species Stenobothrus lineatus is recorded for the first time for Portugal in this dataset. A full-length barcode of 658 bp was obtained for all 420 specimens. This dataset contributes significantly to the representation of both species and genetic diversity of Orthoptera in public libraries. Of the 120 taxa barcoded, DNA barcodes of 54 are made public for the first time (marked with # in Taxa field of Table 1). The submitted sequences were found to cluster in 129 BINs, 35 of which were new to BOLD (unique BINs, marked with " in Taxa field of Table 1).

The results show multiple BINs for a recognised species in several cases. Three BINs were obtained for each of the following recognised species: Antaxius spinibrachius, Nemobius sylvestris, Tessellana tessellata and Thyreonotus bidens. Two BINs were obtained for each of the following taxa: Gryllomorpha longicauda, Gryllotalpa vineae, Lluciapomaresius asturiensis, Neocallicrania lusitanica, Pezotettix giornae, Pycnogaster cucullatus, Platycleis sabulosa and the subspecies Neocallicrania selligera meridionalis.
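The kind of tally reported above (species with more than one BIN) can be reproduced from any per-specimen list of species and BIN assignments. The following is a minimal sketch only, not the procedure used by the authors: the function name, the record layout and the placeholder species and BIN codes are all hypothetical. It also builds the reverse mapping, which flags BINs that contain more than one species.

```python
from collections import defaultdict


def bin_discordance(records):
    """Summarise species/BIN mismatches.

    records: iterable of (species, bin_code) pairs, one per specimen
    (hypothetical layout, assumed for this sketch).
    Returns two dicts: species with several BINs, and BINs with several species.
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_code in records:
        bins_by_species[species].add(bin_code)
        species_by_bin[bin_code].add(species)

    # Species whose specimens fall into more than one BIN
    split = {s: sorted(b) for s, b in bins_by_species.items() if len(b) > 1}
    # BINs that contain specimens of more than one species
    shared = {b: sorted(s) for b, s in species_by_bin.items() if len(s) > 1}
    return split, shared


# Placeholder records; neither the names nor the codes come from the dataset
records = [
    ("Species a", "BIN:0001"),
    ("Species a", "BIN:0002"),
    ("Species b", "BIN:0003"),
    ("Species c", "BIN:0003"),
    ("Species d", "BIN:0004"),
]
split, shared = bin_discordance(records)
print(split)
print(shared)
```

Sets are used so that several specimens of the same species in the same BIN are counted once; only the number of distinct BINs per species (or species per BIN) matters for this summary.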
Our dataset includes three BINs obtained for the species Antaxius spinibrachius, including the first sequences of the BIN BOLD:AER0568 from specimens collected in the Castelo Branco and Guarda Districts. In a previous study, aimed at the phylogeography of this species in the Iberian Peninsula, two different lineages were identified, one of which occurs along the Cordillera Oretana, while the other includes all other populations (Gutiérrez-Rodríguez et al. 2014). The genetic diversity found in the DNA barcodes of this species highlights the necessity of a taxonomic revision. Additionally, in the family Tettigoniidae, the specimen of the genus Neocallicrania collected in the Setúbal Peninsula was morphologically identified as Neocallicrania lusitanica using the available literature (Barat 2007) and the unique BIN BOLD:AEO6978 was obtained. However, the revision conducted by Barat (2013) pointed out uncertainties related to specimens collected in this geographic area. Thus, our results emphasise the need for further work towards a better understanding of the taxonomy of this genus in the Iberian Peninsula.
Of the 119 species, 104 have direct correspondence between morphological identification and BINs, leaving 15 species involved in BIN sharing. In the subfamily Tettigoniinae, two BINs were found to be shared by more than one species: BOLD:AEO6325 was shared by Platycleis albopunctata and Platycleis sabulosa, and BOLD:AEO7101 by Pterolepis lusitanica and Pterolepis spoliata. In the subfamily Gomphocerinae, the generated sequence of Pseudochorthippus parallelus shared BIN BOLD:AAC3399 with sequences of Pseudochorthippus montanus from Germany, Ukraine and Norway. Introgression caused by hybridisation is a well-studied phenomenon in areas of Germany where both species occur in sympatry (Hochkirch and Lemke 2011, Rohde et al. 2015). These species are morphologically very similar, but have different ecological requirements, and both occur in Spain, although Pseudochorthippus montanus is restricted to the northeast (Llucià-Pomares 2002). Additionally, in this subfamily, there are other Gomphocerinae groups of species that cannot be identified using the DNA barcode.
This study shows that DNA barcode sequences, based on the COI mitochondrial gene fragment, can be useful in identifying Portuguese Orthoptera to species level. To our knowledge, this is the first study to focus on DNA barcoding of the order Orthoptera for the Iberian Peninsula. It also highlights several taxonomic challenges related to the rich fauna of this group and suggests intricate phylogeographic processes in the region that led to the diversification of several taxa and high endemism levels, in line with what previous studies have found (Barranco 2004, Solé et al. 2018). Our results constitute a first step in the construction of a DNA barcode database of a curated reference collection of Iberian (Portuguese) Orthoptera species that can be used in studies where it is necessary to identify specimens either by DNA barcoding or DNA metabarcoding.

Project description
Title: The InBIO Barcoding Initiative Database: DNA barcodes of Orthoptera from Portugal.
Design description: Orthoptera specimens were collected in the field, morphologically identified and DNA barcoded.

Sampling methods
Description: Continental Portugal (Fig. 3). DNA extraction and sequencing followed the general pipeline used in the InBIO Barcoding Initiative (Ferreira et al. 2018). Briefly, genomic DNA was extracted from leg tissue using the EasySpin Genomic DNA Tissue Kit (Citomed) following the manufacturer's protocol.
Quality control: All DNA barcode sequences were compared against the BOLD database and the top hits were inspected to detect possible issues due to contamination or misidentifications caused by errors in sample coding during processing.
Step description: Specimens were collected in 118 different Portuguese localities between 2005 and 2021. Sampling consisted of direct searches for specimens in different types of habitats during both day and night. Additionally, some orthopterans were detected by listening to their calling songs. All specimens were morphologically identified using the available literature, DNA barcoded and deposited in the IBI reference collection at CIBIO (Research Center in Biodiversity and Genetic Resources). To sequence the 658 bp COI DNA barcode fragment, one leg was removed from each individual, DNA was extracted and then amplified. All DNA extracts were deposited in the IBI collection. All sequences in the dataset were submitted to the BOLD and GenBank databases and, for each sequenced specimen, the morphological identification was compared with the results of the BLAST of the newly-generated DNA barcodes in the BOLD Identification Engine.
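The comparison between morphological identification and top database hit described above can be expressed as a simple screening step. The sketch below is illustrative only and is not the authors' pipeline: it assumes the top-hit species name for each specimen has already been retrieved by some other means, and the function name, the dictionary keys `morph_id` and `top_hit_species`, and the sample records are hypothetical.

```python
def flag_mismatches(specimens):
    """Return sample IDs whose top-hit species differs from the morphological ID.

    specimens: list of dicts with the hypothetical keys 'sampleid',
    'morph_id' and 'top_hit_species' (None when no hit was found).
    """
    flagged = []
    for spec in specimens:
        top_hit = spec["top_hit_species"]
        morph = spec["morph_id"].strip().lower()
        # A missing hit or a different name both call for manual inspection
        if top_hit is None or top_hit.strip().lower() != morph:
            flagged.append(spec["sampleid"])
    return flagged


# Placeholder records, not taken from the dataset
specimens = [
    {"sampleid": "X001", "morph_id": "Species a", "top_hit_species": "Species a"},
    {"sampleid": "X002", "morph_id": "Species b", "top_hit_species": "Species c"},
    {"sampleid": "X003", "morph_id": "Species d", "top_hit_species": None},
]
print(flag_mismatches(specimens))
```

A flagged specimen is only a candidate for inspection. A mismatch can reflect contamination or a labelling error, but it can also arise when closely related species share barcodes or when the reference library lacks the species.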

Geographic coverage
Description: Continental Portugal.

Taxonomic coverage
Description: This dataset is composed of data relating to 420 Orthoptera specimens. All specimens were determined to species level, with four specimens further identified to subspecies level. Overall, 119 species are represented in the dataset. These species belong to all eleven Orthoptera families occurring in Portugal. The Tettigoniidae and Acrididae families account for 36% and 34% of the total collected specimens, respectively, followed by the Gryllidae family with 14%. A similar pattern was observed for the proportion of species: Acrididae, Tettigoniidae and Gryllidae are represented by the highest number of recorded species (38%, 34% and 10%, respectively) (Fig. 4).

identification_provided_by: Full name of primary individual who assigned the specimen to a taxonomic group.

Funding:
The present work was funded by National Funds through FCT-Fundação para a Ciência e a Tecnologia in the scope of the project LA/P/0048/2020. The InBIO Barcoding Initiative was funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 668981 and the project PORBIOTA-Portuguese E-Infrastructure for Information and Research on Biodiversity (POCI-01-0145-FEDER-022127), supported by Operational Thematic Program for Competitiveness and

Figure 3. Map of the localities where Orthoptera samples were collected in continental Portugal. Portuguese districts are also represented.
Figure 4. Distribution of specimens (A) and species (B), in percentage, per Orthoptera family present in the dataset. "Other families" represent less than 1% of the total specimens or species (Pyrgomorphidae, Tridactylidae, Gryllotalpidae).

identification_method: The method used to identify the specimen.
voucher_status: Status of the specimen in an accessioning process (BOLD controlled vocabulary).
tissue_type: A brief description of the type of tissue or material analysed.
collectors: The full or abbreviated names of the individuals or team responsible for collecting the sample in the field.
lifestage: The age class or life stage of the specimen at the time of sampling.
sex: The sex of the specimen.
lat: The geographical latitude (in decimal degrees) of the geographic centre of a location.
lon: The geographical longitude (in decimal degrees) of the geographic centre of a location.
elev: Elevation of sampling site (in metres above sea level).
country: The full, unabbreviated name of the country where the organism was collected.
province_state: The full, unabbreviated name of the province ("Distrito" in Portugal) where the organism was collected.
region: The full, unabbreviated name of the municipality ("Concelho" in Portugal) where the organism was collected.
exactsite: Additional name/text description regarding the exact location of the collection site relative to a geographic relevant landmark.
subspecies_taxID: Subspecies taxonomic numeric code.
subspecies_name: Subspecies name.
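To illustrate how the locality fields described above might be read programmatically, here is a minimal sketch. It assumes, without confirmation from the dataset documentation, that the records are available as a tab-delimited text file whose header row uses these exact column names. The `Locality` class, the helper functions and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locality:
    """Subset of the locality columns described in the data dictionary."""
    country: str
    province_state: str   # "Distrito" in Portugal
    region: str           # "Concelho" in Portugal
    lat: Optional[float]  # decimal degrees
    lon: Optional[float]  # decimal degrees
    elev: Optional[float] # metres above sea level


def _to_float(value):
    """Convert a text cell to float, returning None for empty cells."""
    value = (value or "").strip()
    return float(value) if value else None


def read_localities(handle, delimiter="\t"):
    """Parse delimited text (assumed layout) into Locality records."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        Locality(
            country=row["country"],
            province_state=row["province_state"],
            region=row["region"],
            lat=_to_float(row["lat"]),
            lon=_to_float(row["lon"]),
            elev=_to_float(row["elev"]),
        )
        for row in reader
    ]


# Placeholder row with an empty elevation cell; values are not from the dataset
sample = (
    "country\tprovince_state\tregion\tlat\tlon\telev\n"
    "Portugal\tExample district\tExample municipality\t40.0\t-8.0\t\n"
)
localities = read_localities(io.StringIO(sample))
print(localities[0])
```

Empty numeric cells are mapped to `None` rather than zero, so that a missing elevation is not confused with a site at sea level.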
Figs 1, 2 illustrate examples of the diversity of species that are part of the dataset of distribution data and DNA barcodes of Portuguese Orthoptera. The dataset includes 39 Iberian endemic species, of which six occur only in continental Portugal: Ephippigerida rosae, Lluciapomaresius anapaulae, Neocallicrania barrosi, Neocallicrania serrata, Pterolepis lusitanica and Pycnogaster cucullatus (Table 1).

Table 1. List of taxa that were collected and DNA barcoded within this project. In column Taxa: - indicates Iberian endemic species; - indicates continental Portugal endemic species; # - indicates taxa without a public DNA barcode prior to this study; " - indicates unique BINs.

Family Taxa IBI code BOLD code BOLD BIN GenBank
Internationalization (POCI), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER). Fieldwork benefitted from the project PTDC/BIA-BIC/2203/2012-FCOMP-01-0124-FEDER-028289 by FEDER Funds through the Operational Programme for Competitiveness Factors-COMPETE and by National Funds, EDP Biodiversity Chair, the project "Promoção dos serviços de ecossistemas no Parque Natural Regional do Vale do Tua: Controlo de Pragas Agrícolas e Florestais por Morcegos" funded by the Agência de Desenvolvimento Regional do Vale do Tua and includes research conducted at the Long Term Research Site of Baixo Sabor (LTER_EU_PT_002). The work was partially funded by Horizon Europe under the Biodiversity, Circular Economy and Environment call (REA.B.3); co-funded by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 22.00173; and by the UK Research and Innovation under the Department for Business, Energy and Industrial Strategy's Horizon Europe Guarantee Scheme. SF and VM were funded by the FCT through the programme 'Stimulus of Scientific Employment, Individual Support-3rd Edition' (2020.03526.CEECIND; 2020.02547.CEECIND). JV and FMSM were funded by PhD grants (SFRH/BD/133159/2017; SFRH/BD/104703/2014) from FCT.

Specimens were collected during field expeditions throughout continental Portugal, from 2005 to 2021. Nearly all districts of continental Portugal are represented in the dataset, with the exception of Braga and Viseu. Setúbal, Beja and Bragança were the districts with the highest number of species collected (Table 2). Specimens were collected by direct search and stored in 96% ethanol. Specimens were kept as tissue samples and stored at the InBIO Barcoding Initiative reference collection (Vairão, Portugal).

Table 2. Number of specimens and species collected per Portuguese district.
sampleid: Identifier for the sample being sequenced, i.e. IBI catalogue number at Cibio-InBIO, Porto University. Often identical to the "Field ID" or "Museum ID".