The InBIO barcoding initiative database: DNA barcodes of Iberian Trichoptera, documenting biodiversity for freshwater biomonitoring in a Mediterranean hotspot

Abstract
Background
The Trichoptera are an important component of freshwater ecosystems. In the Iberian Peninsula, 380 taxa of caddisflies are known, with nearly one third of the species being endemic to the region. A reference collection of morphologically identified Trichoptera specimens, representing 142 Iberian taxa, was constructed. The InBIO Barcoding Initiative (IBI) Trichoptera 01 dataset contains records of 438 sequenced specimens. The species of this dataset correspond to about 37% of Iberian Trichoptera species diversity. Specimens were collected between 1975 and 2018 and are deposited in the IBI collection at the CIBIO (Research Center in Biodiversity and Genetic Resources, Portugal) or in the collection Marcos A. González at the University of Santiago de Compostela (Spain).
New information
Twenty-nine species, from nine different families, were new additions to the Barcode of Life Data System (BOLD). An identification success rate of over 80% was achieved when comparing morphological identifications and DNA barcodes for the species analysed. This encouraging step advances the incorporation of informed environmental DNA tools in biomonitoring schemes, given the shortcomings of morphological identifications of larvae and adult caddisflies in such studies. DNA barcoding was not successful in identifying species in six Trichoptera genera: Hydropsyche (Hydropsychidae), Athripsodes (Leptoceridae), Wormaldia (Philopotamidae), Polycentropus (Polycentropodidae), Rhyacophila (Rhyacophilidae) and Sericostoma (Sericostomatidae). The high levels of intraspecific genetic variability found, combined with the lack of a barcode gap and challenging morphological identification, mean that these species need additional studies to resolve their taxonomy.


Introduction
DNA barcoding is a molecular biology method for species identification that was proposed almost twenty years ago (Hebert et al. 2003). DNA barcoding relies on the comparison of a short mitochondrial DNA sequence of interest, usually a 658 bp fragment of the cytochrome c oxidase subunit I (COI) of the mitochondrial genome, known as the "Folmer region" (Folmer et al. 1994), although other regions and genes can also be used, including ones with different systematic scopes (e.g. Woese and Fox (1977)). For DNA barcoding to work, the sequence of interest must be compared to a library containing sequences with known species identification (Hebert et al. 2003, Hebert et al. 2004). As such, the construction of comprehensive reference libraries is essential and these require the morphological identification of vouchers by an expert taxonomist (Baird and Sweeney 2011, Ferreira et al. 2018, Kress et al. 2015). DNA barcoding applications have since expanded beyond single organism and species identification studies. The development of DNA metabarcoding (Taberlet et al. 2012) was made possible by advances in PCR technologies and high-throughput sequencing (HTS) (Liu et al. 2019). Multiple DNA barcodes are sequenced from a single sample, allowing the study of complex samples such as bulk samples and environmental DNA. DNA metabarcoding has broadened the use of both of these sample types. DNA barcodes are now a ubiquitous tool in ecological and biological conservation studies, as well as, for example, in forensic applications (DeSalle and Goldstein 2019, Fišer Pečnikar and Buzan 2013, Kress et al. 2015).
Aquatic ecosystems are suffering high losses in biodiversity due to degradation and habitat destruction (Blancher et al. 2022). These ecosystems can be logistically challenging and time-consuming to monitor, as the current methodology relies on inventories and assessments of taxonomic diversity based on morphology (Blancher et al. 2022). DNA metabarcoding has great potential for conservation and monitoring studies of aquatic ecosystems, as it allows efficient, non-invasive and standardised sampling, without a priori knowledge of the existing biodiversity in an area (Thomsen and Willerslev 2015, Valentini et al. 2016). The choice of DNA markers and the biomass of the communities to monitor are important factors that can influence the successful use of DNA metabarcoding (Thomsen and Willerslev 2015, Valentini et al. 2016, Casey et al. 2021).
The Trichoptera, or caddisflies, are an order of holometabolous insects that ranks seventh overall amongst insect orders in species number, with 16,267 described species (Morse 2022), and are the most speciose of the primarily aquatic insect orders. Species of this order can be found on all continents, except Antarctica (Morse et al. 2019). While adults are mostly terrestrial and capable of flight, most species' eggs, larvae and pupae are found in freshwater habitats (Morse et al. 2019). Adult caddisflies are moth-like insects with bodies covered with setae or hairs (Holzenthal et al. 2007, Morse et al. 2019, Thomas et al. 2020). Their larvae are known for their ability to use silk to construct shelters and retreats, but some species can also be free-living (Casey et al. 2021, González and Cobo 2006, Holzenthal et al. 2015, Martín 2017, Martínez 2014, Morse et al. 2019, Thomas et al. 2020, Zhou et al. 2016). Caddisfly larvae provide several important ecological services, including their crucial role in the trophic dynamics and energy flow of the freshwater food webs of lakes, rivers and streams (Holzenthal et al. 2015, Morse et al. 2019, Zhou et al. 2016). They show differential sensitivity to pollution and their diversity and abundance are widely used in biological freshwater monitoring (Resh and Rosenberg 1984). However, these programmes rely on larval morphological identification, which is much more challenging than adult determination and still impossible for the many species whose larvae have not yet been described (Morse et al. 2019).
Environmental DNA (eDNA) has the potential to complement, or offer an alternative to, current morphology-based identification and its hurdles in freshwater monitoring schemes (Lefrançois et al. 2020). However, successful application of eDNA in Europe will require comprehensive reference collections of DNA sequences, representing existing European aquatic biodiversity (Baird and Sweeney 2011, Ferreira et al. 2018, Kress et al. 2015). Several studies have used barcodes to advance the knowledge on Trichoptera, either expanding the knowledge on their phylogeny or improving the DNA barcode coverage of Trichoptera species (e.g. Morinière et al. (2017), Zhou et al. (2016)).
In this work, we present a contribution to the DNA barcode library of the Iberian Peninsula species of Trichoptera, representing 37% (n = 142) of the caddisflies known in the region and 38% (n = 57) of the known endemic Iberian taxa. This work was conducted within the framework of the InBIO Barcoding Initiative.

General description
Purpose: This dataset aims to provide a first contribution to an authoritative library of DNA barcode sequences for Iberian Trichoptera, documenting biodiversity for freshwater biomonitoring in a Mediterranean hotspot. Such a library aims to enable DNA-based identification of species for both traditional molecular studies and DNA-metabarcoding studies. Furthermore, it constitutes a relevant resource for taxonomic research on Iberian Trichoptera and their distribution.

Additional information:
A total of 438 Trichoptera specimens were sequenced (Suppl. material 1). A full-length barcode of 658 bp was obtained for 400 specimens (91.3%) (Table 1, Suppl. material 2). These specimens represent 142 (37%) of the approximately 380 caddisfly species known to occur in the Iberian Peninsula (González and Martínez 2011, Martínez 2014, Martín 2017). Furthermore, 57 taxa are Iberian endemics, representing 38% of the total endemic Iberian taxa (González and Martínez 2011, Martínez 2014, Martín 2017). The dataset includes 22 of the 23 families known to occur in the Iberian Peninsula (Table 1). These data contribute 29 new taxa (26 species and three subspecies) of Trichoptera to the BOLD database (Table 1). For five additional species, the dataset contributes a full-length barcode for the first time.
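The percentages quoted above follow directly from the counts given in the text. As a quick arithmetic check, a minimal Python sketch is given below; the variable and function names are chosen here for illustration and are not part of the dataset.

```python
# Counts taken from the text above; names are illustrative only.
sequenced_specimens = 438
full_length_barcodes = 400
taxa_in_dataset = 142
known_iberian_taxa = 380  # approximate figure cited in the text


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


full_length_pct = percent(full_length_barcodes, sequenced_specimens)
coverage_pct = percent(taxa_in_dataset, known_iberian_taxa)

print(f"Full-length barcodes: {full_length_pct:.1f}%")  # 91.3%
print(f"Taxonomic coverage: {coverage_pct:.0f}%")       # 37%
```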

The BOLD BIN system uses algorithms to cluster sequences into operational taxonomic units (OTUs) that closely correspond to species (Ratnasingham and Hebert 2013). A total of 146 BINs were retrieved by BOLD (Ratnasingham and Hebert 2007). Seven specimens have not been assigned a BIN, as their sequences are only 418 bp long and no other specimens have been sequenced (Suppl. material 1). Two specimens, identified to the genus level only as Helicopsyche sp., clustered together in a separate BIN, "BOLD:AEC8747". Of the 146 BINs, 45 are unique to our dataset (Table 1, Suppl. material 1). Using the criteria followed by Ratnasingham and Hebert (2013), there were 83.6% matches, 3.7% merges, 6.7% splits and 6.0% mixtures when comparing BINs to the morphological identifications (Fig. 1). The BINs generated by BOLD clustered together sequences that closely agree with the morphological identifications of the specimens, with only a few exceptions in nine of the 22 Trichoptera families analysed.
The independent RESL run (Ratnasingham and Hebert 2013, Ratnasingham and Hebert 2007) retrieved 153 OTUs, plus one OTU for the Helicopsyche sp. specimens (Suppl. material 4). The differences found between the RESL OTUs and the morphological identifications were similar to those found between the latter and BOLD's BINs, with 81.7% matches, 4.2% merges, 7.7% splits and 6.3% mixtures when comparing OTUs to the morphological identifications (Fig. 1).
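The four categories used in this comparison can be illustrated with a short Python sketch. It classifies each morphological species from a list of (species, cluster) pairs. The category rules below are our reading of the criteria in Ratnasingham and Hebert (2013) and should be treated as an assumption, as should the function name and the toy data, which are invented for illustration and are not records from the dataset.

```python
from collections import defaultdict


def classify_species(records):
    """Classify each species as MATCH, MERGE, SPLIT or MIXTURE.

    records: iterable of (species_name, cluster_id) pairs, one per sequence.
    """
    clusters_of = defaultdict(set)  # species -> clusters holding its sequences
    species_in = defaultdict(set)   # cluster -> species found in it
    for species, cluster in records:
        clusters_of[species].add(cluster)
        species_in[cluster].add(species)

    result = {}
    for species, clusters in clusters_of.items():
        # True if any cluster of this species also contains another species
        shares_cluster = any(len(species_in[c]) > 1 for c in clusters)
        if len(clusters) == 1:
            result[species] = "MERGE" if shares_cluster else "MATCH"
        else:
            result[species] = "MIXTURE" if shares_cluster else "SPLIT"
    return result


# Toy input: species names and cluster labels are invented.
toy = [
    ("sp_A", "c1"), ("sp_A", "c1"),
    ("sp_B", "c2"), ("sp_C", "c2"),
    ("sp_D", "c3"), ("sp_D", "c4"),
    ("sp_E", "c5"), ("sp_E", "c6"), ("sp_F", "c6"),
]
categories = classify_species(toy)
```

In this toy case, sp_A is a match, sp_B and sp_C are merged, sp_D is split, and sp_E is a mixture because it is both split across clusters and shares one of them with sp_F.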
Nevertheless, some differences existed between the RESL OTU clustering and the BINs created by BOLD (Suppl. materials 1, 4). In the family Hydropsychidae, sequences identified as Hydropsyche instabilis (Curtis, 1834) clustered into a single OTU, but were split into two BINs.
This study showed that DNA barcode sequences, based on the COI mitochondrial gene fragment, can be useful in identifying Iberian Trichoptera samples to species level. We achieved more than 80% success in matching the sequences generated to the morphological identification of the specimens. This is similar to the success rate achieved for German caddisflies (79.8%; Morinière et al. 2017). A DNA barcode library is an essential tool for incorporating environmental DNA techniques in monitoring schemes of aquatic ecosystems that use Iberian caddisflies (Lefrançois et al. 2020). Our results constitute a first step in the construction of a DNA barcode database of a curated reference collection of Iberian Trichoptera species, which could be used to overcome the difficulties in identifying many of the Trichoptera larval specimens in traditional biological freshwater monitoring studies.
Incongruences were found in nine families. In six of them, Glossosomatidae, Helicopsychidae, Polycentropodidae, Limnephilidae, Rhyacophilidae and Psychomyiidae, the barcode analysis identified no species boundaries, with high levels of intraspecific genetic diversity (Suppl. material 3). It is possible that such levels of genetic diversity point to undescribed, distinct species. This hypothesis requires further morphological studies to search for diagnostic morphological traits that might separate these species.
In the family Hydropsychidae, nine species of the genus Hydropsyche could be identified through their barcodes and their genetic distances ranged between 13.4% and 23%. However, five other species could not be identified through DNA barcodes. These species, H. ambigua, H. infernalis, H. pictetorum, H. siltalai and H. tenuis, were split between different BINs and OTUs, shared by some, but not all, of the same species, further complicating their relationships. For the species with enough sequenced specimens, all were found to have moderate to high levels of intraspecific genetic diversity (Suppl. material 3). These species are difficult to identify morphologically and this study emphasises the need for further work towards a better understanding of the taxonomy of the genus in the Iberian Peninsula (Zamora-Muñoz et al. 2017).
In the family Leptoceridae, sequences identified as Athripsodes alentexanus and A. braueri clustered in a single BIN. All four sequences were identical. As such, DNA barcodes based on COI might not differentiate between these two species. This could be the result of an introgression event or of a very recent split between the two species; alternatively, their taxonomic identity may need revision.
In the family Philopotamidae, two Wormaldia beaumonti sequences and one W. lusitanica sequence were in the dataset. Two BINs are present in BOLD, with both species represented in each (from previous data, but also with the new data). This genus is very difficult to identify morphologically and it is likely that the morphological characters used are not able to separate the two taxa.
In the family Sericostomatidae, there were problems separating two species of the genus Sericostoma, S. pyrenaicum and S. vittatum. These species clustered together into two different BINs, but sequences of S. pyrenaicum and S. vittatum also clustered in additional BINs. Intraspecific genetic diversity is relatively high in both species (2.49% and 2.89%, respectively). González et al. (1992) and Martínez (2014) already pointed out that, under these two names, a complex of species is actually hidden, some of them quite variable morphologically. A detailed morphological-molecular study may help to solve one of the most difficult taxonomic problems of our fauna. These findings suggest that both species need a taxonomic revision.
Our results did not corroborate the findings of Valladolid et al. (2018) and suggest further work is necessary regarding the identity of Rhyacophila adjuncta and R. sociata. These authors restored the species R. sociata, previously considered a junior synonym of R. denticulata McLachlan, 1879. However, both BOLD clustering algorithms merged our samples, identified as R. adjuncta (2 specimens) and R. sociata (2 specimens), into a single BIN, "BOLD:AAD5575". Furthermore, this BIN includes all publicly available sequences in BOLD identified as R. adjuncta and R. sociata, including all sequences generated by Valladolid et al. (2018). In their paper, the authors did not investigate a possible relationship between these two species, nor was that relationship assessed in a subsequent study on the European species of the R. fasciata group (Valladolid et al. 2021). Finally, the BIN mentioned above also includes other sequences identified as R. tristis Pictet, 1834 and R. fasciata Hagen, 1859, although these are probably misidentifications.
We also identified several cases that require further study by taxonomists. Other possibilities for the incongruence found amongst the results include the existence of hybridisation, introgression or incomplete lineage sorting in these species, especially if they result from recent speciation events (e.g. Behrens-Chapuis et al. (2021), Morinière et al. (2017), Zhou et al. (2016)). These hypotheses require the combination of nuclear and mitochondrial markers to be resolved, preferably in an integrative taxonomic approach.
Study area description: Iberian Peninsula (Fig. 2).

Project description
Design description: Specimens were collected during field expeditions in the Iberian Peninsula, from 1975 to 2018 (n = 434; Fig. 2, Suppl. material 1), with more than 60% of specimens collected in the period between 2015 and 2017 (274 out of 434). Two additional specimens were collected in the French Pyrenees. Specimens kept at the InBIO Barcoding Initiative (IBI) reference collection (Vairão, Portugal), 230 in total, were stored in 96% ethanol. Specimens kept at the Colección Marcos A. González (Universidad de Santiago de Compostela, Spain), 206 in total, were stored in either 70% or 96% ethanol.
Figure 2. Sampling localities of the Trichoptera specimens analysed in this study. Nine localities could not be mapped because geographic coordinates were not available.
For each species, we selected, on average, three specimens for DNA sequencing, based on their location of capture, attempting to maximise the geographical coverage of the study.
DNA was extracted using two different kits: EasySpin Genomic DNA Microplate Tissue Kit (Citomed, Odivelas, Portugal) or QIAamp DNA Micro Kit (Qiagen, Hilden, Germany). The QIAamp DNA Micro Kit is designed to extract higher concentrations of genetic material from samples with small amounts of DNA.
DNA amplification was performed using three different primer pairs that amplify three overlapping fragments of the same 658 bp region of the COI mitochondrial gene. At the beginning of the project (2015), we used two primer pairs, LCO1490 (Folmer et al. 1994) + Ill_C_R (Shokralla et al. 2015) and Ill_B_F (Shokralla et al. 2015) + HCO2198 (Folmer et al. 1994) (henceforth referred to as LC and BH, respectively), to amplify two overlapping fragments of 325 bp and 418 bp, respectively. After the publication of the third primer pair, BF2 + BR2 (422 bp fragment), by Elbrecht and Leese (2017), this pair was used instead of the second primer pair (Ill_B_F + HCO2198) due to its higher amplification efficiency. PCRs were performed in 10 µl reactions, containing 5 µl of Multiplex PCR Master Mix (Qiagen, Germany), 0.3 (BF2-BR2) to 0.4 mM of each primer and 1-2 µl of DNA, with the remaining volume in water. Thermocycling was performed in a T100 Thermal Cycler (Bio-Rad, California, USA), with an initial denaturation at 95°C for 15 min, followed by 5 cycles at 95°C for 30 sec, 47°C for 45 sec and 72°C for 45 sec (only for LC and BH); then 40 cycles at 95°C for 30 sec, 51°C for 45 sec (48°C for 60 sec for BF2 + BR2) and 72°C for 45 sec; and a final elongation step at 60°C for 10 min.
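The two thermocycling programmes described above can be written down as simple data structures, which makes it easy to compare them. The Python sketch below encodes only the step durations stated in the text and sums the programmed hold times; it ignores ramping between temperatures, so the result is a lower bound on the actual run time. The constant and function names are illustrative.

```python
# Each programme is a list of (number_of_cycles, [step durations in seconds]).
LC_BH_PROGRAMME = [
    (1, [15 * 60]),      # initial denaturation at 95 °C
    (5, [30, 45, 45]),   # 95 °C / 47 °C / 72 °C (LC and BH only)
    (40, [30, 45, 45]),  # 95 °C / 51 °C / 72 °C
    (1, [10 * 60]),      # final elongation at 60 °C
]
BF2_BR2_PROGRAMME = [
    (1, [15 * 60]),      # initial denaturation at 95 °C
    (40, [30, 60, 45]),  # 95 °C / 48 °C for 60 sec / 72 °C
    (1, [10 * 60]),      # final elongation at 60 °C
]


def hold_time_seconds(programme):
    """Sum the programmed hold times, excluding temperature ramps."""
    return sum(cycles * sum(steps) for cycles, steps in programme)


lc_bh_seconds = hold_time_seconds(LC_BH_PROGRAMME)
bf2_br2_seconds = hold_time_seconds(BF2_BR2_PROGRAMME)
print(lc_bh_seconds / 60, bf2_br2_seconds / 60)  # 115.0 115.0
```

Under these assumptions, both programmes have the same programmed hold time (115 min): the longer annealing step used for BF2 + BR2 offsets the five initial cycles that are run only for LC and BH.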
All PCR products were analysed by agarose gel electrophoresis and samples selected for sequencing were then organised for assignment of sequencing 'indexes'. One of two types of index was used for each run. For Illumina indexes, samples were pooled into one plate, as described in Shokralla et al. (2015). When using custom indexes, designed following Meyer and Kircher (2010), no pooling was required. The latter allow for a maximum of 1920 unique index combinations. A second PCR was then performed, in which the 'indexes' and Illumina sequencing adapters were attached to the PCR product. The index PCR was performed in a volume of 10 µl, including 5 µl of Phusion® High-Fidelity PCR Kit (New England Biolabs, U.S.A.) or KAPA HiFi PCR Kit (KAPA Biosystems, U.S.A.), 0.5 µl of each 'index' and 2 µl of diluted PCR product (usually 1:4). This PCR was run for 10 cycles at an annealing temperature of 55°C. The amplicons were purified using AMPure XP beads (Beckman Coulter Genomics, Massachusetts, United States) before quantification using NanoDrop 1000 (Thermo Fisher Scientific, Massachusetts, USA). Concentrations were then normalised between samples and samples were pooled according to the primer sets used.
The final pools were quantified by qPCR using the KAPA Library Quantification Kit Illumina® Platforms (Kapa Biosystems), and the 2200 Tapestation System (Agilent Technologies, California, USA) was used for fragment length analysis, as described by Paupério et al. (2018).
Sequencing was conducted at CIBIO facilities on an Illumina MiSeq benchtop system, using a V2 MiSeq sequencing kit (2 x 250 bp).
Sequences were filtered and processed with OBITools (Boyer et al. 2015) and the fragments were assembled into their consensus 658 bp-long sequences using Geneious 6.1.8 (https://www.geneious.com). The obtained DNA sequences were then compared against the Barcode of Life Data Systems (BOLD) database (Ratnasingham and Hebert 2007) using the built-in identification engine, based on the BLAST algorithm. Sequences were submitted to the BOLD database and the Barcode Index Numbers (BIN) for every sequence were retrieved and analysed (Suppl. materials 1, 2). As not all our sequences matched the criteria used in BOLD (sequence length) to be clustered in a BIN, we ran the Refined Single Linkage algorithm (RESL, Ratnasingham and Hebert (2013)) on our dataset in the BOLD system (Ratnasingham and Hebert 2007) in an independent run (Suppl. material 4). This process clusters sequences independently of their BIN registry, generating OTUs that can be analysed independently.
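The idea of assembling overlapping fragments into one consensus sequence can be sketched in a few lines of Python. This is not the procedure implemented in Geneious; it is a deliberately simple illustration that joins two fragments on their longest exact suffix/prefix overlap. The function name, the `min_overlap` parameter and the toy sequences are assumptions introduced here. If the LC (325 bp) and BH (418 bp) fragments together span exactly the 658 bp region, the expected overlap is 325 + 418 - 658 = 85 bp.

```python
def merge_overlapping(left, right, min_overlap=20):
    """Join two fragments on the longest exact suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Expected overlap between the LC and BH fragments, assuming they
# jointly cover the full 658 bp barcode region.
expected_overlap = 325 + 418 - 658

# Toy sequences, invented for illustration only.
merged = merge_overlapping("AAACCCGGG", "CCGGGTTT", min_overlap=3)
unmerged = merge_overlapping("AAAA", "CCCC", min_overlap=2)
```

A real assembly would also need to tolerate sequencing errors in the overlap and resolve disagreements between reads, which exact matching does not do.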
All DNA barcode sequences were aligned in Geneious 6.1.8 with the MUSCLE (Edgar 2004) plug-in. Nucleotide composition of all sequences, as well as intra- and interspecific p-distances, were calculated in MEGA11 (
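For readers unfamiliar with the measure, the uncorrected p-distance between two aligned sequences is the proportion of compared sites at which they differ. The Python sketch below is a minimal illustration and not the MEGA11 implementation; skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption made here, and the function name is illustrative.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped (pairwise deletion; this choice is an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy alignments, invented for illustration only.
d_simple = p_distance("ACGTACGT", "ACGTACGA")  # 1 difference in 8 sites
d_with_gap = p_distance("ACGT-", "ACGAN")      # last site skipped
```

Multiplying the result by 100 gives the percentage divergence, the unit in which intraspecific and interspecific distances are reported in this work.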

Sampling methods
Description: Iberian Peninsula.
Sampling description: Specimens were captured during direct searches of the environment, using mainly hand-held sweep-nets, or were lured by light trapping, the latter with UV (black-light) LEDs. Morphological identification was based on Malicky (2004), using a stereoscopic microscope for the study of genitalia. In some cases, genitalia were cleared in 10% potassium hydroxide (KOH) at room temperature for 4-8 hours, rinsed in water and placed in a drop of glycerine or resin (DMHF) on a clean slide for further study.
From each specimen, one tissue sample (a leg) was removed and stored in 96% ethanol for DNA extraction at the IBI collection.
Quality control: All DNA barcode sequences were compared against the BOLD database and the 99 top results were inspected in order to detect possible problems due to contaminations or misidentifications. Prior to GBIF submission, data were checked for errors and inconsistencies with OpenRefine 3.3 (http://openrefine.org).
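The inspection of top results described above can be partly supported by a simple filter that lists close matches carrying a different species name from the morphological identification. The Python sketch below is hypothetical: the function name, the input format and the 98% similarity threshold are assumptions introduced here and are not taken from the study or from any BOLD interface.

```python
def conflicting_hits(morph_id, hits, top_n=99, min_similarity=98.0):
    """Return close matches that carry a different species name.

    hits: list of (species_name, percent_similarity), best match first.
    top_n and min_similarity are illustrative defaults (assumptions).
    """
    return [
        (name, similarity)
        for name, similarity in hits[:top_n]
        if similarity >= min_similarity and name != morph_id
    ]


# Toy hit list with placeholder names and invented similarity values.
toy_hits = [
    ("Genus alpha", 100.0),
    ("Genus beta", 99.2),
    ("Genus gamma", 91.0),
]
flagged = conflicting_hits("Genus alpha", toy_hits)
```

A flagged record is only a prompt for manual checking: a conflicting close match may reflect contamination, a misidentified voucher or reference sequence, or genuinely shared haplotypes between species.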
Step description: Specimens were collected in 66 different localities in Portugal and 74 localities in Spain. Collections were carried out between 1975 and 2018. Specimens were collected during fieldwork by direct search, by sweeping the vegetation with a hand-net and by using light traps, and were preserved in 96% alcohol. Captured specimens were deposited in the IBI reference collection at CIBIO (Research Center in Biodiversity and Genetic Resources) or in the collection Marcos A. González at the University of Santiago de Compostela (Spain). Specimens were morphologically identified with the assistance of stereoscopic microscopes (Leica MZ12, 8x to 100x; Olympus SZX16, 7x to 115x). DNA barcodes were sequenced from all specimens. For this, one leg was removed from each individual, DNA was then extracted and a 658 bp COI DNA barcode fragment was amplified and sequenced. All obtained sequences were submitted to the BOLD and GenBank databases and, for each sequenced specimen, the morphological identification, when available, was contrasted with the results of the BLAST of the newly-generated DNA barcodes in the BOLD Identification Engine. Prior to submission to GBIF, data were checked for errors and inconsistencies with OpenRefine 3.3 (http://openrefine.org/).

Geographic coverage
Description: Specimens were collected in the Iberian Peninsula, 229 from 66 localities in Portugal and 207 from 74 localities in Spain (Fig. 2, Suppl. material 5 for further details). Two additional specimens were collected in two French localities. The Rhyacophila laevis Pictet, 1834 specimen represented in the dataset was collected in the French Pyrenees.

Taxonomic coverage
Description: This dataset is composed of data relating to 438 Trichoptera specimens. All specimens were determined to species level, with 14 specimens further identified to subspecies level. Overall, 141 species are represented in the dataset. These species belong to 22 families.

Collection data
Collection name: InBIO Barcoding Initiative
Collection identifier: 4ec2b246-f5fa-4b90-9a8d-ddafc2a3f970
Specimen preservation method: "Alcohol"
identification_provided_by: Full name of primary individual who assigned the specimen to a taxonomic group.

identification_method: The method used to identify the specimen.
voucher_status: Status of the specimen in an accessioning process (BOLD controlled vocabulary).

Figure 1. Comparison in OTU assignment performance between BOLD's BIN and RESL stand-alone algorithms. The BIN dataset comprised 135 taxa (134 species) and the RESL stand-alone run comprised the entire 142 taxa (141 species) dataset. The four categories, MATCH, MERGE, SPLIT and MIXTURE, into which the OTUs were divided follow the criteria used by Ratnasingham and Hebert (2013).
tissue_type: A brief description of the type of tissue or material analysed.
collectors: The full or abbreviated names of the individuals or team responsible for collecting the sample in the field.
lifestage: The age class or life stage of the specimen at the time of sampling.
sex: The sex of the specimen.
lat: The geographical latitude (in decimal degrees) of the geographic centre of a location.
lon: The geographical longitude (in decimal degrees) of the geographic centre of a location.
elev: Elevation of sampling site (in metres above sea level).
country: The full, unabbreviated name of the country where the organism was collected.
province_state: The full, unabbreviated name of the province ("Distrito" in Portugal) where the organism was collected.
region: The full, unabbreviated name of the municipality ("Concelho" in Portugal) where the organism was collected.
exactsite: Additional name/text description regarding the exact location of the collection site relative to a geographic relevant landmark.

Table 1. List of species that were collected and DNA barcoded within this project.

Taxa IBI code BOLD code BOLD BIN
split into two BINs. In the family Leptoceridae, sequences of specimens identified as Athripsodes alentexanus and A. braueri clustered in a single BIN. In the family Philopotamidae, sequences identified as Philopotamus perversus McLachlan, 1884 clustered into two OTUs, but were represented by a single BIN. In the family Polycentropodidae, sequences identified as Polycentropus flavomaculatus clustered into a single OTU, but were split into two BINs. In the family Rhyacophilidae, sequences identified as R. dorsalis (Curtis, 1834) and its subspecies, R. d. albarracina Malicky, 2002, clustered into a single OTU, but other sequences identified as R. dorsalis clustered into a different OTU. All R. dorsalis sequences share a single BIN, but the subspecies' sequences have not been assigned a BIN, as they are only 418 bp long. Sequences identified as R. intermedia McLachlan, 1868 clustered into three OTUs, but were represented by a single BIN. Additionally, sequences identified as R. martynovi Mosely, 1930 clustered into two OTUs, but were represented by a single BIN. Furthermore, sequences identified as Rhyacophila munda clustered into two OTUs, but were split into three BINs. In the family Sericostomatidae, there was no separation of the species Sericostoma pyrenaicum and S. vittatum. These species clustered together into two different BINs, but sequences of S. pyrenaicum and S. vittatum also clustered in additional BINs (Suppl. materials 1, 4).
This work provided new DNA barcode sequences and distributional data for 436 specimens of Iberian Trichoptera, plus two French specimens. The dataset represents 37% of the caddisflies known to occur in Iberia and the work added 29 taxa previously not represented in the BOLD database. To our knowledge, this is the first study to focus on DNA barcoding of the Trichoptera order for the Iberian Peninsula.
The InBIO Barcoding Initiative Database: DNA Barcodes of Iberian Trichoptera dataset can be downloaded from the Public Data Portal of BOLD (http://www.boldsystems.org/index.php/Public_SearchTerms?query=DS-IBITR01) in different formats (data as dwc, xml or tsv and sequences as fasta files). Alternatively, BOLD users can log in and access the dataset via the Workbench platform of BOLD. All records are also searchable within BOLD, using the search function of the database. The InBIO Barcoding Initiative will continue sequencing Iberian Trichoptera for the BOLD database, with the ultimate goal of comprehensive coverage. The version of the dataset at the time of writing the manuscript is included in the form of one text file with record information as downloaded from BOLD, one text file with the collection and identification data in Darwin Core Standard format (downloaded from GBIF, Martín et al. (2022)) and one fasta file containing all sequences as downloaded from BOLD. It should be noted that, as the BOLD database is not compliant with the Darwin Core Standard format, the Darwin Core formatted file (dwc) that can be downloaded from BOLD is not strictly Darwin Core formatted. For a proper Darwin Core formatted file, see http://ipt.gbif.pt/ipt/resource?r=ibi_trichoptera_01&v=1.1 (Suppl. material 5). All data are available in the BioStudies database (http://www.ebi.ac.uk/biostudies) under accession number S-BSST920.
Identifier for the sample being sequenced, i.e. IBI catalogue number at Cibio-InBIO, Porto University. Often identical to the "Field ID" or "Museum ID".