Biodiversity Data Journal :
Data Paper (Biosciences)
Corresponding author: Sónia Ferreira (hiporame@gmail.com)
Academic editor: Richard Mally
Received: 11 Dec 2023 | Accepted: 19 Feb 2024 | Published: 16 May 2024
© 2024 Sónia Ferreira, Martin Corley, João Nunes, Jorge Rosete, Sasha Vasconcelos, Vanessa Mata, Joana Veríssimo, Teresa Silva, Pedro Sousa, Rui Andrade, José Manuel Grosso-Silva, Catarina J. Pinho, Cátia Chaves, Filipa MS Martins, Joana Pinto, Pamela Puppo, Antonio Muñoz-Mérida, John Archer, Joana Pauperio, Pedro Beja
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ferreira S, Corley MFV, Nunes J, Rosete J, Vasconcelos S, Mata VA, Veríssimo J, Silva T, Sousa P, Andrade R, Grosso-Silva JM, Pinho CJ, Chaves C, Martins FM, Pinto J, Puppo P, Muñoz-Mérida A, Archer J, Pauperio J, Beja P (2024) The InBIO Barcoding Initiative Database: DNA barcodes of Portuguese moths. Biodiversity Data Journal 12: e117169. https://doi.org/10.3897/BDJ.12.e117169
The InBIO Barcoding Initiative (IBI) Dataset - DS-IBILP08 contains records of 2350 specimens of moths (Lepidoptera species that do not belong to the superfamily Papilionoidea). All specimens have been morphologically identified to species or subspecies level and represent 1158 species in total. The species of this dataset correspond to about 42% of mainland Portuguese Lepidoptera species. All specimens were collected in mainland Portugal between 2001 and 2022. All DNA extracts and over 96% of the specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources.
The authors assembled "The InBIO Barcoding Initiative Database: DNA barcodes of Portuguese moths" in order to release the majority of the DNA barcode data on Portuguese moths generated within the InBIO Barcoding Initiative. This dataset increases the knowledge on the DNA barcodes of 1158 species from Portugal belonging to 51 families. The number of publicly available DNA barcodes from Portuguese specimens increases by 205%. The dataset includes 61 new Barcode Index Numbers. All specimens have their DNA barcodes publicly accessible through the BOLD online database and the distribution data can be accessed through the Global Biodiversity Information Facility (GBIF).
Lepidoptera, occurrence records, species distributions, continental Portugal, DNA barcode, cytochrome c oxidase subunit I (COI)
The Portuguese fauna of Lepidoptera is relatively rich with 2775 species recorded so far (
In parallel with the efforts to produce and make available reliable data on the taxonomy and chorology of Portuguese moths, the InBIO Barcoding Initiative (IBI), a DNA barcoding initiative of the Research Network in Biodiversity and Evolutionary Biology (InBIO), was created in response to the paucity of genetic data on Portuguese biodiversity. The IBI makes use of High-Throughput Sequencing technologies to construct a reference collection of morphologically identified Portuguese specimens and their respective DNA barcodes (e.g.
This dataset aims to provide a contribution to the knowledge on DNA barcodes of Portuguese moths. Such a library should facilitate DNA-based identification of species for both traditional molecular studies and DNA metabarcoding studies and constitute a valuable resource for taxonomic and ecological research on Lepidoptera, with the focus on the Portuguese fauna.
Examples of the diversity of species that are part of the dataset of distribution data and DNA barcodes of Portuguese moths. All photos by João Nunes.
Examples of the diversity of species that are part of the dataset of distribution data and DNA barcodes of Portuguese moths (cont.). All photos by João Nunes.
BOLD:AAA7740 (Yponomeuta cagnagella and Yponomeuta evonymella); BOLD:AAA9515 (Chloroclysta miata and Chloroclysta siterata); BOLD:AAD0839 (Macrothylacia digramma and Macrothylacia rubi); BOLD:AAB4833 (Oligia strigilis and Oligia versicolor); BOLD:AAD6780 (Cryphia algae and Cryphia pallida); BOLD:AAF0005 (Lozotaeniodes cupressana and Lozotaeniodes formosana); BOLD:ABV4113 (Cleonymia diffluens and Cleonymia yvanii); BOLD:ACE8354 (Euxoa oranaria and Euxoa tritici); BOLD:ACY5987 (Pleurota andalusica and Pleurota ericella); BOLD:AEC9855 (Pleurota honorella and Pleurota planella).
From the 2775 species belonging to 77 families recorded from continental Portugal (
The name "The InBIO Barcoding Initiative Database: DNA barcodes of Portuguese moths" refers to the first data release of DNA barcodes and distribution data of Portuguese moths within the InBIO Barcoding Initiative.
Pedro Beja (project coordinator), Sónia Ferreira (taxonomist and IBI manager), Martin F.V. Corley (taxonomist, lepidopterist), Cátia Chaves (project technician), Filipa M.S. Martins (molecular biologist), Vanessa A. Mata (molecular biologist), Antonio Muñoz-Mérida (bioinformatician), John Archer (bioinformatician), Joana Paupério (molecular biologist), Catarina J. Pinho (project technician), Joana C. Pinto (project technician), Pamela Puppo (molecular biologist), Teresa L Silva (molecular biologist), Pedro Sousa (project technician), Sasha Vasconcelos (contributor), Joana Veríssimo (molecular biologist), all affiliated to CIBIO-InBIO, University of Porto, José Manuel Grosso-Silva (entomologist), affiliated to the MHNC-UP, University of Porto and Rui Andrade (entomologist), João Nunes (lepidopterologist), Jorge Rosete (lepidopterist), independent researchers.
Continental Portugal (Fig.
Lepidoptera specimens were collected in the field, morphologically identified and DNA barcoded.
The present work was funded by National Funds through FCT-Fundação para a Ciência e a Tecnologia in the scope of the project LA/P/0048/2020. InBIO Barcoding Initiative was funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 668981 and by the project PORBIOTA – Portuguese E-Infrastructure for Information and Research on Biodiversity (POCI-01-0145-FEDER-022127), supported by Operational Thematic Program for Competitiveness and Internationalization (POCI), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER). The work was partially funded by Horizon Europe under the Biodiversity, Circular Economy and Environment call (REA.B.3); co-funded by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 22.00173; and by the UK Research and Innovation under the Department for Business, Energy and Industrial Strategy’s Horizon Europe Guarantee Scheme. The fieldwork benefitted from EDP Biodiversity Chair, the project “Promoção dos serviços de ecossistemas no Parque Natural Regional do Vale do Tua: Controlo de Pragas Agrícolas e Florestais por Morcegos” funded by the Agência de Desenvolvimento Regional do Vale do Tua and includes research conducted at the Long Term Research Site of Baixo Sabor (LTER_EU_PT_002). SF and VM were funded by the FCT through the programme ‘Stimulus of Scientific Employment, Individual Support - 3rd Edition’ (https://doi.org/10.54499/2020.03526.CEECIND/CP1601/CT0010; https://doi.org/10.54499/2020.02547.CEECIND/CP1601/CT0006). CJP, JV and FMSM were funded by PhD grants (SFRH/BD/145851/2019; SFRH/BD/133159/2017; SFRH/BD/104703/2014) from FCT.
Continental Portugal (Fig.
Specimens were collected during field expeditions throughout continental Portugal, from 2001 to 2022. They were captured at night using light traps fitted with UV LEDs, mixed light or mercury vapour lamps, or during the day by direct search. All specimens were observed in the field and, in most cases, they could be readily identified to species level by an experienced taxonomist (Martin Corley). Such specimens were preserved in 96% ethanol and stored at the InBIO Barcoding Initiative reference collection (Vairão, Portugal), where they can be re-examined and genitalia dissected, if needed. Specimens that could not be identified in the field (n = 84) were pinned and dried for subsequent examination in the laboratory. They were then stored at the Research Collection of Martin Corley or the Private Collections of Jorge Rosete or J.M. Grosso-Silva.
DNA extraction was performed using either the 96-Well Plate Animal Genomic DNA Mini-Preps Kit (Bio Basic, Ontario, Canada) or the QIAamp DNA Micro Kit (Qiagen, Germany) which is designed to extract higher concentrations of genetic material from samples with small amounts of DNA. Amplification was performed using two different primer pairs that amplify partially overlapping fragments (LC + BH) of the 658 bp barcoding region of the COI mitochondrial gene. We used the primers FwhF1 (
Successful amplification was validated through 2% agarose gel electrophoresis stained with GelRed (Biotium, USA). Samples selected for sequencing proceeded to a second-round PCR, in which Illumina P5 and P7 adapters with custom 7 bp long barcodes were attached to each first PCR product. The second PCR was performed in a volume of 10 μl, including 5 μl of KAPA HiFi PCR Kit (KAPA Biosystems, Cape Town, South Africa), 0.5 μl of each 10 mM indexing primer and 2 μl of diluted first PCR product (usually 1:4). PCR cycling conditions were as follows: initial denaturation at 95°C for 3 min, followed by 8-10 cycles (adjusted to sample quality) of denaturation at 95°C for 30 sec, annealing at 50°C for 60 sec and extension at 72°C for 45 sec, and a final elongation step at 60°C for 10 min. The amplicons were purified using AMPure XP beads (Beckman Coulter, USA) and quantified using NanoDrop 1000 (Thermo Scientific, USA). Clean PCR products were then pooled equimolarly per fragment. Each pool was quantified with the KAPA Library Quantification Kit for Illumina Platforms (KAPA Biosystems, Cape Town, South Africa) and the 2200 Tapestation System (Agilent Technologies, California, USA) was used for fragment length analysis prior to sequencing (Paupério et al. 2018). DNA sequencing was done at CIBIO facilities on an Illumina MiSeq benchtop system, using V2 MiSeq sequencing kits (2 x 250 bp) (Illumina, California, USA).
Illumina sequencing reads were processed using OBITools (
All DNA barcode sequences were compared against the BOLD database and the top 99 hits were inspected in order to detect possible issues due to contamination or misidentification.
1. Specimens were collected in 234 different localities. Fieldwork was carried out between 2001 and 2022.
2. Selected specimens were pinned and dried and are preserved in three private collections. Otherwise, specimens collected as tissue samples were stored in 96% ethanol in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources (Vairão, Portugal).
3. All specimens were morphologically identified and DNA barcoded. To sequence the 658 bp COI DNA barcode fragment, one leg was removed from each individual, DNA was extracted and then amplified. All DNA extracts were deposited in the IBI collection.
4. All sequences in the dataset were submitted to the BOLD and GenBank databases and, for each sequenced specimen, the morphological identification was compared with the results of a BLAST search of the newly-generated DNA barcode in the BOLD Identification Engine.
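The cross-check in step 4 can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure used by the authors: the function name, the three result labels, the 97% similarity cut-off and the similarity values in the example are all hypothetical, and the hits are supplied as plain (species name, percent similarity) pairs rather than fetched from BOLD.

```python
from typing import List, Tuple


def classify_identification(
    morpho_species: str,
    top_hits: List[Tuple[str, float]],
    min_similarity: float = 97.0,  # assumed cut-off, not taken from the paper
) -> str:
    """Compare a morphological ID with the species names of close database hits."""
    # Keep only the species names of hits at or above the similarity cut-off.
    close = {name for name, similarity in top_hits if similarity >= min_similarity}
    if not close:
        return "no_close_hit"   # nothing similar enough to judge
    if morpho_species in close:
        return "consistent"     # morphology and barcode agree
    return "conflict"           # flag for re-examination


# Illustrative hits only; the similarity values are invented.
hits = [("Oligia strigilis", 99.8), ("Oligia versicolor", 99.5)]
print(classify_identification("Oligia strigilis", hits))
```

A record labelled "conflict" would be a candidate for the kind of manual inspection described above, for example to rule out contamination or misidentification.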
Continental Portugal
36.960 and 42.124 Latitude; -9.467 and -6.229 Longitude.
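As a small sketch of how these bounds might be used when quality-checking records, the following Python snippet tests whether a coordinate pair falls inside the stated bounding box. The class and constant names are hypothetical, and the two test coordinates are rough, illustrative values (one in northern mainland Portugal, one near Madeira), not records from the dataset.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        """Return True if the point lies inside the box (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Bounds as given in the geographic coverage of the dataset (decimal degrees).
DATASET_BBOX = BoundingBox(min_lat=36.960, max_lat=42.124,
                           min_lon=-9.467, max_lon=-6.229)

print(DATASET_BBOX.contains(41.3, -8.7))    # approximate point in northern Portugal
print(DATASET_BBOX.contains(32.7, -16.9))   # approximate point near Madeira
```

A rectangular box is only a coarse filter: it also covers sea and parts of Spain, so a point inside it is not necessarily in continental Portugal.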
This dataset is composed of data relating to 2350 Lepidoptera specimens. All specimens were determined to species level. Overall, 1158 species are represented in the dataset. These species belong to 51 families and 598 genera (Suppl. material
The sampled material was collected in the period from 2001 to 2022.
The InBIO Barcoding Initiative Database: contribution to the knowledge on DNA barcodes of Portuguese moths Lepidoptera dataset can be downloaded from the Public Data Portal of BOLD (http://www.boldsystems.org/index.php/Public_SearchTerms?query=DS-IBILP08) in different formats (data as dwc, xml or tsv and sequences as fasta files). Alternatively, BOLD users can log-in and access the dataset via the Workbench platform of BOLD. All records are also searchable within BOLD, using the search function of the database.
The version of the dataset, at the time of writing the manuscript, is included as Suppl. materials
Column label | Column description |
---|---|
processid | Unique identifier for the sample. |
sampleid | Identifier for the sample being sequenced, i.e. IBI catalogue number at CIBIO-InBIO, Porto University. Often identical to the "Field ID" or "Museum ID". |
recordID | Identifier for specimen assigned in the field. |
catalognum | Catalogue number. |
fieldnum | Field number. |
institution_storing | The full name of the institution that has physical possession of the voucher specimen. |
bin_uri | Barcode Index Number system identifier. |
phylum_taxID | Phylum taxonomic numeric code. |
phylum_name | Phylum name. |
class_taxID | Class taxonomic numeric code. |
class_name | Class name. |
order_taxID | Order taxonomic numeric code. |
order_name | Order name. |
family_taxID | Family taxonomic numeric code. |
family_name | Family name. |
subfamily_taxID | Subfamily taxonomic numeric code. |
subfamily_name | Subfamily name. |
genus_taxID | Genus taxonomic numeric code. |
genus_name | Genus name. |
species_taxID | Species taxonomic numeric code. |
species_name | Species name. |
identification_provided_by | Full name of primary individual who assigned the specimen to a taxonomic group. |
identification_method | The method used to identify the specimen. |
voucher_status | Status of the specimen in an accessioning process (BOLD controlled vocabulary). |
tissue_type | A brief description of the type of tissue or material analysed. |
collectors | The full or abbreviated names of the individuals or team responsible for collecting the sample in the field. |
lifestage | The age class or life stage of the specimen at the time of sampling. |
sex | The sex of the specimen. |
lat | The geographical latitude (in decimal degrees) of the geographic centre of a location. |
lon | The geographical longitude (in decimal degrees) of the geographic centre of a location. |
elev | Elevation of sampling site (in metres above sea level). |
country | The full, unabbreviated name of the country where the organism was collected. |
province_state | The full, unabbreviated name of the province where the organism was collected. |
region | The full, unabbreviated name of the municipality where the organism was collected. |
exactsite | Additional name/text description regarding the exact location of the collection site relative to a geographic relevant landmark. |
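As a rough illustration of how the columns above can be used, the following Python sketch reads a tab-separated export and tallies specimens, species and families, and lists any BIN that holds more than one species name. The column names are those of the table and the BINs and species in the sample come from the shared-BIN list given earlier; the process IDs are invented placeholders, the function name is hypothetical and only four of the columns are included for brevity.

```python
import csv
import io
from collections import defaultdict


def summarise(tsv_text: str) -> dict:
    """Tally specimens, species, families and multi-species BINs in a BOLD-style TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    species, families = set(), set()
    bin_species = defaultdict(set)
    specimens = 0
    for row in reader:
        specimens += 1
        name = (row.get("species_name") or "").strip()
        family = (row.get("family_name") or "").strip()
        bin_uri = (row.get("bin_uri") or "").strip()
        if name:
            species.add(name)
        if family:
            families.add(family)
        if name and bin_uri:
            bin_species[bin_uri].add(name)
    # Keep only BINs that contain more than one species name.
    shared = {b: sorted(s) for b, s in bin_species.items() if len(s) > 1}
    return {"specimens": specimens, "species": len(species),
            "families": len(families), "shared_bins": shared}


# Tiny illustrative sample; the process IDs are invented placeholders.
SAMPLE = "\n".join([
    "processid\tfamily_name\tspecies_name\tbin_uri",
    "EXAMPLE001\tYponomeutidae\tYponomeuta cagnagella\tBOLD:AAA7740",
    "EXAMPLE002\tYponomeutidae\tYponomeuta evonymella\tBOLD:AAA7740",
    "EXAMPLE003\tLasiocampidae\tMacrothylacia rubi\tBOLD:AAD0839",
])

print(summarise(SAMPLE))
```

For a real export, the text of the downloaded tsv file would be passed to `summarise` in place of `SAMPLE`; the standard-library `csv` module is used here so that the sketch has no external dependencies.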
The authors would like to acknowledge all colleagues that have joined and helped in fruitful fieldwork throughout the years and shared their thoughts in many profitable discussions on Portuguese moths, namely Ana Rita Gonçalves, Ana Valadares, Carla Gomes, Daniel Oliveira, Luis Silva, Marisa Rodrigues, Pedro Lopes and Rebecca Mateus.
List of species that were collected and DNA barcoded within this project, including the sample code (IBI code), the Process ID (BOLDcode), the BIN URI (BOLD BIN) and the GenBank accession number (GenBank). * indicates species with new BINs.
The file includes information about all records in BOLD for the IBI - Lepidoptera 08 library. It contains collecting and identification data. The data are as downloaded from BOLD in the tsv format, without further processing.
The file includes information about all records in BOLD for the IBI - Lepidoptera 08 library. It contains collecting and identification data. The data are as downloaded from BOLD in the Darwin Core Standard format, without further processing.
COI sequences in fasta format. Each sequence is identified by the BOLDProcessID, species name, genetic marker name and GenBank accession number, all separated by a vertical bar. The data are as downloaded from BOLD.
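The header layout described above (four fields separated by a vertical bar) can be split with a few lines of Python. This is a minimal sketch assuming exactly four fields in the stated order; the function names and dictionary keys are hypothetical, and the example record (process ID, marker code, accession number and sequence) is invented for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

FIELDS = ("processid", "species_name", "marker", "genbank_accession")


def parse_header(header: str) -> Dict[str, str]:
    """Split a '>id|species|marker|accession' header into named fields."""
    parts = [p.strip() for p in header.lstrip(">").split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {header!r}")
    return dict(zip(FIELDS, parts))


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (parsed header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield parse_header(header), "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield parse_header(header), "".join(chunks)


# Invented example record; not a real entry from the dataset.
EXAMPLE = """>EXAMPLE001|Yponomeuta cagnagella|COI-5P|XX000001
ACGTAC
GTTTGA
"""

for meta, seq in read_fasta(EXAMPLE.splitlines()):
    print(meta["processid"], meta["species_name"], len(seq))
```

Raising an error on an unexpected field count is a deliberate choice in this sketch: a header that does not match the described layout is reported immediately instead of being silently mis-assigned.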
Neighbour-joining (NJ) tree of the DNA barcodes of all specimens in DS-IBILP08, the IBI-Lepidoptera 08 dataset, collected in continental Portugal, all of which have been morphologically identified to species level.
Species occurring in continental Portugal that are still missing DNA barcodes. For each species, it is indicated whether the species has specimens in BOLD but no DNA barcode yet, or whether specimens are still needed.