Biodiversity Data Journal :
Data Paper (Biosciences)
Corresponding authors: João Queirós (joao.queiros@cibio.up.pt), Miguel Porto (miguel.porto@cibio.up.pt)
Academic editor: Patricia Mergen
Received: 18 Nov 2024 | Accepted: 14 Jan 2025 | Published: 24 Jan 2025
© 2025 João Queirós, Rodrigo Silva, Catarina J. Pinho, Hélia Vale-Gonçalves, Ricardo Pita, Paulo Alves, Pedro Beja, Joana Paupério, Miguel Porto
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Queirós J, Silva R, Pinho CJ, Vale-Gonçalves HM, Pita R, Alves PC, Beja P, Paupério J, Porto M (2025) The InBIO Barcoding Initiative Database: contribution to the knowledge of DNA barcodes of the vascular plants of north-eastern Portugal. Biodiversity Data Journal 13: e142020. https://doi.org/10.3897/BDJ.13.e142020
Metabarcoding is invaluable for understanding trophic interactions, enabling high-resolution and rapid dietary assessments. However, it requires a robust DNA barcode reference library for accurate taxa identification. This dataset has been generated in the framework of the InBIO Barcoding Initiative (IBI) and Agrivole project. The integration of these two projects was crucial, as Agrivole aimed to investigate the trophic niche of small mammals in Trás-os-Montes Region through DNA metabarcoding, which required a reliable plant DNA barcode library for this same region. Given the large number of species not yet represented in international databases, a survey of local plants was essential to fill this gap. Thus, this study created an accurate DNA reference database for the plants of the Trás-os-Montes Region of Portugal.
The current DNA reference database contains 632 vascular plant samples, all morphologically identified and belonging to 435 species. This represents 14% and 38.7% of the total known plant species for Portugal and the study area, respectively.
Of the 1781 barcode sequences provided in this dataset, 1099 (61.7%) contain new information at different levels: 254 (14.3%, ITS2: 41, trnL-ef: 126, trnL-gh: 87) were completely new to the GenBank and/or BOLD databases at the time of publication, 438 (24.6%, ITS2: 59, trnL-ef: 173, trnL-gh: 206) are new records for a given species and 407 (22.9%, ITS2: 187, trnL-ef: 206, trnL-gh: 14) provide additional information (e.g. different bp length, intraspecific genetic variability); the remaining 682 sequences (38.3%) are identical (100% identity) to sequences already publicly available for the identified species. Overall, this dataset represents a significant contribution to the genetic knowledge of vascular plants represented in public libraries. This is one of the public releases of the IBI database, which provides genetic and distributional data for several taxa.
All vouchers are deposited in the Herbarium of the Museum of Natural History and Science of the University of Porto (MHNC-UP) and their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD), NCBI GenBank online databases and International Nucleotide Sequence Database Collaboration (INSDC).
reference database, DNA metabarcoding, ITS2, trnL-ef, trnL-gh, Trás-os-Montes Region, HTS technology
Traditional approaches to biodiversity assessment have been revolutionised by DNA-based species identification (
Metabarcoding is a particularly valuable tool for understanding trophic interactions (
The region of Trás-os-Montes is located in the mountainous northeast of Portugal, comprising the Bragança and Vila Real Districts (Fig.
The present vascular plant dataset contributes to improving the molecular identification of the flora present not only in Portugal and the Iberian Peninsula, but also in other regions of the world where some species are also present. This DNA barcode reference library was generated in the framework of the InBIO Barcoding Initiative (IBI) and Agrivole project. The IBI project uses HTS technologies to build reference datasets of DNA barcodes from morphologically identified Portuguese specimens (
This dataset was generated as part of the Agrivole project that aimed to investigate the ecological role of small mammals, particularly voles, in local agroecosystems of the Trás-os-Montes Region in north-eastern Portugal, in collaboration with the InBIO Barcoding Initiative (IBI), which uses HTS technologies to build reference datasets of DNA barcodes from morphologically identified Portuguese specimens. To achieve this goal, the Agrivole project investigated the trophic niche of these small mammals through DNA metabarcoding analysis of their faeces using high throughput sequencing methods. This required a reliable plant DNA barcoding library for the region to accurately and precisely infer the dietary taxa consumed by each species, for which collaboration with IBI was crucial. Therefore, this database aims to provide a reference database of DNA barcodes of the vascular plants of the Trás-os-Montes Region. This library will facilitate the DNA-based identification of plant species in the Agrivole project and other DNA metabarcoding and traditional molecular studies, such as environmental DNA (eDNA) monitoring, which has shown a growing trend in recent years (
A total of 659 vascular plant samples were collected, of which 647 were morphologically identified to the most specific taxon (species, subspecies or variety), comprising 448 species belonging to 69 families (Suppl. materials
Of these, 632 samples were successfully barcoded, covering 435 species (Suppl. material
With ITS2, 594 samples were successfully barcoded, 590 for trnL primer ef (trnL-ef) and 597 for trnL primer gh (trnL-gh), with 541 samples amplified for the three primer sets. This resulted in a total of 1781 barcode sequences for the sum of the three markers used (Suppl. materials
Of the 1781 barcode sequences, 1099 (61.7%) contain new information at different levels: 254 (14.3%, ITS2: 41, trnL-ef: 126, trnL-gh: 87) were completely new to the GenBank and/or BOLD databases at the time of publication, 438 (24.6%, ITS2: 59, trnL-ef: 173, trnL-gh: 206) are new records for a given species and 407 (22.9%, ITS2: 187, trnL-ef: 206, trnL-gh: 14) provide additional information (e.g. different bp length, intraspecific genetic variability); the remaining 682 sequences (38.3%) are identical (100% identity) to sequences already publicly available for the identified species. Overall, this dataset represents a significant contribution to the genetic knowledge of vascular plants represented in public libraries and, specifically, within the scope of the Agrivole project, will be essential for studying the feeding ecology of small mammals and understanding their ecological interactions, including competition between species for food resources. This will provide insights into the responses of small mammal communities in the Trás-os-Montes Region to agro-ecosystem structure and agricultural practices and how this may affect the potential for pest outbreaks in orchards (i.e. olive groves) or the resilience of species of conservation concern in these habitats. Furthermore, this database is essential for future long-term monitoring studies of the plant communities in the study area and will assist the government authorities in the sustainable management of the National Parks (
The InBIO Barcoding Initiative Database: contribution to the knowledge of DNA barcodes of the vascular plants of north-eastern Portugal.
Joana Paupério (project coordinator), João Queirós (junior researcher of project), Miguel Porto (plant specialist), Paulo Célio Alves (senior researcher in wildlife conservation and management), Pedro Beja (senior researcher in wildlife ecology), Ricardo Pita (project co-coordinator), Hélia Vale-Gonçalves (project technician), Catarina J. Pinho (project technician), Rodrigo Silva (Bachelor student in biology).
The region of Trás-os-Montes is located in the mountainous northeast of Portugal, comprising the Districts of Vila Real and Bragança (Fig.
Vascular plants were collected from traditionally managed olive grove agroecosystems and surrounding areas in north-eastern Portugal. Each distinct vascular plant species within each patch was collected, photographed and then mounted on herbarium sheets in situ. These were then morphologically identified by experts and a portion of the leaf tissue was used to obtain DNA barcodes.
Trás-os-Montes Region, Portugal
The sampling protocol was designed to cover plants in areas where small mammals, particularly voles, were expected to occur, as defined under the Agrivole project (
Collected samples were classified and revised by experts in vascular plant identification. The obtained DNA barcode sequences were compared against the GenBank and BOLD databases and the top hits were examined to identify potential PCR / sequencing errors, contaminations or misidentifications.
1. Plant sampling
Vascular plants were collected in the northeast of Portugal in August 2018, May–June 2019 and June 2020, from traditionally managed olive grove agroecosystems and their surrounding areas. A leaf tissue sample was collected for DNA extraction.
2. Taxonomic identification
The morphological identification of the specimens was carried out, after herborisation, under a stereomicroscope, using mainly the dichotomous keys of Flora Iberica (
3. DNA extraction
DNA was extracted from each sample using 20 mg of dry leaf material weighed into a 2 ml tube. To disrupt the thick plant cell walls, zirconia beads were added to the tube, which was then placed on a mill for a minimum of 10 minutes. Lysis was performed by adding lysis buffer (2% SDS, 2% PVP 40, 250 mM NaCl, 200 mM Tris-HCl, 5 mM EDTA, pH 8) and 10 µl of proteinase K (1 mg/ml) to the samples, followed by incubation at 63°C for 30 minutes. RNA was removed by adding 10 µl of RNase A (10 mg/ml), followed by incubation at 37°C for 20 minutes. DNA was precipitated by adding 150 µl of potassium formate (KCOOH) (3 M), mixed by inversion of the tubes, followed by incubation on ice for 25 minutes. A stepwise centrifugation was performed at increasing speed: 1 minute each at 1000, 2000, 4000 and 8000 rpm and 10 minutes at 11000 rpm, followed by DNA binding using 550 µl of clear supernatant and 825 µl of binding buffer (2 M guanidine hydrochloride in 95% ethanol). A maximum of 700 µl of binding mixture was added to a column, followed by centrifugation at 11000 rpm for 1 minute. This procedure was repeated for the remainder of the binding mixture. The column membranes were then washed twice with 80% ethanol (1 minute at 11000 rpm). Finally, two DNA elutions were performed by adding 50 µl of elution buffer (10 mM Tris-HCl, pH 8.3, at 65°C) to the column and centrifuging at 12000 rpm for 1 minute.
4. PCR amplification and library preparation
Polymerase chain reaction (PCR) amplification was performed focusing on two regions, the internal transcribed spacer of nuclear ribosomal DNA (ITS2), using the primer pair UniPlantF/UniPlantR (187–387 bp;
PCR reactions were performed by mixing 5 µl QIAGEN Multiplex PCR Master Mix, 3.4 µl ultrapure water, 0.3 µl of each primer and 1 µl DNA extract. Cycling conditions consisted of an initial denaturation at 95°C for 15 min, followed by 40 cycles of denaturation at 95°C for 30 s, annealing at 54°C for 30 s for ITS2 and 55°C for 1 min for both trnL primers and extension at 72°C for 30 s, with a final extension cycle at 72°C for 10 min. A touchdown method was also used for some samples that initially failed amplification for ITS2 to increase the specificity of the PCR product. This was done using the cycling conditions previously presented, changing the annealing for 5 cycles with an initial temperature of 56°C, reduced by 0.5°C for each cycle. Amplification success was visually verified by electrophoresis on a 2% stained agarose gel using 2 µl of PCR product.
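The touchdown annealing profile described above can be written out as a short Python sketch. Two points are assumptions rather than statements from the protocol: that the five touchdown cycles count towards the 40 cycles, and that the remaining cycles run at the standard ITS2 annealing temperature of 54°C. The function name and parameters are illustrative only.

```python
def annealing_schedule(total_cycles=40, start_temp=56.0, step=0.5,
                       touchdown_cycles=5, final_temp=54.0):
    """Return the annealing temperature (in °C) for each PCR cycle.

    Assumption: touchdown cycles are part of the 40 cycles and the
    remaining cycles use the standard ITS2 annealing temperature.
    """
    temps = []
    for cycle in range(total_cycles):
        if cycle < touchdown_cycles:
            # Temperature drops by `step` with every touchdown cycle.
            temps.append(start_temp - step * cycle)
        else:
            temps.append(final_temp)
    return temps


schedule = annealing_schedule()
print(schedule[:6])  # first touchdown cycles, then the plateau
```

With these defaults the last touchdown cycle reaches 54°C, so the profile joins the standard ITS2 annealing temperature without a jump.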
A second PCR was performed to incorporate the Illumina sequencing adapters and P5 and P7 indices using 5 µl KAPA HiFi HotStart ReadyMix, 1 µl of index mix, 2 µl of ultrapure water and 2 µl of the previous PCR products diluted 1:10. Cycling conditions consisted of an initial denaturation at 95°C for 3 minutes, followed by 10 cycles at 95°C for 30 seconds, at 55°C for 30 seconds and 72°C for 30 seconds, with a final extension at 72°C for 5 minutes. Indexing success was assessed by comparing the initial PCR product with the indexed product by electrophoresis. The indexed products were cleaned using 1.2 × AMPure® XP beads, quantified using Epoch, diluted to 15 nM and pooled by marker. The three libraries were then quantified by qPCR and pooled to obtain a final library at 4 nM. The final library was sequenced on an Illumina MiSeq System, using a MiSeq V2 500-cycle reagent kit and considering a coverage of approximately 5,000 paired-end reads per sample and marker.
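The normalisation steps (dilution to 15 nM and to a 4 nM final library) follow the usual relation C1·V1 = C2·V2. The Python sketch below only illustrates that arithmetic; the stock concentration and final volume in the example are hypothetical values, not measurements from this study.

```python
def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock volume, diluent volume) in µl for a C1*V1 = C2*V2 dilution."""
    if stock_nM <= 0 or target_nM <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nM > stock_nM:
        raise ValueError("cannot dilute to a concentration above the stock")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical example: a 60 nM indexed product diluted to 15 nM in 20 µl.
stock_ul, diluent_ul = dilution_volumes(stock_nM=60.0, target_nM=15.0,
                                        final_volume_ul=20.0)
print(stock_ul, diluent_ul)
```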
5. Bioinformatic analysis
The obtained sequences were bioinformatically processed using the OBITools software (
6. Data Analysis
To estimate the representativeness of the species collected in this study compared to the expected total diversity known for the Trás-os-Montes Region, we used as the reference dataset the list of plants described by family on the Flora-On website (
To assess the contribution of this study as a source of new genetic resources (DNA barcodes), the DNA sequences generated were compared to the NCBI Nucleotide Database using the BLAST+ software (
Sequences that did not match the morphological identification, at least at the family level, were considered errors and discarded. Sequences that matched with 100% coverage and identity to a different species within the same genus or family were considered new records for that species. Sequences were classified as completely new records for online databases if they showed identity percentages and query coverage below 90%. Sequences that correctly matched the morphological identification at the species level, but had a coverage of 80%–100% and/or an identity percentage between 98%–100%, were classified as new information records for that species.
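The classification rules above can be summarised as a small decision function. The Python sketch below is one possible reading, not the authors' script: the order in which the rules are applied, the "already available" label for perfect matches to the identified species and the "unresolved" fallback for cases the text does not cover are all assumptions, and the `TopHit` type and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopHit:
    species: str     # species name of the best database hit
    family: str      # family of the best database hit
    identity: float  # percent identity
    coverage: float  # percent query coverage


def classify(sample_species, sample_family, hit):
    """Assign a novelty category to one barcode from its top hit."""
    if hit.family != sample_family:
        return "error"  # no match at family level: discarded
    same_species = hit.species == sample_species
    perfect = hit.identity == 100.0 and hit.coverage == 100.0
    if same_species and perfect:
        return "already available"
    if not same_species and perfect:
        return "new species record"
    if hit.identity < 90.0 and hit.coverage < 90.0:
        return "completely new"
    if same_species and (80.0 <= hit.coverage <= 100.0
                         or 98.0 <= hit.identity <= 100.0):
        return "new information"
    return "unresolved"  # combinations not covered by the stated rules


# Illustrative values only, not taken from the dataset.
hit = TopHit("Quercus suber", "Fagaceae", identity=100.0, coverage=100.0)
print(classify("Quercus rotundifolia", "Fagaceae", hit))
```

Keeping an explicit "unresolved" category makes the gaps in the stated thresholds visible instead of forcing every sequence into one of the published classes.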
The study was carried out in the districts of Bragança and Vila Real, located in the northeast of Portugal. In particular, vascular plants were collected from agroecosystems.
41.336 and 41.832 Latitude; -7.494 and -6.614 Longitude.
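As a quick sanity check for occurrence records, the bounding box can be applied to the dataset's `lat` and `lon` values. The Python sketch below reads the four values as decimal degrees, with latitudes near 41°N and longitudes near 7°W (the only reading consistent with north-eastern Portugal); the function name is illustrative.

```python
# Bounding box of the study area in decimal degrees (WGS84 assumed).
LAT_MIN, LAT_MAX = 41.336, 41.832
LON_MIN, LON_MAX = -7.494, -6.614


def in_study_area(lat, lon):
    """Return True if a coordinate pair falls inside the bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


print(in_study_area(41.5, -7.0))   # a point inside the box
print(in_study_area(38.7, -9.1))   # a point far to the south-west
```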
A total of 659 specimens were collected, of which 647 were morphologically identified to the most specific taxon (species, subspecies or variety). This dataset includes 448 taxa belonging to 69 families (Suppl. materials
Rank | Scientific Name | Common Name |
---|---|---|
kingdom | Plantae | Plants |
phylum | Tracheophyta | |
class | Liliopsida | |
order | Asparagales | |
family | Amaryllidaceae | |
family | Asparagaceae | |
family | Orchidaceae | |
order | Dioscoreales | |
family | Dioscoreaceae | |
order | Poales | |
family | Cyperaceae | |
family | Juncaceae | |
family | Poaceae | |
class | Lycopodiopsida | |
order | Isoetales | |
family | Isoetaceae | |
class | Magnoliopsida | |
order | Apiales | |
family | Apiaceae | |
order | Asterales | |
family | Asteraceae | |
family | Campanulaceae | |
order | Boraginales | |
family | Boraginaceae | |
order | Brassicales | |
family | Brassicaceae | |
family | Resedaceae | |
order | Caryophyllales | |
family | Amaranthaceae | |
family | Caryophyllaceae | |
family | Montiaceae | |
family | Plumbaginaceae | |
family | Polygonaceae | |
family | Portulacaceae | |
order | Cucurbitales | |
family | Cucurbitaceae | |
order | Dipsacales | |
family | Caprifoliaceae | |
order | Ericales | |
family | Ericaceae | |
family | Primulaceae | |
order | Fabales | |
family | Fabaceae | |
family | Polygalaceae | |
order | Fagales | |
family | Fagaceae | |
family | Juglandaceae | |
order | Gentianales | |
family | Gentianaceae | |
family | Rubiaceae | |
order | Geraniales | |
family | Geraniaceae | |
order | Lamiales | |
family | Lamiaceae | |
family | Oleaceae | |
family | Orobanchaceae | |
family | Plantaginaceae | |
family | Scrophulariaceae | |
family | Verbenaceae | |
order | Malpighiales | |
family | Euphorbiaceae | |
family | Hypericaceae | |
family | Linaceae | |
family | Salicaceae | |
family | Violaceae | |
order | Malvales | |
family | Cistaceae | |
family | Cytinaceae | |
family | Malvaceae | |
family | Thymelaeaceae | |
order | Myrtales | |
family | Lythraceae | |
family | Onagraceae | |
order | Piperales | |
family | Aristolochiaceae | |
order | Ranunculales | |
family | Papaveraceae | |
family | Ranunculaceae | |
order | Rosales | |
family | Cannabaceae | |
family | Rosaceae | |
family | Urticaceae | |
order | Santalales | |
family | Santalaceae | |
order | Sapindales | |
family | Anacardiaceae | |
family | Rutaceae | |
order | Saxifragales | |
family | Crassulaceae | |
family | Paeoniaceae | |
family | Saxifragaceae | |
order | Solanales | |
family | Convolvulaceae | |
family | Solanaceae | |
order | Zygophyllales | |
family | Zygophyllaceae | |
class | Pinopsida | |
family | Cupressaceae | |
class | Polypodiopsida | |
order | Polypodiales | |
family | Aspleniaceae | |
family | Dennstaedtiaceae | |
class | Psilotopsida | |
order | Ophioglossales | |
family | Ophioglossaceae | |
order | Equisetales | |
family | Equisetaceae | |
family | Pteridaceae | |
Plant samples were collected in August 2018, May–June 2019 and June 2020.
The data underpinning the analysis reported in this paper are deposited at BOLD, the Barcode of Life Data System, with the dataset name DS-IBPLT, and at GBIF, the Global Biodiversity Information Facility, at: https://doi.org/10.15468/exuufa.
Column label | Column description |
---|---|
materialSampleID | Unique identifier for the sample. |
recordNumber | Identifier for the sample being sequenced in the IBI catalogue number at Cibio-InBIO, Porto University. Same as fieldNumber. |
catalogNumber | Identifier for the specimen deposited in the Museum of the University of Porto (MHNC-UP). |
institutionCode | The full name of the institution that has physical possession of the DNA samples. |
occurrenceID | Global unique identifier for that sample. |
kingdom | Kingdom name. |
phylum | Phylum name. |
class | Class name. |
order | Order name. |
subfamily | Subfamily name. |
family | Family name. |
genus | Genus name. |
species | Species name. |
subspecies | Subspecies name. |
scientificNameAuthorship | The authorship information for the sample. |
identificationRemarks | Comments or notes about the taxonomic identification. |
identifiedBy | Full name of the individuals who assigned the specimen to a taxonomic group. |
typeStatus | Status of the specimen in an accessioning process. |
recordedBy | The full names of the individuals or team responsible for collecting the sample in the field. |
eventDate | Date of the sample collection. |
country | The full, unabbreviated name of the country where the specimen was collected. |
countryCode | Code of the country where the specimen was collected. |
lat | The geographical latitude (in decimal degrees) of the geographic centre of a location. |
lon | The geographical longitude (in decimal degrees) of the geographic centre of a location. |
county | The full, unabbreviated name of the county where the organism was collected. |
municipality | The full, unabbreviated name of the municipality ("Concelho" in Portugal) where the specimen was collected. |
verbatimLocality | The original textual description of the sampled location. |
dynamicProperties | DNA barcoded sequences. |
measurementDeterminedBy | The full name of the institution where DNA samples are stored. |
datasetName | BOLD dataset code. |
We would like to thank all those involved in the Agrivole project and the InBIO Barcoding Initiative who helped with this work, especially Sónia Ferreira for her help with the data submission. This work was funded by the European Regional Development Fund (ERDF) through “Programa Operacional de Factores de Competitividade (COMPETE)” and by national funds through FCT (Portuguese Foundation for Science and Technology, IP), under the scope of the project AGRIVOLE (PTDC/BIA-ECO/31728/2017). MP was supported by national funds through FCT, IP in the scope of Norma Transitória, grant no. DL57/2016/CP1440/CT0017 (https://doi.org/10.54499/DL57/2016/CP1440/CT0017).
The file includes information about all records in BOLD for the IBPLT DNA barcoding of Portuguese plants database. It contains collection and identification data. The data are as downloaded from BOLD, without further processing.
The file includes information about all records in BOLD for the IBPLT DNA barcoding of Portuguese plants. It contains collection and identification data. The data are downloaded from GBIF (https://doi.org/10.15468/exuufa), without further processing.
List of species DNA barcoded in this project, including their original collection code, BOLD and GenBank accession codes for the three markers used (ITS2, trnL-ef and trnL-gh). In the trnL-gh column, (*) indicates sequences that did not meet the GenBank threshold of > 50 bp sequence length. These sequences are provided in Suppl. material 6.
ITS2 sequences in fasta format. Each sequence is identified by the original collection code, BOLD ProcessID and GenBank/INSDC accession number, separated by a vertical bar.
trnL-ef sequences in fasta format. Each sequence is identified by the original collection code, BOLD ProcessID and GenBank/INSDC accession number, separated by a vertical bar.
trnL-gh sequences in fasta format. Each sequence is identified by the original collection code, BOLD ProcessID and GenBank/INSDC accession number, separated by a vertical bar. (*) indicates sequences that did not meet the GenBank threshold of > 50 bp sequence length and therefore do not present accession numbers.
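The header layout described for the three FASTA files (collection code, BOLD ProcessID and accession number, separated by a vertical bar) can be parsed with a few lines of Python. The sketch below is a minimal reader, not a tool distributed with the dataset; it assumes that a missing accession appears as an empty field or a field containing `*`, and the identifiers in the example are invented placeholders.

```python
def parse_header(line):
    """Split a '>code|ProcessID|accession' header into its three fields."""
    parts = [p.strip() for p in line.lstrip(">").strip().split("|")]
    parts += [""] * (3 - len(parts))  # pad headers with fewer fields
    code, process_id, accession = parts[:3]
    # Assumption: '*' or an empty field marks a sequence without accession.
    if not accession or "*" in accession:
        accession = None
    return {"collection_code": code, "process_id": process_id,
            "accession": accession}


def read_fasta(text):
    """Return a list of (header fields, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((parse_header(header), "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((parse_header(header), "".join(chunks)))
    return records


# Placeholder identifiers, not real records from the dataset.
example = ">CODE001|PROC001-00|XX000001\nACGT\nACGT\n>CODE002|PROC002-00|*\nTTGA\n"
for fields, sequence in read_fasta(example):
    print(fields["collection_code"], fields["accession"], len(sequence))
```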