Biodiversity Data Journal :
Data Paper (Biosciences)
Corresponding author: Anaïs Marquisseau (anais.marquisseau@inrae.fr), Magalie Pichon (magalie.pichon@inrae.fr)
Academic editor: Benoît Geslin
Received: 20 Sep 2024 | Accepted: 12 Dec 2024 | Published: 07 Jan 2025
© 2025 Anaïs Marquisseau, Kamila Canale-Tabet, Emmanuelle Labarthe, Géraldine Pascal, Christophe Klopp, André Pornon, Nathalie Escaravage, Rémi Rudelle, Alain Vignal, Annie Ouin, Mélodie Ollivier, Magalie Pichon
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Marquisseau A, Canale-Tabet K, Labarthe E, Pascal G, Klopp C, Pornon A, Escaravage N, Rudelle R, Vignal A, Ouin A, Ollivier M, Pichon M (2025) Building a reliable 16S mini-barcode library of wild bees from Occitania, south-west of France. Biodiversity Data Journal 13: e137540. https://doi.org/10.3897/BDJ.13.e137540
DNA barcoding and metabarcoding are now powerful tools for studying biodiversity, especially for the accurate identification of large sample collections belonging to diverse taxonomic groups. Their success depends largely on the taxonomic resolution of the DNA sequences used as barcodes and on the reliability of the reference databases. For wild bees, barcode sequence coverage is steadily growing, but some incorrect species annotations need to be corrected. The COI (Cytochrome Oxidase subunit 1) gene, the most widely used marker in barcoding/metabarcoding of arthropods, suffers from primer bias and from difficulty in covering all wild bee species with the classical Folmer primers.
We present here a curated database for a 250 bp mini-barcode region of the 16S rRNA gene, suitable for low-cost metabarcoding of wild bees in applications such as eDNA analysis or the sequencing of ancient or degraded DNA. Sequenced specimens were captured in Occitania (south-west of France) and morphologically identified by entomologists, with a total of 530 individuals belonging to 171 species and 19 genera. A customised workflow including distance-tree inferences and, when necessary, a second round of entomologist observations was used for the validation of 348 mini-barcodes covering 148 species. Amongst them, 93 species did not have any 16S reference barcode available before our contribution. These high-quality reference library data are freely available to the scientific community, with the aim of facilitating future large-scale characterisation of wild bee communities in a context of pollinator decline.
wild bees, Apoidea, Anthophila, 16S rRNA, reference database, DNA barcoding
Worldwide, pollinators have become the focus of particular attention as populations decline drastically (
Traditionally, the identification of arthropods, including wild bees, has been based on the examination of morphological characters, and the time-consuming detection of subtle morphological differences between species requires trained taxonomists. Unfortunately, the lack of policy commitment to training new experts has led to an increasingly severe shortage of specialists, a situation commonly referred to as the taxonomic impediment (
As a consequence, the COI barcode has become the main marker used for cataloguing the genetic diversity of Apoidea Anthophila in many countries worldwide: in Canada (
For over two decades, the 16S locus has been used to infer the phylogeny of Hymenoptera including bees (
Targeting a short barcode gene region (hereafter referred to as mini-barcode) is particularly interesting for approaches that need to overcome DNA degradation while preserving a high level of taxonomic resolution (
In this study, our main objectives were to evaluate the 16S mini-barcode potential (
Collection description
The 530 specimens used in this study originated from three sources: 1) 412 from the UMR DYNAFOR collection stored at INP-ENSAT (
1. Sample collection
For the UMR DYNAFOR collection, three coloured pan traps (blue, white and yellow) were set in the grassy strip bordering the crop. Each pan trap contained 250 ml of water with Teepol (3 drops/l). After 3 days of exposure, the insects were filtered and placed in 70% ethanol (EtOH) until identification. A panel of 61 specimens was captured with nets and euthanised in ethyl acetate vapour (Suppl. material
All of the specimens were morphologically identified using mostly the Insecta Fauna Helvetica reference (
2. Sequencing and processing
DNA was extracted from a portion or an entire leg of dried specimens using the Chelex method (see
MiSeq sequencing and processing
For the set of 275 specimens processed with MiSeq sequencing, two microlitres of DNA were used as template for PCR. 16S primers ins16S_1F/ins16S_1R (F: TRRGACGAGAAGACCCTATA; R: TCTTAATCCAACATCGAGGTC,
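The primer TRRGACGAGAAGACCCTATA quoted above contains the IUPAC ambiguity code R (A or G) at two positions, so it stands for a small pool of oligonucleotides rather than a single sequence. The following minimal sketch expands such a degenerate primer into its explicit variants; the helper name `expand_primer` is hypothetical and is not part of the authors' workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as quoted in the text above.
degenerate = expand_primer("TRRGACGAGAAGACCCTATA")
plain = expand_primer("TCTTAATCCAACATCGAGGTC")

print(len(degenerate), "variants for the degenerate primer")
for seq in degenerate:
    print(seq)
print(len(plain), "variant for the non-degenerate primer")
```

Two R positions give 2 x 2 = 4 variants, whereas the second primer has no ambiguity codes and expands to itself.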
Sanger sequencing and processing
DNA barcoding using Sanger sequencing technology was performed on 255 specimens. Specific primers were used for each genus. All primer sequences and PCR conditions are given in Fig.
Amongst the 171 species included in our study, 43 were represented by a single specimen, 25 by two specimens, 51 by three specimens and 52 by four to ten specimens, corresponding to a total of 530 specimens (Fig.
16S mini-barcode library workflow, from sampling to validation. There are four main steps represented by different colours in the left panel, from top to bottom. The middle part of the figure represents the workflow. The right panel indicates the final barcode status.
For sequence validation, we used: sequence assignment by BLASTn (
The global success rate after MiSeq sequencing was very high, reaching 99%, with only a single specimen failing. The sequencing success with the Sanger technology was lower due to negative PCR or unreadable chromatograms. Indeed, no sequence could be obtained for 66 specimens. However, as replicate samples were included for most species, only seven species were excluded (Suppl. material
3. Sequence validation
The 463 sequences corresponding to the remaining 164 species were searched in GenBank (nt database) using BLASTn (
In total, 348 sequenced samples corresponding to 148 unique species were successfully analysed and validated, and 97 species were represented by at least two specimens (Fig.
The wild bees presented in this study were collected from the French region of Occitanie (Fig.
Specimen records are reported for the 348 sequences (148 species) confirmed with the above workflow. Fig.
Taxonomic coverage of the 16S mini-barcode library compared to existing sequences in GenBank. nb seq: number of sequences; nb spe: number of species. As a point of reference, the sequences available in the public database GenBank before the start of the project are given in the second column (only for species recorded in the collection area ZA PYGAR). The last column indicates the number of new sequences added to GenBank and BOLD. The numbers of species corresponding to specimen records are indicated in brackets.
For 55 of the species included in our dataset, 16S partial or full-length sequences were already available in GenBank (Fig.
CC BY 4.0
The dataset contains the list of the 530 specimens (171 species) with complementary information such as their BOLD IDs, process IDs, GenBank IDs (only for sequences > 200 bp), taxonomy, identifiers, GPS location for the UMR DYNAFOR collection, sequencing method and barcode status (failed, contaminated or confirmed replicate/single). It covers five families, 19 genera and 171 species. After the sequencing and barcode validation steps, 348 sequences corresponding to 148 species and 17 genera were selected. Suppl. materials
Column label | Column description |
---|---|
Sample_ID | Unique BOLD identifier for the specimen. |
Process_ID | Unique BOLD identifier for the barcode. |
Accession_NCBI | Unique GenBank identifier for the barcode (Accession number). |
Museum_ID | Unique collection identifier for the specimen. |
Collection_code | Identifier for the collection: Dynafor, RIEUPEYROUX or CRBE. |
Institution_storing | Institution where specimens are physically stored: ENSAT, Rudélide Expertise Muséologie REM or CNRS. |
Phylum | Phylum name |
Class | Class name |
Order | Order name |
Family | Family name |
Subfamily | Subfamily name |
Genus | Genus name |
Species | Species name |
Subspecies | Subspecies name |
Identifier | Name of the individual who identified the specimen morphologically. |
Identifier_Email | Email of the identifier. |
Identification_Method | All specimens were morphologically identified. |
Sex | The sex of the specimen: F for female, M for male. |
Specimen's caste | Extra information about the specimen's caste: W if the specimen is a worker (empty otherwise). |
Life_stage | Life stage of the sampled specimen. All specimens were adults. |
Tissue_descriptor | Type of tissue analysed: LEG. |
Collectors | Names of the individuals who collected the specimen in the field. |
Collection_Date | Exact date on which the specimen was collected. For CRBE specimens, only the collection year is available. |
Country | Name of the country in which the specimen was collected. All specimens were collected in France. |
State | Name of the state (French: Région) in which the specimen was collected. All the specimens, except nine, were collected in Occitanie. |
Region | Name of the region (French: Département) in which the specimen was collected. |
Sector | Name of the sector or city, in which the specimen was collected. |
Exact_Site | A brief description of the site in which the specimen was collected. |
Latitude | The geographic latitude (in decimal degrees) of the site in which the specimen was collected. For CRBE or Rudélide specimens, only an approximate latitude is available, corresponding to the latitude of the municipality rather than the exact collection point. |
Longitude | The geographic longitude (in decimal degrees) of the site in which the specimen was collected. For CRBE or Rudélide specimens, only an approximate longitude is available, corresponding to the longitude of the municipality rather than the exact collection point. |
Sampling_protocol | The sampling method used to capture the specimen: NET or PAN TRAP. |
Sequencing_method | The method used to sequence the specimen: SANGER or MISEQ. |
Barcode_status | The status of the 16S barcode for the specimen: Confirmed_single, Confirmed_replicate, Contaminated or Fail. |
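As an illustration of how the columns above can be used, the following sketch counts confirmed barcodes per species from a metadata table exported as CSV. It is only a sketch under stated assumptions: the CSV export, the sample rows and the placeholder species names are hypothetical; only the column labels and the `Barcode_status` values come from the table above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: IDs, species names and statuses are invented placeholders.
SAMPLE = """Sample_ID,Genus,Species,Sequencing_method,Barcode_status
S001,Bombus,Bombus sp1,MISEQ,Confirmed_replicate
S002,Bombus,Bombus sp1,MISEQ,Confirmed_replicate
S003,Andrena,Andrena sp1,SANGER,Fail
S004,Osmia,Osmia sp1,SANGER,Confirmed_single
S005,Osmia,Osmia sp2,MISEQ,Contaminated
"""

# Status values that mark a validated barcode, as listed in the column description.
CONFIRMED = {"Confirmed_single", "Confirmed_replicate"}


def confirmed_per_species(handle):
    """Count confirmed barcodes for each species in a CSV file-like object."""
    counts = defaultdict(int)
    for row in csv.DictReader(handle):
        if row["Barcode_status"] in CONFIRMED:
            counts[row["Species"]] += 1
    return dict(counts)


counts = confirmed_per_species(io.StringIO(SAMPLE))
for species, n in sorted(counts.items()):
    print(species, n)
```

Replacing `io.StringIO(SAMPLE)` with an open file handle applies the same count to a real export, provided the column labels match.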
The within-genus global sequencing success including MiSeq and Sanger technologies varies from 75% to 100%, except for Sphecodes (33%) (Fig.
Examination of the general normalised divergence histogram, produced with the BOLD analysis toolkit on all species (replicates), indicates the existence of a DNA barcoding gap (maximal intraspecific distance < minimal interspecific distance), allowing reliable molecular identification of specimens (Fig.
Distribution of intraspecific and interspecific genetic distances per genus. The global normalised distance distribution for all specimens is shown in the bottom right corner. Blue arrows indicate species that show an intraspecific distance > 1%. Red arrows indicate groups of species that show an interspecific distance < 1%.
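A barcoding gap exists for a species when its largest intraspecific distance is smaller than its smallest distance to any other species. The sketch below shows one way to compute this from pairwise specimen distances and to flag the two situations highlighted by the arrows in the figure (intraspecific distance > 1%, interspecific distance < 1%). It is not the BOLD implementation; the function name, the input layout and the example distances are assumptions made for illustration.

```python
def barcoding_gap(pairs, threshold=1.0):
    """Summarise distances per species.

    pairs: iterable of (species_a, species_b, distance_percent), one entry
    per pair of specimens. Returns a dict keyed by species.
    """
    max_intra = {}
    min_inter = {}
    for sp_a, sp_b, dist in pairs:
        if sp_a == sp_b:
            max_intra[sp_a] = max(dist, max_intra.get(sp_a, 0.0))
        else:
            for sp in (sp_a, sp_b):
                min_inter[sp] = min(dist, min_inter.get(sp, float("inf")))

    report = {}
    for sp in sorted(set(max_intra) | set(min_inter)):
        intra = max_intra.get(sp)  # None when the species has a single specimen
        inter = min_inter.get(sp)  # None when no other species was compared
        report[sp] = {
            "max_intra": intra,
            "min_inter": inter,
            "gap": intra is not None and inter is not None and intra < inter,
            "high_intra": intra is not None and intra > threshold,
            "low_inter": inter is not None and inter < threshold,
        }
    return report


# Invented distances (in %) for three placeholder species.
example = [
    ("sp_A", "sp_A", 0.46),
    ("sp_A", "sp_B", 7.19),
    ("sp_B", "sp_C", 0.0),
    ("sp_C", "sp_C", 1.5),
]
report = barcoding_gap(example)
for species, summary in report.items():
    print(species, summary)
```

Species represented by a single specimen have no intraspecific distance, so the sketch reports no gap for them rather than assuming one.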
Genetic distance analyses per family and genus
For the Andrena genus, which has been reported to be difficult to barcode with the COI Folmer primers (
Bombus: The distance tree inferred from the 16S mini-barcodes of the 23 Bombus species (73 specimens) reveals genetic divergence that is consistent with the known subgenus classification (
Eucera: In the Apidae family, the 16S mini-barcode does not discriminate between two species belonging to the Eucera genus, Eucera nigrifacies and Eucera nigrescens, whereas it efficiently delineates the six other species, especially Eucera longicornis, which was confused in the past with Eucera nigrescens (
Nomada: Regarding the Nomada genus, recently reexamined by
Ceratina: The molecular phylogeny of these small carpenter bees has been recently achieved by
Xylocopa: The five Xylocopa specimens in our dataset correspond to three species (Xylocopa valga, Xylocopa iris and Xylocopa violacea) and exhibit 0.46% maximum intraspecific divergence and 7.19% minimum interspecific divergence (Suppl. materials
In the present study, five species belonging to Hylaeus (
Osmia: Like Apis mellifera and Bombus, Osmia are commercially reared for pollination services. A complete phylogeny of the Palearctic Osmiine bees is available on the website of
Lasioglossum: Molecular phylogeny of Lasioglossum is poorly documented (
Halictus: In the Halictus simplex group, Halictus simplex and Halictus langobardicus females are extremely difficult to distinguish morphologically. Unfortunately, the 16S mini-barcode did not allow for the discrimination of the two species, whereas Halictus compressus exhibited a 0.78% minimum interspecific divergence with the rest of the group (Suppl. material
The 250 bp 16S mini-barcode used in this study allows wild bee identification of all species, except two specimens of the Melecta and Anthidium genera. Integrative approaches coupling examination of distance trees, multiple alignment and comparison with morphological data allowed us: 1) to provide 204 new 16S mini-barcodes for wild bees belonging to 93 species verified by taxonomists; 2) to identify species complexes and 3) to efficiently delineate species when females were difficult to separate. This opens avenues for the 16S mini-barcode to be used as an efficient and reliable additional marker in the toolkit of anyone relying on molecular technologies for ecological studies of wild bees.
Financial support for this study was provided by ZA-PYGAR 2019-2020. We would like to thank Mathilde Bouchard and Laurent Raison for collecting wild bee legs and Jérôme Willm for collecting wild bee legs and preparing the metadata table. Thanks to Thibault Leroy for critically reviewing the manuscript. We are grateful to the genotoul bioinformatics platform Toulouse Occitanie (Bioinfo Genotoul, https://doi.org/10.15454/1.5572369328961167E12) for providing computing resources.
Figure design: AM
Specimen sampling, databasing: AO, RR, AP, NE
Writing of manuscript: MP
Bioanalysis and barcode validation: AM, MO, CK, GP, MP
Editing and comments to the manuscript: AM, MO, CK, GP, AV, AP, NE, KT, EL
Molecular experiments: EL, KT, MP
List of the 23 species eliminated during the barcode acquisition process.
Distances between specimens within species. To find minimum and maximum intraspecific distance for a specific species, filter the column "Species".
Distances between specimens belonging to different species within their genus. To find minimum and maximum interspecific distance between two species, filter the columns "species_1" and "species_2". To find minimum and maximum intragenus distance, filter the column "Genus".
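The filtering described for these distance tables can also be scripted. The sketch below assumes the interspecific table is exported as CSV with the columns "Genus", "species_1" and "species_2" named above, plus a distance column that is here called `distance`; that column name, the sample rows and the helper `min_max` are hypothetical.

```python
import csv
import io

# Invented rows with placeholder taxa; the "distance" column name is an assumption.
SAMPLE = """Genus,species_1,species_2,distance
GenusA,sp1,sp2,7.19
GenusA,sp1,sp2,7.65
GenusA,sp1,sp3,9.10
GenusB,sp4,sp5,0.00
"""


def min_max(rows, **filters):
    """Return (min, max) distance over rows matching every column=value filter."""
    values = [
        float(row["distance"])
        for row in rows
        if all(row[column] == value for column, value in filters.items())
    ]
    if not values:
        return None
    return min(values), max(values)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

pair_range = min_max(rows, species_1="sp1", species_2="sp2")
genus_range = min_max(rows, Genus="GenusA")
print("sp1 vs sp2:", pair_range)
print("GenusA:", genus_range)
```

If a given pair can appear in either order in the real table, the query should be run with `species_1` and `species_2` swapped as well.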