Biodiversity Data Journal :
Research Article
Corresponding author: Darren F Ward (wardda@landcareresearch.co.nz)
Academic editor: Silas Bossert
Received: 10 Jul 2024 | Accepted: 23 Sep 2024 | Published: 26 Sep 2024
© 2024 Darren Ward
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ward D (2024) Building a DNA barcode reference collection of Hymenoptera in New Zealand. Biodiversity Data Journal 12: e131701. https://doi.org/10.3897/BDJ.12.e131701
Molecular tools used for the identification of species are heavily reliant on reference DNA sequences and taxonomic annotation. Despite this, there are large gaps in the availability of DNA sequences for many taxonomic groups and for different parts of the globe. Here, a DNA barcode library for the Hymenoptera of New Zealand is presented, comprising 3,145 COI sequences assigned to 837 BINs and representing 231 genera and 236 species. This study provides a DNA barcode for approximately 25% of species and 42% of genera of Hymenoptera in New Zealand. However, when combined with sequences previously deposited in BOLD (a further 170 genera), DNA barcodes are available for 73% of New Zealand hymenopteran genera. To further increase coverage, future efforts need to focus predominantly on taxa from seven families (Encyrtidae, Pteromalidae s.l., Mymaridae, Eulophidae, Diapriidae, Braconidae and Platygastridae). This database facilitates DNA-based identification of taxa for use in both taxonomic revisions and biodiversity monitoring.
eDNA, molecular, monitoring, sequences, taxonomy
Molecular tools are now an indispensable part of biodiversity science and management for understanding biodiversity and ecology of species and communities, detecting threatened or invasive species, assessing environmental change and for taxonomy and systematics (
Molecular databases, such as GenBank [www.ncbi.nlm.nih.gov/genbank/] and the Barcode of Life (BOLD) database [www.boldsystems.org/] serve as repositories of reference DNA barcodes derived from specimens identified by taxonomists, against which DNA sequences can be compared and assigned to known taxa (
Hymenoptera (ants, bees, wasps and sawflies) are one of the globally megadiverse orders of insects. They include some damaging pest species (e.g. social ants (
The current classification of Hymenoptera in New Zealand recognises 947 species in 546 genera from 52 families (
Sampling and Specimen Records
Overall, the DNA barcodes have been accumulated since 2010, with specimens coming from three sources:
1. Field-based sampling from 2010 to 2023, using Malaise traps and sweep nets. This sampling occurred predominantly within five regions: Auckland, Central Otago, Dunedin, Fiordland and the West Coast. It was undertaken to obtain ‘fresh’ specimens specifically for DNA sequencing and contributed 57% of all sequences.
2. Existing specimens in the New Zealand Arthropod Collection (NZAC) from either pinned or ethanol-based material (
3. Specimens from queries sent to the NZAC for identification since 2010. This work was opportunistic, but helped to increase the taxonomic and geographic coverage of sequences and contributed 8% of sequences.
Specimen Identification
Specimens were morphologically examined and identified by comparing them to previously identified specimens in the NZAC, using taxonomic keys and expert knowledge. Some specimens were identified before DNA processing and sequencing, which was typical of those that came from the NZAC or from identification queries. However, the identification of many specimens was confirmed after DNA extraction, by building taxon trees in BOLD and then physically examining the specimen (by the author). The BLAST function (Basic Local Alignment Search Tool) was not used as an identification tool because, whilst the DNA reference collection was being built, a BLAST search returned either no taxonomic name or an ambiguous or unlikely one. This is typical of databases with incomplete coverage, particularly of diverse and highly regionalised taxa (such as Hymenoptera in New Zealand). Taxon coverage and gaps were compared to an online checklist of taxa present in New Zealand (
DNA Processing and Data Accessibility
From each specimen, one tissue sample (a leg, or sometimes two legs, depending on specimen size) was removed and stored in 95% ethanol for DNA extraction. Specimens were processed for the COI-5P marker at either the Canadian Centre for DNA Barcoding (www.ccdb.ca) or the Ecogene facility at Landcare Research (https://www.landcareresearch.co.nz/partner-with-us/ecogene-dna-based-diagnostics/). The primer pairs used were LepF1/LepR1, MLepF1/LepR1 and LCO1490/HCO2198. All physical specimens are held in the NZAC and their details (e.g. collecting locality, dates etc.) are available through GBIF [www.gbif.org/]. Specimen details, sequences and metadata are available in the laboratory information system in BOLD [www.boldsystems.org/] and more broadly (
A total of 3,145 sequences were obtained and assigned to 837 BINs, from which 236 named species and 231 genera were identified (Suppl. materials
Numbers of genera with (and without) a DNA barcode for each family. Table is organised alphabetically by family. Columns: #Genera in NZ (see https://en.wikipedia.org/wiki/Hymenoptera_in_New_Zealand); #Genera sequenced (see Supplementary material) from this study; Total #Genera available includes the combined information from this study and additional searches in BOLD.
Family | #Genera in NZ | #Genera sequenced (this study) | Total #Genera available (this study and BOLD) | %Coverage | #Genera to obtain |
Agaonidae | 1 | 1 | 1 | 100% | 0 |
Aphelinidae | 12 | 2 | 10 | 83% | 2 |
Apidae | 2 | 2 | 2 | 100% | 0 |
Bembicidae | 1 | 1 | 1 | 100% | 0 |
Bethylidae | 13 | 10 | 13 | 100% | 0 |
Braconidae | 78 | 55 | 66 | 85% | 12 |
Ceraphronidae | 2 | 1 | 2 | 100% | 0 |
Chalcididae | 3 | 0 | 3 | 100% | 0 |
Colletidae | 5 | 5 | 5 | 100% | 0 |
Crabronidae | 4 | 4 | 4 | 100% | 0 |
Cynipidae | 2 | 1 | 2 | 100% | 0 |
Diapriidae | 25 | 4 | 10 | 40% | 15 |
Dryinidae | 4 | 4 | 4 | 100% | 0 |
Embolemidae | 1 | 0 | 1 | 100% | 0 |
Encyrtidae | 39 | 4 | 14 | 36% | 25 |
Eulophidae | 50 | 10 | 34 | 68% | 16 |
Eupelmidae | 3 | 1 | 2 | 67% | 1 |
Eurytomidae | 5 | 2 | 3 | 60% | 2 |
Figitidae | 10 | 8 | 10 | 100% | 0 |
Formicidae | 27 | 24 | 27 | 100% | 0 |
Gasteruptiidae | 2 | 2 | 2 | 100% | 0 |
Halictidae | 2 | 1 | 2 | 100% | 0 |
Ibaliidae | 1 | 0 | 1 | 100% | 0 |
Ichneumonidae | 63 | 51 | 62 | 98% | 1 |
Maamingidae | 1 | 1 | 1 | 100% | 0 |
Megachilidae | 3 | 1 | 3 | 100% | 0 |
Megaspilidae | 4 | 2 | 3 | 75% | 1 |
Megastigmidae | 1 | 0 | 1 | 100% | 0 |
Mutillidae | 1 | 1 | 1 | 100% | 0 |
Mymaridae | 40 | 2 | 20 | 50% | 20 |
Mymarommatidae | 2 | 0 | 0 | 0% | 2 |
Orussidae | 1 | 0 | 1 | 100% | 0 |
Pemphredonidae | 1 | 1 | 1 | 100% | 0 |
Pergidae | 1 | 1 | 1 | 100% | 0 |
Perilampidae | 1 | 0 | 1 | 100% | 0 |
Platygastridae | 21 | 1 | 12 | 57% | 9 |
Pompilidae | 4 | 4 | 4 | 100% | 0 |
Proctotrupidae | 3 | 2 | 3 | 100% | 0 |
Pteromalidae | 53 | 7 | 30 | 57% | 23 |
Rotoitidae | 1 | 0 | 0 | 0% | 1 |
Scelionidae | 22 | 7 | 15 | 74% | 7 |
Scolebythidae | 1 | 0 | 0 | 0% | 1 |
Scoliidae | 1 | 1 | 1 | 100% | 0 |
Signiphoridae | 2 | 0 | 2 | 100% | 0 |
Siricidae | 1 | 0 | 1 | 100% | 0 |
Sparasionidae | 1 | 1 | 1 | 100% | 0 |
Sphecidae | 1 | 0 | 1 | 100% | 0 |
Tenthredinidae | 4 | 3 | 4 | 100% | 0 |
Torymidae | 5 | 0 | 3 | 60% | 2 |
Trichogrammatidae | 11 | 0 | 3 | 27% | 8 |
Vespidae | 3 | 3 | 3 | 100% | 0 |
Xiphydriidae | 1 | 1 | 1 | 100% | 0 |
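The last two columns of the table follow arithmetically from the first and fourth columns. The sketch below shows one way to reproduce them for a few rows; the function name `genus_coverage` and the round-half-up rule are assumptions made for illustration and are not taken from the study.

```python
import math


def genus_coverage(genera_in_nz, genera_available):
    """Return (% coverage as a whole number, number of genera still to obtain).

    Assumes coverage is rounded to the nearest whole percent (half rounds up);
    the rounding rule is an illustrative assumption.
    """
    if genera_in_nz <= 0:
        raise ValueError("genera_in_nz must be positive")
    if not 0 <= genera_available <= genera_in_nz:
        raise ValueError("genera_available must lie between 0 and genera_in_nz")
    percent = math.floor(100 * genera_available / genera_in_nz + 0.5)
    return percent, genera_in_nz - genera_available


# (genera in NZ, total genera available) copied from four rows of the table
rows = {
    "Braconidae": (78, 66),
    "Diapriidae": (25, 10),
    "Encyrtidae": (39, 14),
    "Mymaridae": (40, 20),
}

for family, (in_nz, available) in rows.items():
    percent, to_obtain = genus_coverage(in_nz, available)
    print(f"{family}: {percent}% coverage, {to_obtain} genera to obtain")
```

For these four rows the computed values match the table (85%, 40%, 36% and 50% coverage, with 12, 15, 25 and 20 genera to obtain).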
It is more challenging to obtain a “% coverage” at the species level. Approximately 25% of named species are represented by a sequence (i.e. 236 named species in Supplementary Material 2 from an overall checklist of 947 species). Amongst the 837 BINs (Suppl. material
Coverage is higher for families with fewer genera and for groups that are well curated and revised (e.g. ‘Symphyta’ and Aculeata) or are part of current taxonomic projects (Braconidae, Ichneumonidae). However, a total of 148 genera still do not have a DNA barcode. The majority of these ‘gaps’ occur within seven families (Encyrtidae, Pteromalidae s.l., Mymaridae, Eulophidae, Diapriidae, Braconidae and Platygastridae).
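The concentration of gaps can be checked directly from the "#Genera to obtain" column of the table. The sketch below ranks the families with at least one missing genus and sums the seven largest gaps; the variable names are illustrative, and the figures are copied from the table rather than from any separate data source.

```python
# "#Genera to obtain" for every family in the table with at least one gap
genera_to_obtain = {
    "Aphelinidae": 2, "Braconidae": 12, "Diapriidae": 15, "Encyrtidae": 25,
    "Eulophidae": 16, "Eupelmidae": 1, "Eurytomidae": 2, "Ichneumonidae": 1,
    "Megaspilidae": 1, "Mymaridae": 20, "Mymarommatidae": 2,
    "Platygastridae": 9, "Pteromalidae": 23, "Rotoitidae": 1,
    "Scelionidae": 7, "Scolebythidae": 1, "Torymidae": 2,
    "Trichogrammatidae": 8,
}

total_missing = sum(genera_to_obtain.values())

# Rank families from largest to smallest gap and keep the top seven
ranked = sorted(genera_to_obtain.items(), key=lambda item: item[1], reverse=True)
top_seven = ranked[:7]
top_seven_missing = sum(count for _, count in top_seven)
share = top_seven_missing / total_missing

print(f"Genera without a barcode: {total_missing}")
for family, count in top_seven:
    print(f"  {family}: {count}")
print(f"Top seven families: {top_seven_missing} genera ({share:.0%} of the gap)")
```

On the table's figures, the seven families named above account for 120 of the 148 missing genera, or about 81%.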
It is well known that sequence databases exhibit notable taxonomic gaps in coverage (
DNA barcode reference databases linked to voucher specimens create new opportunities for a range of future activities, including high-throughput identification and taxonomic revisions (
The development of DNA reference databases was recently highlighted by the community survey as the area of greatest need (see fig. 2 in
Several authors have suggested that addressing the challenge of ‘taxonomic gaps’ is urgently needed and requires a collaboration between ecologists, geneticists and taxonomic experts (
Many thanks to all those at BOLD for their fantastic and prompt assistance over many years. Thanks also for the helpful suggestions of two reviewers and to the numerous experts who have helped identify taxa, especially: S. Belokobylskij (Zoological Institute Russian Academy of Sciences), J. Berry (Ministry for Primary Industries), B. Donovan (Department of Science and Industrial Research), J. Fernandez-Triana (Canadian National Collection of Insects), I. Gauld (Natural History Museum, London), A. Khalaim (Zoological Institute Russian Academy of Sciences), L. Masner (Canadian National Collection of Insects), J. Noyes (Natural History Museum, London), D. Quicke (Chulalongkorn University) and E. Valentine (Department of Science and Industrial Research).
Excel data file of information from BOLD-generated download of the metadata associated with DNA sequences, consisting of: lab information, sequence lengths, voucher codes, institution storing, taxonomy, specimen details and collection data (e.g. locality, dates etc.).
Nearest-neighbour taxonomic tree, based on COI sequences using the Kimura 2 Parameter Distance Model and aligned with the BOLD Aligner (amino acid based). Only sequences over 400 base pairs were included; sequences with stop codons, contaminants or flags for misidentification or error were excluded.
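For readers unfamiliar with the Kimura 2 Parameter distance named above, the standard formula is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The following is a minimal sketch for two pre-aligned sequences. It is not the BOLD implementation; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an illustrative assumption). math.log/math.sqrt raise ValueError if
    the sequences are too divergent for the formula to be defined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (A->G) across ten aligned sites
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

A tree-building tool would compute this distance for every pair of sequences and pass the resulting matrix to a clustering method such as neighbour joining.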