Biodiversity Data Journal :
Research Article
Corresponding author:
Academic editor: Paulo Borges
Received: 17 Oct 2022 | Accepted: 08 Dec 2022 | Published: 24 Jan 2023
© 2023 André Schütte, Peter Stüben, Jonas Astrin
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Schütte A, Stüben PE, Astrin JJ (2023) Molecular Weevil Identification Project: A thoroughly curated barcode release of 1300 Western Palearctic weevil species (Coleoptera, Curculionoidea). Biodiversity Data Journal 11: e96438. https://doi.org/10.3897/BDJ.11.e96438
The Molecular Weevil Identification project (MWI) studies the systematics of Western Palearctic weevils (superfamily Curculionoidea) in an integrative taxonomic approach of DNA barcoding, morphology and ecology. This barcode release provides almost 3600 curated CO1 sequences linked to morphological vouchers in about 1300 weevil species. The dataset is presented in statistical distance tables and as a Neighbour-Joining tree. Bayesian Inference trees are computed for the subfamilies Cryptorhynchinae, Apioninae and Ceutorhynchinae. Altogether, 18 unresolved taxonomic issues are discussed. A new barcode primer set is presented. Finally, we establish group-specific genetic distances for many weevil genera to serve as a tool in species delineation. These values are statistically based on distances between "good species" and their congeners. With this morphologically calibrated approach, we could resolve most alpha-taxonomic questions within the MWI project.
DNA barcoding, integrative taxonomy, thresholds, Cryptorhynchinae, Curculionidae, Apionidae, Western Palearctic, Europe, Canary Islands
With 400,000 described species, beetles (Coleoptera) constitute the most diverse animal order (
The taxonomic impediment (
Genetic distances can be measured as a proportion of different nucleotide positions in percent. An important prerequisite for DNA barcoding is that interspecific genetic distances vary significantly for at least a large majority of cases from intraspecific genetic distances. This concept is often referred to as the barcoding gap (
Underlying morphological misidentifications pose a major problem to DNA barcoding datasets and reference collections. Unfortunately, specimen misidentification is common in literature, in collections and particularly widespread in public sequence databases (
The Molecular Weevil Identification project (MWI) presented here strives to avoid pitfalls that arise in DNA barcoding studies, when not backed up by an extensive voucher collection. MWI created a reference database of high-quality DNA barcodes from scratch. Almost 1300 Western Palearctic weevil species have been barcoded, based on rigorous vouchering routines and project criteria: DNA was extracted non-invasively from specimens, then mounted as morphological vouchers for the dry collection, accessible at a public natural history collection (Leibniz Institute for the Analysis of Biodiversity Change, Museum Koenig, Bonn, Germany). These morphological vouchers are accompanied by stored DNA extracts and tissue samples in a dedicated biobank at the same institute. The laboratory infrastructure used in MWI was that of the German Barcode of Life (GBOL) project (
In practice, most of these initial conspicuous molecular findings consisted of simple genetic distances that were considerably higher (in relation to other intraspecific comparisons) or lower (in relation to other interspecific comparisons) than expected. Previous research shows that genetic distance values very often coincide with species limits, but vary widely by taxon and geographic setting, for example, 2.7% for a species delineation threshold for North American birds (
Considerable discussion has gone into the topic of genetic thresholds as criteria for species delimitation (10x rule in
This study presents a validated, taxonomically thoroughly curated barcode release with almost 3600 sequences, the most extensive Western Palearctic weevil barcode dataset until now, covering ca. 1300 weevil species. Based on this data and considering different distribution patterns, we also give group-specific, statistically derived heuristic hints regarding which genetic distance values fall within the typical interspecific range. Cryptorhynchinae, Ceutorhynchinae and Apioninae are represented in our dataset with a high species coverage for the sampled region. Therefore, we specify such values only for genera of those three subfamilies, providing minimum and average p‑distance values. By sharing these data, we hope to accelerate specimen re-identification within the discussed taxa and aid in prefiltering future cases for thorough integrative alpha-taxonomic investigation.
We analysed 3573 mitochondrial CO1 sequences of the DNA barcoding region (
The geographic origin of the collected weevils is as follows: 2510 specimens (70% of the dataset) were collected throughout continental Europe including the Mediterranean islands; 889 specimens (25% of the dataset) were collected on the Macaronesian islands including the Canaries, Azores, Madeira Archipelago with Desertas Islands and Savage Islands (Ilhas Selvagens); 164 specimens (5% of the dataset) were collected in continental North Africa, mostly Morocco and Tunisia. Collecting locations of the sampled specimens are plotted on two ArcGIS map baselayers with GPS Visualizer, see Fig.
The most frequently collected Curculionidae subfamilies were: Cryptorhynchinae (1190 sequences, 278 species, subspecies not differentiated in the species count), Entiminae (576 sequences, 269 species), Ceutorhynchinae (537 sequences, 203 species), Curculioninae (356 sequences, 168 species) and Apioninae (349 sequences, 115 species). A complete overview of sequences per subfamily is shown in Fig.
Most specimens were collected directly into 96% non-denatured ethanol without killing agents. In some cases, we sequenced previously-collected dried specimens, usually not more than five years old. Field data, voucher numbers and GenBank accession numbers for all specimens are provided in Suppl. material
The laboratory routine for Cryptorhynchinae is described in
PCR primer sets. LCO1490-JJ & HCO2198-JJ (
Primer Name | Sequence (5'-3' read direction) | Reference | Optimised for |
LCO1490-MWI | ACWAAYCATAARRAYATYGG | this study (new) | Apioninae & Ceutorhynchinae |
HCO2198-MWI | TADACTTCDGGRTGDCCRAARAATCA | this study (new) | |
LCO1490-JJ | CHACWAAYCATAAAGATATYGG | | Cryptorhynchinae |
HCO2198-JJ | AWACTTCVGGRTGVCCAAARAATCA | | |
LCO1490-JJ2 | CHACWAAYCAYAARGAYATYGG | | universal (arthropods) |
HCO2198-JJ2 | ANACTTCNGGRTGNCCAAARAATCA | | |
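The primers in the table use IUPAC ambiguity codes (for example, W = A or T, D = A, G or T). As a minimal sketch, the following Python snippet counts how many distinct oligonucleotides each degenerate primer encodes. The function name `degeneracy` and the dictionary layout are illustrative assumptions and not part of the MWI workflow.

```python
from math import prod

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences copied from the primer table above.
PRIMERS = {
    "LCO1490-MWI": "ACWAAYCATAARRAYATYGG",
    "HCO2198-MWI": "TADACTTCDGGRTGDCCRAARAATCA",
    "LCO1490-JJ": "CHACWAAYCATAAAGATATYGG",
    "HCO2198-JJ": "AWACTTCVGGRTGVCCAAARAATCA",
    "LCO1490-JJ2": "CHACWAAYCAYAARGAYATYGG",
    "HCO2198-JJ2": "ANACTTCNGGRTGNCCAAARAATCA",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(f"{name}: length {len(sequence)}, degeneracy {degeneracy(sequence)}")
```

A higher degeneracy means the primer mix covers more sequence variants, at the cost of a lower concentration of each individual variant.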
Thermal cycling was performed on GeneAmp PCR System 2700 instruments (Life Technologies, Carlsbad, USA) as follows: hot start Taq activation: 15 min at 95°C; first cycle set ("touch down" with 15 repeats): 35 s denaturation at 94°C, 90 s annealing at 55°C (−1°C/cycle) and 90 s extension at 72°C; second cycle set (25 repeats): 35 s denaturation at 94°C, 90 s annealing at 40°C and 90 s extension at 72°C; final elongation: 10 min at 72°C. Amplicons were purified with the ExoSAP-IT kit (USB Corporation, Cleveland, Ohio) and sequenced bidirectionally using the PCR primers (Table
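The touch-down programme can be written out as a per-cycle annealing schedule. The sketch below assumes that the first touch-down cycle anneals at 55°C and that each following cycle is 1°C cooler, so that the fifteenth cycle anneals at 41°C before the second cycle set continues at 40°C. Ramp times between steps are ignored, and all names are illustrative.

```python
def annealing_schedule(start_temp=55, step=1, touchdown_cycles=15,
                       final_temp=40, final_cycles=25):
    """Annealing temperature (°C) for every PCR cycle, in run order."""
    # Touch-down phase: start_temp, start_temp - step, ... (assumed interpretation).
    touchdown = [start_temp - step * i for i in range(touchdown_cycles)]
    # Second cycle set: constant annealing temperature.
    return touchdown + [final_temp] * final_cycles


schedule = annealing_schedule()

# One cycle: 35 s denaturation + 90 s annealing + 90 s extension.
CYCLE_SECONDS = 35 + 90 + 90

# 15 min hot start + all cycles + 10 min final elongation (ramping not included).
total_seconds = 15 * 60 + len(schedule) * CYCLE_SECONDS + 10 * 60

if __name__ == "__main__":
    print(schedule)
    print(f"Programmed hold time: {total_seconds / 60:.1f} min")
```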
Contig assembly and trimming of primer regions were performed in Geneious Pro 6.1.8 (
The dataset contains 3573 weevil barcodes, of which 3302 sequences cover the full barcode length (658 bp) and 272 sequences are shorter. Two sequences (GU987885, MG229813) fell just short of the 500 bp minimum required by BOLD (
Alignment. DNA sequences were aligned with the Muscle (
Neighbour-Joining tree. The Neighbour-Joining (NJ,
Bayesian Inference. Phylogenetic trees, based on Bayesian Inference, were reconstructed for three sub-datasets:
1) MrBayes sub-dataset for Cryptorhynchinae + Cossoninae: 1311 sequences in total, 1190 sequences from Cryptorhynchinae, 120 additional sequences from Cossoninae plus one outgroup species (Anthribidae, GenBank FJ867818).
2) MrBayes sub-dataset for Apioninae + Nanophyinae + Attelabidae: 367 sequences in total, 349 sequences from Apioninae, 5 additional from Attelabidae, 12 additional from Nanophyinae plus one outgroup species (Cryptorhynchus lapathi, D-0354-lap, GenBank EU286523).
3) MrBayes sub-dataset for Ceutorhynchinae: 537 Ceutorhynchinae sequences plus one outgroup (Cryptorhynchus lapathi, D-0354-lap, GenBank EU286523).
Based on the Bayesian information criterion value (BIC,
The Perl script DiStats (
The description below is the shortened version.
Confidence groups. For DiStats analysis, only Cryptorhynchinae, Ceutorhynchinae and Apioninae species are taken into account (datasets of the best-sampled subfamilies). The sequences of each species are assigned to one of three confidence groups:
Only sequences / specimens from confidence groups 1 and 2 were used in DiStats statistics.
Distribution groups. The reference species ("good species") were also assigned to one of four geographical distribution groups (Table
Distribution group | ISL (island) | C1 (endemic) | C2 (medium) | C3 (large) |
Cryptorhynchinae | island(s) | up to 50 km | 50 to 500 km | 500 km and above |
Apioninae | island(s) | up to 50 km | 50 to 2,000 km | 2,000 km and above |
Ceutorhynchinae | island(s) | up to 50 km | 50 to 2,000 km | 2,000 km and above |
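The distribution groups in the table can be expressed as a small lookup. The sketch below is an illustration only: the function name, the `island` flag and the handling of the exact boundary values (50 km counted as C1, the upper limit counted as C3) are assumptions derived from the wording "up to" and "and above".

```python
# Upper limits (km) of the C1 and C2 range classes per subfamily, from the table above.
RANGE_LIMITS_KM = {
    "Cryptorhynchinae": (50, 500),
    "Apioninae": (50, 2000),
    "Ceutorhynchinae": (50, 2000),
}


def distribution_group(subfamily, range_km=None, island=False):
    """Assign ISL, C1, C2 or C3 from a species' distribution range extent."""
    if island:
        return "ISL"
    if range_km is None:
        raise ValueError("range_km is required for continental species")
    c1_max, c2_max = RANGE_LIMITS_KM[subfamily]
    if range_km <= c1_max:      # "up to 50 km"
        return "C1"
    if range_km < c2_max:       # between the two limits
        return "C2"
    return "C3"                 # "500 km and above" or "2,000 km and above"


if __name__ == "__main__":
    print(distribution_group("Cryptorhynchinae", 600))  # large for Cryptorhynchinae
    print(distribution_group("Apioninae", 600))         # medium for Apioninae
```

The same range extent can therefore fall into different groups depending on the subfamily, which is why the limits are stored per subfamily.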
Interspecific distances per genus and distribution. We examined the p-distances from the reference species to its closest congeners. Only for the reference species, the distance values to each closest congener were used to create genus lists with minimum and average interspecific distance values per geographic distribution group. The closest congener can be another reference species or a taxon assigned to confidence group 2 (congener dataset). We never used the p-distances from taxa of confidence group 2; those were kept only to increase the amount of congeners in the DiStats dataset.
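The closest-congener statistic described above can be sketched in a few lines of Python. This is not the DiStats Perl script; it is a simplified illustration that assumes one aligned sequence per species, counts only positions where both sequences have an unambiguous base, and uses hypothetical names throughout.

```python
from statistics import mean


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance in percent between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def closest_congener_distances(reference, congeners):
    """Distance from each reference species to its closest congener.

    reference: {species: sequence} of reference ("good") species
    congeners: {species: sequence} of additional congeners; these serve only
               as comparison targets and never contribute their own values
    """
    pool = {**reference, **congeners}
    return {
        name: min(p_distance(seq, other)
                  for other_name, other in pool.items() if other_name != name)
        for name, seq in reference.items()
    }


def genus_summary(distances):
    """Minimum and average closest-congener distance for one genus list."""
    values = list(distances.values())
    return min(values), mean(values)


if __name__ == "__main__":
    # Toy sequences, not real barcodes.
    reference = {"sp_a": "AAAAAAAAAA", "sp_b": "AAAAAATTTT"}
    congeners = {"sp_c": "AAAAAAAAAT"}
    print(genus_summary(closest_congener_distances(reference, congeners)))
```

In the toy example, the extra congener `sp_c` lowers the closest-congener distance of both reference species without adding a value of its own to the genus list.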
The programme 'Assemble Species by Automatic Partitioning' (ASAP,
There remain 18 apparent contradictions between morphological identification and molecular results, see NJ tree in Suppl. material
The NJ tree with the complete MWI dataset is shown in Suppl. material
Misidentified specimens are easy to spot in trees when embedded into a matrix of congeneric sequences – misidentified singletons are much more difficult to detect. Beyond misidentified specimens, conflicts can be caused by cryptic species or unresolved synonyms. Several of such inconsistencies have been clarified by taxonomists of the Curculio Institute over the last years, especially in Cryptorhynchinae, Ceutorhynchinae and Apioninae (see Introduction), thus delivering a cleaner picture for this barcode release.
The Bayesian consensus trees focusing on three groups within the dataset are provided in Suppl. material
The Bayesian posterior probabilities mostly show full or at least high (> 90) support in between species. The phylogenetic trees show substantially more polytomies than the phenetic NJ tree. Nevertheless, taxon placements with regard to the closest related species in the dataset mostly coincide between both methods or have marginal deviations. Thus, the Bayesian tree overall confirms the morphological species identifications and also the naming contradictions, based on unresolved taxonomic issues in the same way as the NJ tree.
The DiStats statistics are presented in Table
Summarised DiStats results for genera of Cryptorhynchinae. Numbers indicate uncorrected p-distance values (genetic distances) expressed in percent. Two values are given per genus and distribution range: 1. minimum distance to the closest congener within all species in the dataset and 2. average distance to the closest congener within all species in the dataset. Abbreviations: min. = minimum, dist. = distance.
Cryptorhynchinae | island(s)/ archipelago | island(s)/ archipelago | endemic (up to 50 km) | endemic (up to 50 km) | medium (50-500 km) | medium (50-500 km) | large (> 500 km) | large (> 500 km) |
Distribution group | ISL | ISL | C1 | C1 | C2 | C2 | C3 | C3 |
Genus | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener |
Acalles | 8.4 | 9.0 | 3.2 | 7.0 | 6.2 | 10.6 | 7.1 | 11.7 |
Acallocrates | 13.0 | 13.3 | ||||||
Acallorneuma | 5.8 | 7.3 | 3.0 | 6.8 | 8.8 | 8.8 | ||
Aeoniacalles | 8.8 | 9.1 | ||||||
Calacalles | 3.2 | 6.0 | 14.3 | 14.3 | ||||
Canariacalles | 6.4 | 6.4 | ||||||
Caucasusacalles | 15.3 | 15.3 | ||||||
Coloracalles | 12.2 | 12.2 | ||||||
Dendroacalles | 3.7 | 7.3 | ||||||
Dichromacalles | 7.3 | 7.3 | 12.3 | 13.4 | 12.9 | 13.7 | ||
Echinodera | 3.3 | 9.6 | 5.8 | 11.6 | 5.8 | 11.4 | 6.1 | 10.8 |
Echiumacalles | 6.8 | 6.8 | ||||||
Elliptacalles | 7.1 | 7.1 | ||||||
Ficusacalles | 6.5 | 6.5 | ||||||
Kyklioacalles | 8.8 | 8.8 | 4.3 | 7.8 | 4.9 | 9.0 | 7.1 | 10.2 |
Lauriacalles | 10.2 | 10.2 | ||||||
Madeiracalles | 1.8 | 8.5 | ||||||
Montanacalles | 13.7 | 13.7 | ||||||
Onyxacalles | 7.6 | 8.8 | 4.0 | 7.1 | 4.0 | 7.1 | ||
Pseudodichromacalles | 6.4 | 7.7 | ||||||
Silvacalles | 0.9 | 3.8 | ||||||
Sonchiacalles | 8.3 | 8.8 | ||||||
Torneuma | 5.8 | 10.2 | 16.4 | 16.9 |
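Minimum values such as those in the table can serve as a heuristic prefilter for cases that deserve closer integrative investigation. The sketch below uses only the three genera whose rows are complete for all four distribution groups (Acalles, Echinodera, Kyklioacalles). The function name and the convention of returning `None` for genera without a reference value are assumptions, and a flag is a prompt for review, not a taxonomic decision.

```python
# Minimum p-distance (%) to the closest congener, taken from the complete rows above.
MIN_INTERSPECIFIC = {
    ("Acalles", "ISL"): 8.4, ("Acalles", "C1"): 3.2,
    ("Acalles", "C2"): 6.2, ("Acalles", "C3"): 7.1,
    ("Echinodera", "ISL"): 3.3, ("Echinodera", "C1"): 5.8,
    ("Echinodera", "C2"): 5.8, ("Echinodera", "C3"): 6.1,
    ("Kyklioacalles", "ISL"): 8.8, ("Kyklioacalles", "C1"): 4.3,
    ("Kyklioacalles", "C2"): 4.9, ("Kyklioacalles", "C3"): 7.1,
}


def below_known_minimum(genus, group, observed_distance):
    """True if the observed distance is below the known minimum for this
    genus and distribution group, False if not, None if no value is stored."""
    known_min = MIN_INTERSPECIFIC.get((genus, group))
    if known_min is None:
        return None
    return observed_distance < known_min


if __name__ == "__main__":
    # A putative new Acalles species with a small range, 2.5% from its closest congener.
    print(below_known_minimum("Acalles", "C1", 2.5))
```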
Summarised DiStats results for genera of Apioninae. Numbers indicate uncorrected p-distance values (genetic distances) expressed in percent. Two values are given per genus and distribution range: 1. minimum distance to the closest congener within all species in the dataset and 2. average distance to the closest congener within all species in the dataset. Abbreviations: min. = minimum, dist. = distance.
Apioninae | island(s)/ archipelago | island(s)/ archipelago | endemic (up to 50 km) | endemic (up to 50 km) | medium (50-2000 km) | medium (50-2000 km) | large (> 2000 km) | large (> 2000 km) |
Distribution group | ISL | ISL | C1 | C1 | C2 | C2 | C3 | C3 |
Genus | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener |
Aizobius | 10.8 | 10.8 | ||||||
Alocentron | 10.5 | 10.5 | ||||||
Apion | 6.3 | 8.6 | ||||||
Aspidapion | 4.4 | 4.4 | 4.9 | 4.9 | ||||
Catapion | 8.5 | 11.2 | ||||||
Ceratapion | 10.8 | 10.8 | 4.4 | 8.6 | ||||
Cistapion | 15.8 | 15.8 | ||||||
Cyanapion | 10.9 | 11.8 | ||||||
Diplapion | 8.1 | 8.1 | 3.0 | 3.0 | ||||
Eutrichapion | 11.0 | 11.0 | ||||||
Exapion | 10.5 | 10.5 | 2.3 | 7.0 | ||||
Hemitrichapion | 12.6 | 13.5 | ||||||
Holotrichapion | 5.0 | 8.7 | 7.5 | 10.3 | ||||
Ischnopterapion | 13.3 | 13.3 | ||||||
Ixapion | 12.6 | 12.6 | ||||||
Kalcapion | 4.3 | 4.7 | 4.1 | 4.1 | ||||
Lepidapion | 2.7 | 2.7 | ||||||
Loborhynchapion | 13.2 | 13.2 | ||||||
Malvapion | 11.7 | 11.7 | ||||||
Omphalapion | 13.2 | 13.2 | 13.2 | 13.2 | ||||
Onychapion | 13.2 | 13.2 | ||||||
Oryxolaemus | 8.5 | 8.5 | ||||||
Oxystoma | 11.1 | 11.5 | ||||||
Perapion | 2.1 | 2.1 | ||||||
Phrissotrichum | 8.2 | 8.2 | 8.2 | 8.2 | ||||
Protapion | 4.4 | 7.2 | ||||||
Protopirapion | 14.3 | 14.3 | ||||||
Pseudapion | 7.8 | 7.8 | 7.8 | 10.5 | ||||
Pseudaplemonus | 15.5 | 15.5 | ||||||
Pseudoperapion | 18.2 | 18.2 | ||||||
Pseudoprotapion | 14.6 | 14.6 | ||||||
Pseudostenapion | 17.6 | 17.6 | ||||||
Rhopalapion | 11.6 | 11.6 | ||||||
Stenopterapion | 12.6 | 12.6 | 12.6 | 13.2 | ||||
Synapion | 13.2 | 13.2 | ||||||
Taeniapion | 4.2 | 7.0 | 1.7 | 1.7 | ||||
Taphrotopium | 13.4 | 13.4 | ||||||
Trichopterapion | 16.4 | 16.4 |
Summarised DiStats results for genera of Ceutorhynchinae. Numbers indicate uncorrected p-distance values (genetic distances) expressed in percent. Two values are given per genus and distribution range: 1. minimum distance to the closest congener within all species in the dataset and 2. average distance to the closest congener within all species in the dataset. Abbreviations: min. = minimum, dist. = distance.
Ceutorhynchinae | island(s)/ archipelago | island(s)/ archipelago | endemic (up to 50 km) | endemic (up to 50 km) | medium (50-2000 km) | medium (50-2000 km) | large (> 2000 km) | large (> 2000 km) |
Distribution group | ISL | ISL | C1 | C1 | C2 | C2 | C3 | C3 |
Genus | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener | min. dist. to closest congener | average dist. to closest congener |
Aphytobius | 7.6 | 7.6 | 7.6 | 7.6 | ||||
Auleutes | 15.7 | 15.7 | ||||||
Barioxyonyx | 14.1 | 14.9 | ||||||
Brachiodontus | 15.7 | 15.8 | ||||||
Ceutorhynchus | 5.5 | 9.8 | 6.8 | 9.8 | 4.1 | 9.1 | ||
Coeliodinus | 13.7 | 13.7 | ||||||
Datonychidius | 15.4 | 15.4 | ||||||
Drupenatus | 13.5 | 13.5 | ||||||
Eucoeliodes | 14.6 | 14.6 | ||||||
Eubrychius | 12.1 | 12.1 | ||||||
Glocianus | 13.2 | 13.3 | ||||||
Hadroplontus | 10.9 | 10.9 | ||||||
Hesperorrhynchus | 5.0 | 6.4 | ||||||
Homorosoma | 14.3 | 14.3 | ||||||
Marmaropus | 15.9 | 15.9 | ||||||
Mesoxyonyx | 14.6 | 14.6 | ||||||
Micrelus | 11.9 | 11.9 | ||||||
Microplontus | 12.3 | 14.0 | ||||||
Mogulones/ Datonychus | 7.8 | 8.7 | 11.6 | 13.1 | 6.1 | 11.1 | ||
Mogulonoides | 13.4 | 13.4 | ||||||
Neoglocianus | 10.7 | 10.7 | ||||||
Neophytobius | 14.9 | 14.9 | ||||||
Oprohinus | 15.1 | 15.1 | ||||||
Oreorrhynchaeus | 12.5 | 12.5 | ||||||
Parethelcus | 8.6 | 8.6 | ||||||
Paroxyonyx | 9.7 | 11.3 | 9.7 | 10.9 | ||||
Pelenomus | 11.5 | 13.4 | ||||||
Perioxyonyx | 16.0 | 16.0 | ||||||
Phrydiuchus | 12.2 | 12.2 | 10.5 | 11.5 | ||||
Poophagus | 14.4 | 14.4 | ||||||
Prisistus | 15.7 | 15.7 | 12.9 | 12.9 | ||||
Ranunculiphilus | 12.8 | 12.8 | ||||||
Rhinoncus | 8.4 | 8.4 | 7.0 | 10.1 | ||||
Scleropterus | 17.8 | 17.8 | ||||||
Scleropteridius | 15.9 | 15.9 | ||||||
Sirocalodes | 10.6 | 10.6 | 10.6 | 10.6 | ||||
Thamiocolus | 13.1 | 13.2 | 12.8 | 13.8 | ||||
Trichosirocalus | 12.0 | 12.0 | 12.6 | 13.6 | ||||
Zacladus | 12.2 | 12.2 |
For genera of the subfamily Cryptorhynchinae, the average distance to the closest available congener (often the sister species) ranges from 3.8% (Silvacalles) to 16.9% (Torneuma). For genera of Apioninae, the average distance to the closest congener ranges from 1.7% (Taeniapion) to 18.2% (Pseudoperapion). For genera of Ceutorhynchinae, the average distance to the closest congener ranges from 6.4% (Hesperorrhynchus) to 17.8% (Scleropterus).
For some genera, the smallest p-distance value is significantly lower than the average one. This is often caused by just a single specimen within the genus. For example, in Exapion, the average distance between species is 7.0%, while the lowest value between two species is 2.3%. The closest congener pair in this case is Exapion compactum vs. Exapion uliciperda. All taxa and their closest congeners are listed in the spreadsheets in tab "DiStats_results" (Suppl. material
See Table
Left side of table: summarised ASAP results; right side of table: evaluation of concordance between MOTUs and morphospecies. For each subfamily dataset, 10 different thresholds ("ASAP partitions") and the derived MOTUs are calculated by ASAP. The evaluation of concordance provides the deviations between MOTUs and morphospecies for each given threshold; wrongly assigned MOTUs are given in absolute numbers and in percent. Marked table rows point to the threshold that best fits each subfamily dataset (lowest number of deviations between MOTUs and morphospecies).
ASAP results | ASAP results | ASAP results | ASAP results | evaluation of concordance | evaluation of concordance | evaluation of concordance |
ASAP Partition | MOTUs | Threshold [%] | ASAP-score | no. of wrongly assigned taxa | no. of wrongly assigned seqs. | % of wrongly assigned seqs. |
Cryptorhynchinae sub-dataset (contains 265 morphospecies, 1106 sequences) | ||||||
1 | 236 | 7.3 | 8.5 | 74 | 214 | 19% |
2 | 241 | 7.2 | 9.0 | 73 | 214 | 19% |
3 | 315 | 3.8 | 9.5 | 84 | 190 | 17% |
4 | 348 | 2.4 | 15.0 | 104 | 206 | 19% |
5 | 639 | 0.4 | 16.5 | 373 | 480 | 43% |
6 | 251 | 6.8 | 17.0 | 68 | 181 | 16% |
7 | 325 | 3.4 | 24.5 | 86 | 193 | 17% |
8 | 316 | 3.7 | 25.5 | 83 | 186 | 17% |
9 | 302 | 4.3 | 29.0 | 84 | 205 | 19% |
10 | 329 | 3.1 | 29.5 | 89 | 189 | 17% |
Apioninae sub-dataset (contains 114 morphospecies, 342 sequences) | ||||||
1 | 95 | 6.0 | 3.0 | 19 | 47 | 14% |
2 | 93 | 7.3 | 4.5 | 21 | 52 | 15% |
3 | 92 | 7.5 | 5.0 | 22 | 52 | 15% |
4 | 87 | 8.2 | 5.5 | 28 | 61 | 18% |
5 | 94 | 6.8 | 7.5 | 17 | 41 | 12% |
6 | 111 | 3.8 | 9.5 | 13 | 25 | 7% |
7 | 88 | 8.1 | 11.0 | 24 | 57 | 17% |
8 | 129 | 1.9 | 14.5 | 24 | 29 | 8% |
9 | 116 | 3.0 | 14.5 | 16 | 28 | 8% |
10 | 112 | 3.3 | 15.0 | 13 | 27 | 8% |
Ceutorhynchinae sub-dataset (contains 199 morphospecies, 491 sequences) | ||||||
1 | 204 | 5.1 | 1.0 | 17 | 28 | 6% |
2 | 191 | 6.9 | 7.0 | 19 | 38 | 8% |
3 | 206 | 5.0 | 11.0 | 15 | 24 | 5% |
4 | 186 | 7.7 | 11.5 | 19 | 40 | 8% |
5 | 235 | 2.2 | 13.0 | 37 | 59 | 12% |
6 | 190 | 7.1 | 13.5 | 18 | 37 | 8% |
7 | 178 | 8.5 | 14.0 | 23 | 64 | 13% |
8 | 183 | 7.8 | 16.0 | 20 | 42 | 9% |
9 | 178 | 8.6 | 16.5 | 23 | 46 | 9% |
10 | 203 | 5.5 | 18.5 | 18 | 31 | 6% |
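The concordance columns of the table can be recomputed from the raw counts. The sketch below copies the counts from the table, derives the percentage of wrongly assigned sequences and picks the partition with the fewest deviations per subfamily. Ranking first by wrongly assigned taxa and then by wrongly assigned sequences is an assumption about what "lowest number of deviations" means; the ASAP-score column is omitted.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    number: int
    motus: int
    threshold: float   # percent
    wrong_taxa: int
    wrong_seqs: int


# subfamily -> (number of sequences, rows copied from the table above)
ASAP_RESULTS = {
    "Cryptorhynchinae": (1106, [
        (1, 236, 7.3, 74, 214), (2, 241, 7.2, 73, 214), (3, 315, 3.8, 84, 190),
        (4, 348, 2.4, 104, 206), (5, 639, 0.4, 373, 480), (6, 251, 6.8, 68, 181),
        (7, 325, 3.4, 86, 193), (8, 316, 3.7, 83, 186), (9, 302, 4.3, 84, 205),
        (10, 329, 3.1, 89, 189)]),
    "Apioninae": (342, [
        (1, 95, 6.0, 19, 47), (2, 93, 7.3, 21, 52), (3, 92, 7.5, 22, 52),
        (4, 87, 8.2, 28, 61), (5, 94, 6.8, 17, 41), (6, 111, 3.8, 13, 25),
        (7, 88, 8.1, 24, 57), (8, 129, 1.9, 24, 29), (9, 116, 3.0, 16, 28),
        (10, 112, 3.3, 13, 27)]),
    "Ceutorhynchinae": (491, [
        (1, 204, 5.1, 17, 28), (2, 191, 6.9, 19, 38), (3, 206, 5.0, 15, 24),
        (4, 186, 7.7, 19, 40), (5, 235, 2.2, 37, 59), (6, 190, 7.1, 18, 37),
        (7, 178, 8.5, 23, 64), (8, 183, 7.8, 20, 42), (9, 178, 8.6, 23, 46),
        (10, 203, 5.5, 18, 31)]),
}


def partitions(subfamily):
    """All partitions of one subfamily dataset as Partition tuples."""
    return [Partition(*row) for row in ASAP_RESULTS[subfamily][1]]


def error_rate(subfamily, partition):
    """Percentage of wrongly assigned sequences for one partition."""
    total_seqs = ASAP_RESULTS[subfamily][0]
    return 100.0 * partition.wrong_seqs / total_seqs


def best_partition(subfamily):
    """Partition with the fewest deviations (taxa first, then sequences)."""
    return min(partitions(subfamily), key=lambda p: (p.wrong_taxa, p.wrong_seqs))


if __name__ == "__main__":
    for name in ASAP_RESULTS:
        best = best_partition(name)
        print(f"{name}: partition {best.number}, threshold {best.threshold}%, "
              f"{error_rate(name, best):.1f}% wrongly assigned sequences")
```

Under this ranking, the best-fitting thresholds differ between the three subfamilies (6.8%, 3.8% and 5.0%), and even the best partition leaves a share of sequences wrongly assigned.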
The present DNA barcode release provides results for almost 1300 Western Palearctic weevil taxa. This dataset's strength lies in its thorough validation of specimens, including the actual nomenclatorial resolution of many cases of previous taxonomic conflicts (in preceding publications within the MWI project). The correct identifications are mirrored in a high consistency between morphological identifications and molecular results. The ambiguous cases where molecular and morphological evidence could not be reconciled are discussed (Suppl. material
DiStats statistics. The most densely sampled subfamilies in the dataset often show genus-specific distances between species; see Suppl. material
By summarising the statistical findings, it can be concluded that applying a single general genetic threshold for species delineation leads to mismatches between morphospecies and MOTUs, either false positives (oversplits) or false negatives (lumps). These mismatches are also clearly demonstrated in the ASAP results (see Table
Targeting alpha-taxonomic questions with a single threshold approach likely leads to unsatisfactory error rates between 5% and 43% (see ASAP results in Table
ASAP or other single-threshold approaches are a convenient option to estimate species richness in widely-unknown biota or when there is no option to resort to using morphological information. Additionally, within a rough biodiversity assessment (e.g. metabarcoding), a small taxonomic error rate might not distort the final result. However, incorrect identifications can subsequently be incorporated into further studies. In the worst case, long-term environmental programmes could generate error cascades which can have a negative impact on environmental management and conservation (
It is known that undersampling leads to artificially increased interspecific genetic distances, creating deeper splits in trees and wider barcoding gaps (
Besides missing several species, sampling usually could not cover the entire geographic distribution of most continental taxa in our dataset. Complete sampling within each species' full geographic distribution range would likely reveal higher intraspecific distances than we can observe in the trees. Thus, we have not focused on intraspecific variation for the time being.
Future collecting of weevils on the Western Palearctic mainland should bear this in mind and should strive to fill the mentioned gaps. The Canary Islands were extensively sampled. The dataset usually contains at least one specimen per taxon from each island (for multi-island distributions) or several collecting spots per island (for endemic/single island distributions). Most species occurring in the Canaries do not occur on the mainland. Many mainland species, however, especially in Apioninae and Ceutorhynchinae, occur far beyond the Western Palearctic. Species distribution maps for the three subfamilies in focus are provided as external supplement under DOI 10.5281/zenodo.7430565.
The many different examples from the literature (
The thresholds we propose to use as heuristic support tools for future research in these groups are well-calibrated morphologically and based on "good species". These values can provide a reference for future alpha-taxonomic weevil research consistent with the definitions or understanding of existing species. Genetic distances are easy to measure, but prompt the question of how much distance is needed to delineate species. On the morphological side, each weevil group has its own set of particular characteristics used for morphological identification, which specialists have agreed upon over time, often somewhat subjectively. Morphological variations (intraspecific and interspecific) have been the basis of discussions in taxonomy ever since and are crucial to study prior to a taxonomic change. Those morphological characters used for identification and delineation are mostly based on the consensus principle of the scientific community. Essential characters in one group, bristle length, for example, might not play any role in another group, where perhaps the colouration pattern on the elytra or the protrusion of the eyes may constitute the central diagnostic characters. Based on many years of experience, a specialist will know about those morphological characters in his/her studied group. Known morphological variability within species is well factored in when examining differential characters for a new species description. Comparable situations arise when NJ trees or their underlying genetic distances are discussed (molecular intraspecific and interspecific variation), for example, when a single species appears in two neighbouring clusters. Based on a solely molecular point of view, those clusters might be separated by sufficient distance to infer the existence of a new species. Still, the morphologist may know from experience that those two clusters belong to a geographic variation.
A - mostly historical - quantitative approach for species delineation was morphological phenetics or "numerical taxonomy" (
New species descriptions, based solely on DNA barcoding, have been carried out or at least suggested for cryptic species or species complexes soon after DNA barcoding was established (
Yet, excluding morphology is not commonly accepted in the scientific community (
Relying solely or predominantly on DNA barcodes for species descriptions promises a turbo taxonomy (
The taxonomic inflation issue was addressed before DNA barcoding was introduced. Concerns were based on the practice of raising taxa from subspecies to species level, thus resulting in a change of the species concept rather than new species discoveries (
Adding nuclear markers in combination with phylogenetic models such as the multi-species coalescent model improves the accuracy of species delineation drastically (
During the past 250 years, almost every taxonomic change was based on morphological characters, continuously re-evaluating the underlying morphological characters. Hence for weevils, we can assume this "cleanup process" has built a strong foundation of valid morphological characters in most cases. We suggest preserving the already established and globally-accepted Linnean understanding of species as taxonomic backbone. This will ease the progression from a morphology-based past into a strong molecular-based future taxonomy, which will be compatible with the past. The risk of a disjunct parallel taxonomy would be decreased and the potential taxonomic inflation restrained to a minimum. The morphologically calibrated genus-specific distance values, based on "good species" (Tables
Here, we should address some pitfalls to prevent future inflationary species descriptions:
1. Ignoring the minimum interspecific distances of the sister species. The interspecific genetic distances for weevils are mostly group-specific. A group can be a genus (e.g. Torneuma or Silvacalles), but it can also be a subgenus (e.g. subgenus Euphorbioacalles of the genus Dendroacalles) or even a species complex (e.g. Acalles maraoensis complex). The interspecific distances of the sister species should be considered. If no sister species pair is available in the dataset, the closest congeners can be taken for an approximation. Otherwise, newly-collected specimens originating from a different population might be potentially classified as new species. Even small distances can create a split in a tree and might justify a new species description at first glance. If the interspecific distance of the potential new species falls below the previously known minimal one, the researcher should be cautious not to describe a synonym. A description can still be carried out if strong reasons justify the new species (
2. Single sequences per taxon or population. Using a single sequence per taxon drastically increases the risk of wrong conclusions when applied to alpha-taxonomic questions because intraspecific variation is not shown, but can be high for some taxa. In addition to the increased risk of misidentifications in singletons, not including intermediate specimens (of the same species) can create an artificial split in a tree which could be misinterpreted as a newly-discovered species, especially if the analysed individual was collected far from the previously-known sequence. If a species has a disjunct distribution, providing just one sequence from each population increases the likelihood of producing a synonym. This risk especially applies to islands. Artificial deep splits can be produced if the intraspecific distances within a population coincide with or even exceed the interspecific distances. On dataset compilation, the full sampling depth should be used. Using a single sequence from each population instead of all available sequences means leaving out all intermediate specimens belonging to the same species. The intraspecific distances then present themselves as an artificial deep split. The latter might be the case for some Laparocerus taxa described recently (
3. Gaps in existing sequence databases. A large genetic distance to the closest congener in a sequence database is not proof of having discovered a new species. Often, no reference sequences of the sequenced species have been previously deposited. Subsequently, a misinterpretation of the interspecific genetic distance to the closest database match, for example, 15% to the closest deposited one, can lead to describing a synonym, particularly if the sister species' type material is not consulted. Although potentially new species can be discovered very quickly with DNA barcoding (
Following the biological species concept (
Data Type: geographical distribution maps. Brief description: the ZIP file contains 613 distribution maps of Western Palearctic weevil taxa. The distribution maps showing Europe originate from the Curculio Institute's website (www.curci.de). Additional information on distribution range and known synonyms was based on the Löbl catalogues (
Data Type: Material Table and CO1 sequences. Brief description: alternative download source for the material table and the CO1 sequences used in this study. Download via Zenodo DOI: 10.5281/zenodo.7430106 (3.9 MB).
This study was based on the genuine dedication of many Curculio Institute members. They took a significant role in laborious collecting, identifying and meticulously discussing the taxonomic results, especially Peter Sprick (Germany), Jiří Krátký (Czech Republic), Lutz Behne (Germany) and Christoph Bayer (Germany).
We are extremely grateful to the laboratory technicians from the Museum Koenig who carried out the main part of the lab work and undertook every possible effort to get results from almost all samples, namely Hannah Petersen, Christina Blume and Laura von der Mark.
We highly appreciate the cooperation with Katja Kramp and Eva Kleibusch from the Senckenberg German Entomological Institute (SDEI), who provided an additional 91 sequences to this DNA barcode release.
We would also like to thank all the authorities who issued collection permits, especially from the Canary Islands and the Azores.
The publishing fees were partly covered by the Leibniz Association's open access publishing fund.
Leibniz Institute for the Analysis of Biodiversity Change, Museum Koenig, Adenauerallee 160, 53113 Bonn, Germany.
Curculio Institute - Center for Studies on western Palearctic Curculionoidea, Curculio-Institut e.V. (CURCI), Hauweg 62, 41066 Mönchengladbach, Germany.
The material table (248 pages in PDF format) contains information about each specimen's collecting spot, GPS position, collector, identifier, voucher numbers (DNA and tissue) and GenBank accession numbers.
This study's complete DNA barcode dataset is provided as nucleotide alignment and unaligned sequences in *.fasta file format. The *.fasta files can be opened with any text editor. A read_me.txt file is included explaining the naming scheme. A list of GenBank accession numbers only is included.
The ZIP file contains the neighbour-joining tree in four different file formats (newick, nex, png, svg). The tree is based on the complete DNA barcode dataset published in this study.
Contradictions between morphological identifications and molecular results are discussed (PDF file).
The ZIP file contains the 50% majority rule consensus trees for three sub-datasets of the study, calculated with MrBayes: 1) Cryptorhynchinae + Cossoninae; 2) Apioninae + Nanophyinae + Attelabidae; 3) Ceutorhynchinae. Each tree is provided in three different file formats (NEX, PNG, SVG). The searchable SVG file can be opened with any internet browser.
The supplement contains an in-depth description of the used method (which taxon was used as a reference taxon to create the morphologically calibrated p-distance statistics for Cryptorhynchinae, Apioninae and Ceutorhynchinae). Besides the input data to the DiStats scripts and unformatted output data, the data compilation leading to the output tables is described and available in three Excel spreadsheets.
Detailed description of the ASAP program and data compilation leading to the final summarised results table.
Excursus about various taxa and their thresholds for species delineation.
Additional information to the unsolved taxonomic status of Aeoniacalles aeonii bodegensis (Stüben, 2000).