Corresponding author: Pavel Stoev (
Academic editor: Robert Mesibov
We demonstrate how a classical taxonomic description of a new species can be enhanced by applying new generation molecular methods, and novel computing and imaging technologies. A cave-dwelling centipede,
While 13,494 new animal species were discovered by taxonomists in 2012 (
Here, we present a more holistic approach to taxonomic descriptions. It is exemplified through a new cave centipede,
To respond to the increasing interest in exposing and publishing biodiversity data (see e.g.,
The subfamily
While some species of
The present study is based on eight specimens of
DNA extraction was conducted at the Canadian Centre for DNA Barcoding, Guelph, on complete animals or part of a leg of specimens preserved in 96% ethanol. Standard protocols of the Canadian Centre for DNA Barcoding were used for both
For the ABGD method, we tested various parameter combinations to cross-check the obtained results: relative gap width (X) ranging from 0.05 to 1.5, minimal intraspecific distance (Pmin) of 0.001 and maximal intraspecific distance (Pmax) ranging from 0.02 to 0.11. Pmin and Pmax bound the range of genetic distances within which the barcoding gap should be detected, whereas X defines the width of the gap. Distance calculation was based on the Kimura-2-parameter model and a transition/transversion ratio of 2.0. The method was performed in 100 steps. Statistical Parsimony networks for the delineation of species were reconstructed on the basis of 95% statistical confidence (i.e. connection probability) using the program TCS 1.21 (
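The pairwise distance underlying this analysis can be illustrated with a minimal sketch of the Kimura-2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical, and this sketch estimates P and Q from the sequence pair itself rather than fixing the transition/transversion ratio at 2.0 as in the analysis above; it is not the ABGD or MEGA implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only. Sites with gaps or
    ambiguous bases are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; anything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError if the pair is saturated (argument <= 0)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A distance computed this way would fall inside the barcoding-gap search window only if it lies between the chosen Pmin and Pmax values.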
One entire adult male specimen of
The micro-CT scanning of one adult female specimen was performed at Bruker microCT, Kontich, Belgium, using a SkyScan 1172 system with the following settings: 40 kV, 0.43° rotation step, acquiring 839 projection images from 360° with a pixel size of 8 µm. Prior to scanning, the sample was dehydrated in graded ethanol (50%, 70%, 90%, 100%) for 2 hours in total, then transferred to HMDS (hexamethyldisilazane) for 2 hours, and air dried. Reconstruction was done with the SkyScan software NRecon, using a modified Feldkamp algorithm with beam-hardening adjustment and ring artefact correction, resulting in 3865 cross sections in .bmp format with an image size of 2000 × 2000 pixels. The video of 3D volume renderings was created with CTVox, using the flight recorder function, and saved as an AVI (Audio Video Interleave) file. The data were rendered through a transfer function that sets the opacity of each voxel and assigns it a color according to its grey value. The image stack is stored in GigaDB (
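To give a sense of scale for the reconstructed stack, the following sketch converts the reported dimensions into a physical extent and an uncompressed data volume. It assumes isotropic voxels (the 8 µm pixel size also applies between slices) and 8-bit greyscale images (1 byte per pixel); neither is stated in the text, and the function names are hypothetical.

```python
def stack_extent_mm(width_px, height_px, n_slices, voxel_um):
    """Physical size (x, y, z) of an image stack in millimetres.

    Assumes isotropic voxels, i.e. slice spacing equals pixel size.
    """
    scale = voxel_um / 1000.0  # micrometres -> millimetres
    return (width_px * scale, height_px * scale, n_slices * scale)


def raw_size_gb(width_px, height_px, n_slices, bytes_per_pixel=1):
    """Uncompressed pixel data volume in gigabytes (10**9 bytes).

    bytes_per_pixel=1 assumes 8-bit greyscale; file headers are ignored.
    """
    return width_px * height_px * n_slices * bytes_per_pixel / 1e9


# Values reported in the text: 2000 x 2000 pixels, 3865 slices, 8 µm
extent = stack_extent_mm(2000, 2000, 3865, 8)
size = raw_size_gb(2000, 2000, 3865)
print("extent (mm):", extent)
print("raw size (GB):", size)
```

Under these assumptions the stack covers about 16 × 16 × 31 mm and holds roughly 15 GB of pixel data, which is why the storage and metadata questions raised later in the paper are not trivial.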
T – Tergite, TT – Tergites, Legs: L – left, R – right; Plectrotaxy table: Cx – coxa, Tr – trochanter, Pf – prefemur, F – femur, T – tibia; a, m, p stand for spines in anterior, medial and posterior position, respectively.
All characters as in the holotype, except the following: length of leg 15: prefemur 2.5 mm; femur 3.5 mm; tibia 4 mm; tarsus 1 3.7 mm; tarsus 2 2.5 mm; pretarsus 0.3 mm; ocelli: 1+12-1+13; antennae composed of 68-70 articles; coxosternal teeth: 6+7.
Length: 19-22 mm; ocelli: 1+10–1+11; antennae composed of 65-68 articles; coxosternal teeth: 7+7. Tergites: TT8, 10 and 11 slightly emarginated; posterior margin of TT2, 4, 6, 7 straight. Legs: seriate setae missing on the tarsi 1 and 2 of leg 15, present in one short row only on posterior part of tarsus 2. Female gonopods: with 2+2 elongated sharply pointed spurs slightly bent outwards and a single blunt claw; 3-4 dorsal setae on article 1; 8 on article 2. Sometimes, a small, pointed spine occurs posteriorly in the middle of the first genital segment; so far, it has been detected only in two adult females [
The species can be readily distinguished from all other congeners by the following set of molecular and morphological characters: interspecific genetic distance in COI from the closest neighbour,
As the morphology of
1 | Six poorly defined, feebly pigmented ocelli [ |
|
– | 10-25 pigmented ocelli |
|
2 | Leg 15 with a large knob on prefemur (Figs |
|
– | Leg 15 without such knob (Fig. |
|
3 | Prefemoral knob apically incised forming two rounded and densely setose processes (Fig. |
|
– | Prefemoral knob simple (Fig. |
|
4 | Prefemoral knob with a cluster of setae (Fig. |
|
– | Prefemoral knob without such cluster of setae (Fig. |
|
5 | Antennae with 50-60 antennal articles |
|
– | Antennae with 70-83 antennal articles |
|
6 | Prefemoral knob poorly developed (Fig. |
|
– | Prefemoral knob large (Fig. |
|
7 | Coxosternal teeth: 10+10-11+11; 1 ventral spine on tibia of leg 15 [ |
|
– | Coxosternal teeth: 8+8-9+9; 2 ventral spines on the tibia of leg 15 |
|
8 | Antennae with 74 antennal articles, ocelli 1+14 [ |
|
– | Antennae with more than 81-83 articles, ocelli 1+18-1+19 [ |
|
The ABGD approach clustered the 37
The raw data were first filtered by removing inadequate reads with: 1) adapter contamination; 2) ≥10 Ns; 3) ≥50 base pairs of low quality (quality value <65). The resulting 2 Gb of clean data were processed into subsequent assemblies using SOAPdenovo_trans (
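The three filtering criteria can be sketched as a single predicate. This is an illustration under stated assumptions, not the pipeline actually used: the function name `is_clean_read` and its parameters are hypothetical, adapter contamination is reduced to exact substring matching, and the quality values passed in are assumed to be on the same scale as the cutoff (the text does not specify the encoding behind the value 65).

```python
def is_clean_read(seq, quals, adapters=(), max_n=10,
                  max_low_quality=50, low_quality_cutoff=65):
    """Return True if a read passes all three filters described in the text.

    seq      -- read sequence as a string
    quals    -- per-base quality values (same scale as low_quality_cutoff)
    adapters -- adapter sequences; exact substring matching is a
                simplification of real adapter detection
    """
    seq = seq.upper()

    # 1) adapter contamination
    if any(adapter.upper() in seq for adapter in adapters if adapter):
        return False

    # 2) ten or more undetermined bases
    if seq.count("N") >= max_n:
        return False

    # 3) fifty or more low-quality bases
    low_quality = sum(1 for q in quals if q < low_quality_cutoff)
    if low_quality >= max_low_quality:
        return False

    return True


def filter_reads(reads, adapters=()):
    """Keep only (seq, quals) pairs that pass is_clean_read."""
    return [(s, q) for s, q in reads if is_clean_read(s, q, adapters)]
```

Keeping the thresholds as keyword arguments makes it easy to re-run the filter with a different quality scale once the encoding of the raw reads is known.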
According to the division of the subgenera of
Group I, characterized by short male gonopods and presence of a large knob on male prefemur 15, currently including
Group II, lacking any specific modifications on male legs while gonopods are also short, encompassing
Group III, characterized by the long gonopods and dorsal furrow on male prefemur 15, with
Jeekel’s division of the genus (
New-generation imaging technologies, such as magnetic resonance imaging (MRI) and micro-computed tomography (micro-CT), are opening new horizons in biology (
The ‘cybertype’ notion is herewith tested for the first time with the newly described taxon (Fig.
Whereas a lack of reference genomes in non-model organisms has hampered genetic and phylogenomic studies, transcriptomes may offer a time- and cost-effective substitute for whole genome sequencing in these types of studies and an efficient way to produce massive amounts of gene sequence data. While transcriptomic studies of centipede species, e.g.
In addition to making data publicly available, it is crucial to provide rich metadata to enable data interoperability and reuse. As there is only one transcriptome available, it is not possible to include additional ‘factor’ information. However, by including sequence reads, experimental design, protocols and processed data we were able to produce the maximum amount of (4*) MINSEQE compliant metadata (
For volumetric data created by techniques such as micro-CT and micro-MRI, no community standards exist yet. The DICOM standard (Digital Imaging and Communications in Medicine,
MicroCT image stack available in 2 different formats:
1) Several ZIP files, each containing 500 bmp images, the scanning log documentation file and Darwin Core type specimen data.
2) A single gzipped TAR archive of all 3876 bmp images, the scanning log documentation file and Darwin Core type specimen data.
Documentation of the scanning and reconstruction process through ISA-TAB metadata provided by GigaDB and the inclusion of the scanning log file with the ‘cybertype’.
Specimen data in Darwin Core format and link to the location of the physical material and the transcriptomic data through Darwin Core comma-separated value format (CSV) files:
Eupolybothrus_cavernicolus_sequenced_vaucher_ paratype.csv
Eupolybothrus_cavernicolus_micro-CT_vaucher_ paratype.csv
Eupolybothrus_cavernicolus_all_types.csv
ISA-TAB metadata that ensure retrievability and interoperability.
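As a minimal sketch of how Darwin Core CSV files of this kind could be used to link specimens to their molecular data, the example below reads an in-memory sample with Python's standard `csv` module. The column names are genuine Darwin Core terms (`catalogNumber`, `scientificName`, `typeStatus`, `associatedSequences`), but whether the deposited files use exactly these columns is an assumption, and the catalogue numbers and accession below are placeholders, not real identifiers.

```python
import csv
import io

# Placeholder records; identifiers are invented for illustration only.
SAMPLE = """catalogNumber,scientificName,typeStatus,associatedSequences
EXAMPLE-001,Eupolybothrus cavernicolus,holotype,
EXAMPLE-002,Eupolybothrus cavernicolus,paratype,EXAMPLE-ACCESSION
"""


def load_specimens(handle):
    """Read Darwin Core CSV rows into a list of dictionaries."""
    return list(csv.DictReader(handle))


def specimens_with_sequences(records):
    """Catalogue numbers of specimens that have linked sequence data."""
    return [
        r["catalogNumber"]
        for r in records
        # missing or empty cells are treated as "no linked sequence"
        if (r.get("associatedSequences") or "").strip()
    ]


records = load_specimens(io.StringIO(SAMPLE))
print(specimens_with_sequences(records))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="", encoding="utf-8")`.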
In combination with the Darwin Core files describing the specimen data, we thus fully annotate and document the ‘cybertype’ of
Also, as a pilot project, we annotate all currently valid
To create reliable links between the published sequence IDs and BOLD, an online dataset DS-EUPCAV was generated in the BOLD system, through which the respective Barcode Index Numbers (BINs) of the specimens barcoded for this study may be tracked (for the BIN concept see
This study demonstrates a holistic approach to the description of a new taxon, extending the conventional written description and two-dimensional illustrations with an array of different information types. While this novel approach contributes to the different aspects of the species' identity, its main aim is to provide an integrated approach to handling and publishing large data sets associated with a taxon. The generation of large molecular and morphological data pools that originate from type specimens increases the applicability of taxonomic data in other scientific disciplines such as comparative morphology, evolutionary biology, medicine, ecology, and others.
The concept of a “cybertype” is discussed in this study, but new questions arise about how such a “cybertype” should be defined, and these will have to be addressed by the taxonomic community. Several different kinds of data belonging to the “cybertype” concept are treated here, from free text to sequence data, and from images to volumetric data. The open questions include whether a cybertype should be restricted to morphological data, what data can constitute a cybertype, and whether a cybertype can be composite (i.e. consisting of several data types) or even distributed (different parts of the data residing on different physical servers). Further problems to be addressed are the lack of appropriate mechanisms for cross-linking molecular and morphological data, as well as the absence of global metadata standards and reliable data repositories for micro-CT image libraries. The metadata descriptors for micro-CT files used by the Morphosource and GigaDB repositories are a good starting point, as is the use of ISA-TAB to integrate the different data types. Whatever the answers to these questions, there is one mandatory requirement for data that we can already identify: discoverability and accessibility.
With complex taxon descriptions such as the present one, we are entering new dimensions of data volumes that have to be managed properly to realise their true value. The deposition of large data pools in appropriate repositories is not yet straightforward, and such initiatives have started to emerge only recently. It is our task to ensure from the beginning that they do not develop into isolated data worlds but that they support community standards, describing the datasets in a way that they can be retrieved and cross-linked. Currently, even in modern taxon descriptions, different pieces of data are only linked through a central locus: the published article. In a future, data-centric world of taxonomy, articles published through next generation journal workflows will become an even more important node in a linked network of data elements describing the taxon. These data elements have to be defined and made accessible through persistent identifiers – not unlike the traditional practice for physical specimens that are accessible through their museum accession number. In combination with rich metadata standards, taxonomy will thus open itself up to the semantic Web with its possibilities for intelligent, complex queries.
In this study, we have taken a first step in that direction. All data have been deposited in publicly accessible repositories, such as GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses ensure their accessibility and re-usability. GigaDB in this example provides direct links between the genomic and micro-CT data through a Darwin Core CSV dataset describing the type specimens, and captures all of the metadata in the interoperable ISA-TAB format. Molecular data and images are annotated with rich metadata to ensure discoverability and reuse. Techniques such as micro-CT are, however, still in their infancy, and no standardised metadata schemas exist yet; this gap urgently needs to be addressed by the community if we are to avoid a proliferation of isolated datasets.
Taxonomy is at a turning point in its history. New technologies allow for creation of new types of information at high speed and in gigantic volumes, but without clear rules for communication standards, we will not be able to exploit their full potential. We need to focus our efforts on linking these bits and pieces together, by documenting them, by standardising them and by making them retrievable. If such an infrastructure is in place, unforeseen analytical powers can be unleashed upon these data, creating a revolution in our abilities to understand and model the biosphere.
This project was developed in collaboration between several research institutions and driven by Pensoft Publishers, BGI-Shenzhen and
Habitus of
cephalic plate, dorsal view
ocelli and Tömösváry’s organ. Abbreviations: ocellus (
clypeus, ventral view; most setae broken off
tip of antenna
forcipules, ventral view
close up of coxosternum, ventral view. Abbreviations: porodonts (
tergite 7, dorsal view
tergites 12-13, dorsal view
tergite 14 and intermediate tergite, posterodorsal view. Abbreviations: seta-free areas (
pretarsus of leg 10, ventral view. Abbreviations: anterior accessory claw (
tarsus 1, tarsus 2 and pretarsus of leg 10, lateral view. Abbreviations: pectinal setae (
pretarsus of leg 15
prefemur 15, mesoventral view. Abbreviations: prefemoral knob (
close up of the prefemoral knob, ventral view
close up of the cluster of setae on male prefemur 15
close up of the setose protuberance on male prefemur 15
close up of the tip of prefemoral spine
coxal pore pit, meso-ventral view
Map of Croatia showing the locality of
Entrance of cave Miljacka II, type locality of
ocelli
forcipules, ventral view
tergite 14 and intermediate tergite, dorsal view
close up of posterior part of prefemur of leg 14 showing the expanded distal part bearing feebly defined setose protuberance
Prefemur of male leg 15. From
Prefemur of male leg 15. From
Delineation of
Gene annotation. Original data available from GigaScience GigaDB (
E-value, identity and species distribution statistics of the sequences with homologs in the Nr database
COG functional classification of the transcripts
GO categories of the transcripts
Movie of
Plectrotaxy of
|
|
|||||||||
|
|
|
|
|
|
|
|
|
|
|
1 | amp | amp | amp | amp | a-p | a | ||||
2 | amp | amp | amp | amp | a-p | a-p | ||||
3 | amp | amp | amp | amp | a-p | a-p | ||||
4 | amp | amp | amp | amp | a-p | a-p | ||||
5 | amp | amp | amp | amp | a-p | a-p | ||||
6 | amp | amp | amp | amp | a-p | a-p | ||||
7 | amp | amp | amp | amp | a-p | a-p | ||||
8 | amp | amp | amp | amp | a-p | a-p | ||||
9 | amp | amp | amp | amp | a-p | a-p | ||||
10 | amp | amp | amp | amp | a-p | a-p | ||||
11 | amp | amp | amp | a | amp | a-p | a-p | |||
12 | m | amp | amp | amp | a | amp | a-p | a-p | ||
13 | m | amp | amp | amp | a | amp | a-p | a-p | ||
14 | m | amp | am | a | a | amp | a-p | a-p | ||
15 | am | m | amp | am | a | a | am | p | - |
Plectrotaxy of
|
|
|||||||||
|
|
|
|
|
|
|
|
|
|
|
1 | amp | amp | amp | amp | a-p | a | ||||
2 | amp | amp | amp | amp | a-p | a-p | ||||
3 | amp | amp | amp | amp | a-p | a-p | ||||
4 | amp | amp | amp | amp | a-p | a-p | ||||
5 | amp | amp | amp | amp | a-p | a-p | ||||
6 | amp | amp | amp | amp | a-p | a-p | ||||
7 | amp | amp | amp | amp | a-p | a-p | ||||
8 | amp | amp | amp | amp | a-p | a-p | ||||
9 | amp | amp | amp | amp | a-p | a-p | ||||
10 | amp | amp | amp | amp | a-p | a-p | ||||
11 | amp | amp | amp | (a) | amp | a-p | a-p | |||
12 | m | amp | amp | amp | (a) | amp | a-p | a-p | ||
13 | m | amp | amp | amp | a | amp | a-p | a-p | ||
14 | am | m | amp | amp | a | a | amp | a-p | p | |
15 | am | m | amp | am | a | a | amp | p | - |
Interspecific genetic distances (K2P) of
|
|
|
|
|
|
|
|
|
|
|
|
||
|
|
||||||||||||
|
|
16.7 - 17.8 | |||||||||||
|
|
16.2 - 17.0 | 18.5 - 19.4 | ||||||||||
|
|
17.6 - 18.0 | 14.5 - 15.4 | 20.8 - 21.2 | |||||||||
|
|
14.7 - 15.1 | 17.1 - 17.5 | 17.1 - 17.3 | 18.0 - 18.1 | ||||||||
|
|
16.3 - 16.8 | 18.7 - 19.2 | 17.5 - 17.7 | 21.9 - 22.1 | 13.7 | |||||||
|
17.7 - 18.0 | 16.7 - 17.3 | 18.3 - 18.5 | 17.4 - 17.7 | 18 | 18.3 | |||||||
|
17.4 - 17.8 | 18.6 - 19.1 | 19.4 - 19.7 | 18.1 - 18.4 | 15.7 | 17.5 | 10.7 | ||||||
|
|
20.4 - 21.3 | 20.7 - 21.6 | 21.4 - 22.1 | 20.6 - 20.7 | 16.0 - 16.4 | 20.4 - 20.8 | 18.1 | 19.7 - 20.1 | ||||
|
|
21.9 - 22.5 | 18.9 - 20.1 | 21.6 - 21.8 | 20.0 - 20.2 | 21 | 21.7 | 22.3 | 21.5 | 23.2 - 23.6 | |||
|
|
20.1 - 23.2 | 19.4 - 21.8 | 21.1 - 24.1 | 21.2 - 22.7 | 20.1 - 21.7 | 21.7 - 22.6 | 20.7 - 22.4 | 19.4 - 21.0 | 21.4 - 22.3 | 17.2 - 18.8 | ||
|
|
19.2 - 19.6 | 21.0 - 21.9 | 20.9 - 21.1 | 24.2 - 24.5 | 16.6 | 15.3 | 20.9 | 18.9 | 20.3 | 22.1 | 20.7 - 22.1 |
Raw data used for COI delineation of the
genomic
The archive contains the following data: 1) FASTA alignment used as the basis for all analyses (.FASTA), 2) MEGA file for the calculation of the genetic distances and the NJ tree (.MDSX), 3) NJ tree in Newick format (.NWK), 4) graph from the TCS software for the Statistical Parsimony method (.GRAPH)
File: oo_4933.rar