Biodiversity Data Journal :
Taxonomic paper
Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae): the first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data
Academic editor: Robert Mesibov
Received: 19 Oct 2013 | Accepted: 23 Oct 2013 | Published: 28 Oct 2013
© 2013 Pavel Stoev, Ana Komerički, Nesrine Akkari, Shanlin Liu, Xin Zhou, Alexander Weigand, Jeroen Hostens, Christopher Hunter, Scott Edmunds, David Porco, Marzio Zapparoli, Teodor Georgiev, Daniel Mietchen, David Roberts, Sarah Faulwetter, Vincent Smith, Lyubomir Penev
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Stoev P, Komerički A, Akkari N, Liu S, Zhou X, Weigand A, Hostens J, Hunter C, Edmunds S, Porco D, Zapparoli M, Georgiev T, Mietchen D, Roberts D, Faulwetter S, Smith V, Penev L (2013) Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae): the first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data. Biodiversity Data Journal 1: e1013. https://doi.org/10.3897/BDJ.1.e1013
We demonstrate how a classical taxonomic description of a new species can be enhanced by applying new generation molecular methods, and novel computing and imaging technologies. A cave-dwelling centipede, Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae), found in a remote karst region in Knin, Croatia, is the first eukaryotic species for which, in addition to the traditional morphological description, we provide a fully sequenced transcriptome, a DNA barcode, detailed anatomical X-ray microtomography (micro-CT) scans, and a movie of the living specimen to document important traits of its ex-situ behaviour. By employing micro-CT scanning in a new species for the first time, we create a high-resolution morphological and anatomical dataset that allows virtual reconstructions of the specimen and subsequent interactive manipulation to test the recently introduced ‘cybertype’ notion. In addition, the transcriptome was assembled into a total of 67,785 scaffolds, with an average length of 812 bp and an N50 of 1,448 bp (see GigaDB). Subsequent annotation of 22,866 scaffolds was conducted by tracing homologs against currently available databases, including Nr, SwissProt and COG. This pilot project illustrates a workflow of producing, storing, publishing and disseminating large data sets associated with a description of a new taxon. All data have been deposited in publicly accessible repositories, such as GigaScience GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses used ensure their accessibility and re-usability.
Cybertaxonomy, gene sequence data, micro-CT, data integration, molecular systematics, caves, Croatia, biospeleology
While 13,494 new animal species were discovered by taxonomists in 2012 (Index of Organism Names), animal diversity on the planet continues to decline with unprecedented speed (
Here, we present a more holistic approach to taxonomic descriptions. It is exemplified through a new cave centipede, Eupolybothrus cavernicolus Komerički & Stoev sp. n., recently discovered by biospeleologists in Croatia. To the best of our knowledge, this is the first time the description of a new eukaryotic species has been enhanced with rich genomic and morphological data, including a fully sequenced transcriptome, DNA barcodes, detailed X-ray micro-computed tomography scanning (micro-CT), and a video of a living specimen showing behavioural features. In this increasingly data-driven era, a further aim of this study is to set a new standard for handling, management and publishing of various data types. It is essential that data are easily accessible to researchers in every field of science, and able to be integrated from many sources, to tackle complex and novel scientific hypotheses. Rapid advances and increasing throughput of technologies such as phenotyping, genome-scale sequencing and meta-barcoding are now producing huge volumes of data, but there has been a lag in efforts to curate, present, harmonise and integrate these data to make them more accessible and re-usable for the community. Furthermore, by employing micro-CT scanning we test for the first time in a new taxon the recently introduced ‘cybertype’ notion (
To respond to the increasing interest in exposing and publishing biodiversity data (see e.g.,
The subfamily Ethopolyinae Chamberlin, 1915 is known to comprise some of the largest lithobiomorphs in the world, with several species reaching 45-50 mm in length. At present, the subfamily includes four more or less well defined genera: Bothropolys Wood, 1862, with around 40 species from North America and East Asia; Archethopolys Chamberlin, 1925, with three species from the southwestern USA; Zygethopolys Chamberlin, 1925, with four species from western Canada and the USA; and Eupolybothrus Verhoeff, 1907, with 23 valid and 15 doubtful species and subspecies assigned to seven subgenera, ranging from Southern Europe and North Africa to the Near and Middle East, including the largest Mediterranean islands Corsica, Sardinia, Sicily, Crete and Cyprus (
While some species of Eupolybothrus and the genus itself have been treated recently in several publications (see e.g.,
The present study is based on eight specimens of Eupolybothrus cavernicolus Komerički & Stoev sp. n. belonging to the Croatian Biospeleological Society (CBSS), the National Museum of Natural History, Sofia (NMNHS) and the Natural History Museum of Denmark (ZMUC). The specimens were preserved in ethanol (70 or 96%) or RNAlater (Qiagen, USA). The morphological study of the new species was performed at NMNHS and CBSS with a Zeiss microscope. For scanning electron microscopy (performed at ZMUC), parts of the specimens were cleaned by ultrasonication, transferred to 96% ethanol and then to acetone, air-dried, mounted on adhesive electrical tape attached to aluminium stubs, coated with platinum/palladium and studied in a JEOL JSM-6335F scanning electron microscope. Images were edited in Adobe Lightroom 4.3 and Adobe Photoshop CS 5. All morphological images have been deposited in Morphbank. Terminology for external anatomy follows
DNA barcode sequencing
DNA extraction was conducted at the Canadian Centre for DNA Barcoding, Guelph, on complete animals or part of a leg of specimens preserved in 96% ethanol. Standard protocols of the Canadian Centre for DNA Barcoding were used for both DNA extraction and amplification. All specimen data are stored in the Barcode of Life Data System (BOLD) online database and are also available in the dataset DS-EUPCAV (https://doi.org/10.5883/DS-EUPCAV), where they are linked to the respective Barcode Index Number clusters. This dataset contains sequences from ten species: E. cavernicolus Komerički & Stoev sp. n., E. leostygis (Verhoeff, 1899), E. obrovensis (Verhoeff, 1930), E. grossipes (CL Koch, 1847), E. gloriastygis (Absolon, 1916), E. nudicornis, E. litoralis, E. kahfi, E. transsylvanicus (Latzel, 1882) and E. tridentinus. In addition, all sequences were registered in GenBank (accession numbers KF715038-KF715064, HM065042-HM065044, HQ941581-HQ941585, JN269950, JN269951, JQ350447, JQ350449); one sequence of E. fasciatus (Newport, 1845) was retrieved from GenBank (accession number AY214420). Two sequences from two Lithobius species were included as outgroups: L. austriacus (Verhoeff, 1937) (MYFAB442-11) and L. crassipes L. Koch, 1862 (MYFAB443-11). The final dataset comprises 39 sequences. Molecular delimitation of species was achieved by the implementation of the Automatic Barcoding Gap Discovery (ABGD) procedure as described in
For the ABGD method, we tested various model combinations to cross-check the obtained results: relative gap width (X) ranging from 0.05 to 1.5, minimal intraspecific distance (Pmin) of 0.001 and maximal intraspecific distance (Pmax) ranging from 0.02 to 0.11. Pmin and Pmax delimit the range of genetic distances within which the barcoding gap should be detected, whereas X defines the width of the gap. Distance calculation was based on the Kimura-2-parameter model and a transition/transversion ratio of 2.0. The method was performed in 100 steps. Statistical Parsimony networks for the delineation of species were reconstructed on the basis of 95% statistical confidence (i.e. connection probability) using the program TCS 1.21 (
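As an illustration of the distance measure named above, the following is a minimal sketch of a Kimura 2-parameter (K2P) distance between two aligned sequences, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. It is not the software used in this study: the function name `k2p_distance` and the pairwise-deletion treatment of gaps and ambiguous bases are assumptions made for the example, and P and Q are estimated from the two sequences themselves rather than derived from a preset transition/transversion ratio.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only. Sites where either sequence
    has a gap or an ambiguous base are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this site
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition (C<->T) in ten sites.
    print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

Multiplying the returned value by 100 gives a percentage comparable in form to the interspecific distances reported in this paper.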
Transcriptome sequencing
One entire adult male specimen of Eupolybothrus cavernicolus Komerički & Stoev sp. n. was crushed and preserved in liquid RNAlater (Qiagen, USA) immediately after being captured. To extract total RNA, TRIzol reagent (Invitrogen, USA) was used according to the manufacturer’s instructions. Messenger RNA (mRNA) was isolated from total RNA using a Dynabeads mRNA Purification Kit (Invitrogen, USA). The mRNA was fragmented and transcribed into first-strand cDNA using SuperScript™II Reverse Transcriptase (Invitrogen, USA) and N6 primer (IDT). RNase H (Invitrogen, USA) and DNA polymerase I (Invitrogen, Shanghai China; New England BioLabs) were subsequently applied to synthesize the second strand of the cDNA. The double-stranded cDNA then underwent end-repair, addition of a single ‘A’ base, adapter ligation, size selection, indexing and PCR amplification to construct a library with an insert size of 250 bp. The library was sequenced on the Illumina HiSeq2000 sequencing platform (Illumina, Inc., San Diego, California, USA) at BGI-Shenzhen using a 150 bp paired-end strategy to generate a total of 2.5 Gb of raw reads. First, Illumina HCS1.5.15.1 + RTA1.13.48.0 were applied to generate a “bcl” file, which was then downloaded to local computers. Secondly, the “bcl” file was converted to qseq format using BclConverter-1.9.0-11-03-08. Finally, we separated individual sample data from multiplexed machine runs based on the specific barcode primer sequences, and converted the file format to fastq.
The micro-CT scanning of one adult female specimen was performed at Bruker microCT, Kontich, Belgium, using a SkyScan 1172 system with the following settings: 40 kV, 0.43° rotation step, acquiring 839 projection images from 360° with a pixel size of 8 µm. Prior to scanning, the sample was dehydrated in graded ethanol (50%, 70%, 90%, 100%) for 2 hours in total, then transferred to HMDS (hexamethyldisilazane) for 2 hours, and air-dried. Reconstruction was done with the SkyScan software NRecon, using a modified Feldkamp algorithm with beam-hardening and ring-artefact correction, resulting in 3865 cross sections in .bmp format, with an image size of 2000x2000 pixels. The video of 3D volume renderings was created with CTVox, using the flight recorder function, and saved as an AVI (Audio Video Interleave) file. The obtained data were processed through a transfer function that determines which grey values are rendered opaque and which color is assigned to each grey value. The image stack is stored in GigaDB (
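The scan parameters above allow a quick plausibility check of the reconstructed image stack. The sketch below derives the physical extent of the stack and a rough uncompressed data volume. Two points are assumptions not stated in the text: that the spacing between cross sections equals the 8 µm pixel size (isotropic voxels), and that each voxel is stored as one byte (8-bit greyscale). The function names are hypothetical.

```python
# Values taken from the scanning description in the text.
PIXEL_SIZE_UM = 8
N_SLICES = 3865
SLICE_WIDTH_PX = 2000
SLICE_HEIGHT_PX = 2000

# Assumption for illustration: 8-bit greyscale, i.e. one byte per voxel.
BYTES_PER_VOXEL = 1


def stack_extent_mm(n_slices, width_px, height_px, voxel_um):
    """Physical extent (width, height, depth) of the stack in millimetres.

    Assumes isotropic voxels: slice spacing equals the in-plane pixel size.
    """
    return (
        width_px * voxel_um / 1000,
        height_px * voxel_um / 1000,
        n_slices * voxel_um / 1000,
    )


def raw_size_gb(n_slices, width_px, height_px, bytes_per_voxel):
    """Uncompressed size of the voxel data in gigabytes (10**9 bytes)."""
    return n_slices * width_px * height_px * bytes_per_voxel / 1e9


if __name__ == "__main__":
    extent = stack_extent_mm(N_SLICES, SLICE_WIDTH_PX, SLICE_HEIGHT_PX, PIXEL_SIZE_UM)
    size = raw_size_gb(N_SLICES, SLICE_WIDTH_PX, SLICE_HEIGHT_PX, BYTES_PER_VOXEL)
    print("extent (mm):", extent)
    print("raw size (GB):", round(size, 2))
```

Under these assumptions the stack spans roughly 31 mm along the scanning axis, which is of the same order as the body length of about 30-31 mm reported in the descriptions.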
T – Tergite, TT – Tergites; Legs: L – left, R – right; Plectrotaxy table: Cx – coxa, Tr – trochanter, Pf – prefemur, F – femur, T – tibia; a, m, p stand for spines in anterior, medial and posterior position, respectively.
Body length: approx. 30 mm (measured from anterior margin of cephalic plate to posterior margin of telson); leg 15 – 22.6 mm long, or 75% length of body.
Color: uniformly yellow-brownish to chestnut, margins of cephalic plate slightly darker than inner parts (Fig.
Head: cephalic plate broader than long (4.0 x 3.6 mm, respectively), as wide as T1 (Fig.
Tergites: T1 wider than long, subtrapeziform, wider anteriorly, posterior margin straight or slightly emarginated, marginal ridge with a small median thickening; TT3 and 5 more elongated than T1, posterior margin slightly emarginated medially, posterior angles rounded; posterior angles of T4 rounded; posterior margin of T8 slightly emarginated medially, angles rounded; TT6 and 7 with posterior angles abruptly rounded (Fig.
Legs: leg 15 longest; leg 14 approx. 25% longer than legs 1-12, leg 13 only slightly longer than legs 1-12; pretarsus of legs 1–14 with expanded fundus, larger posterior accessory claw (approx. 1/3rd of fundus) and a slightly thinner and shorter anterior accessory claw (= spine, sensu
Sternites: all sternites smooth, subtrapeziform, with few sparse setae, mainly at lateral margins; posterior margins straight.
Genitalia: posterior margin of male first genital sternite deeply concave, up to half of its length, posterior margin densely covered with long setae, the rest of sternite sparsely covered with shorter setae; gonopod small, hidden behind the edge of first genital sternite, with 4-5 short setae (Fig.
Plectrotaxy: as in Table
Leg | Ventral Cx | Ventral Tr | Ventral Pf | Ventral F | Ventral T | Dorsal Cx | Dorsal Tr | Dorsal Pf | Dorsal F | Dorsal T
1 | | | amp | amp | amp | | | amp | a-p | a
2 | | | amp | amp | amp | | | amp | a-p | a-p
3 | | | amp | amp | amp | | | amp | a-p | a-p
4 | | | amp | amp | amp | | | amp | a-p | a-p
5 | | | amp | amp | amp | | | amp | a-p | a-p
6 | | | amp | amp | amp | | | amp | a-p | a-p
7 | | | amp | amp | amp | | | amp | a-p | a-p
8 | | | amp | amp | amp | | | amp | a-p | a-p
9 | | | amp | amp | amp | | | amp | a-p | a-p
10 | | | amp | amp | amp | | | amp | a-p | a-p
11 | | | amp | amp | amp | a | | amp | a-p | a-p
12 | | m | amp | amp | amp | a | | amp | a-p | a-p
13 | | m | amp | amp | amp | a | | amp | a-p | a-p
14 | | m | amp | am | a | a | | amp | a-p | a-p
15 | am | m | amp | am | a | a | | am | p | -
All characters as in the holotype, except the following: length of leg 15: prefemur 2.5 mm; femur 3.5 mm; tibia 4 mm; tarsus 1 3.7 mm; tarsus 2 2.5 mm; pretarsus 0.3 mm; ocelli: 1+12-1+13; antennae composed of 68-70 articles; coxosternal teeth: 6+7.
Body length: approx. 31 mm; leg 15 approx. 20-21 mm, or 68% length of body. Color: uniformly yellow-brownish to chestnut, head and T1 darker, legs yellowish, margins of tergites darker; distal parts of tarsungulum, coxosternal teeth and pretarsi of all legs dark brown to blackish.
Head: cephalic plate broader than long (3.9 x 3.5 mm, respectively), as wide as anterior part of T1; surface smooth, with several pits scattered throughout the head and giving rise to trichoid setae. Cephalic median sulcus contributing to biconvex anterior margin, marginal ridge with a median thickening; posterior margin slightly concave; transverse suture situated at about 1/3 of anterior edge; posterior limbs of transverse suture visible, connecting basal antennal article with anterior part of ocellar area. Ocelli: 18 blackish, subequal in size, in 3-4 rows. Tömösváry’s organ: moderately large (as large as or slightly larger than a medium ocellus), oval, situated slightly above the cephalic edge below the inferiormost row of ocelli. Clypeus: with a cluster of about 25 trichoid setae situated on the apex. Antennae: approx. 22 mm long, composed of 67 articles, reaching the middle of T10 when folded backwards, basal 2 articles enlarged, less setose; posterior 30 articles visibly longer than broad, ultimate article approx. 1.3 times longer than penultimate one. Forcipules: coxosternite subpentagonal, shoulders almost absent, lateral margins straight; anterior margin set off as a rim by furrow; coxosternal teeth 7+7, median diastema well-developed, V-shaped, subparallel and narrow, porodont arising from a pit below the dental rim, situated lateral to the lateralmost tooth; base of porodont thinner than adjacent tooth, coxosternite sparsely setose anteriorly; setae moderately large, irregularly dispersed. Medial side of forcipular trochanteroprefemur, femur and tibia and proximal part of forcipular tarsungulum setose. Distal part of forcipular tarsungulum about 3 times longer than proximal part.
Tergites: T1 wider than long, subtrapeziform, wider anteriorly, posterior margin slightly concave; TT3 and 5 more elongated than T1, posterior margin slightly concave medially, posterior angles rounded; T2 almost entirely covered by T1, only posteriormost part surpassing the margin of T1; posterior margin of TT4 and 6 straight, posterior angles abruptly rounded; T7 rectangular, posterior margin straight, posterior angles abruptly rounded; T8 approx. 1.4 times longer than T7, posterior margin of T8 slightly concave medially, angles abruptly rounded; TT9, 11, 13 with well-developed posterior triangular projections; TT10 and 12 subequal in size, approx. 1.2 times longer than T8, posterior margin slightly emarginated; posterior margin of T14 slightly emarginated, surface smooth, posterior-most part covered with just a few trichoid setae (much more setose in male, see Fig.
Legs: leg 15 longest, leg 14 approx. 25% longer than legs 1-12, leg 13 only slightly longer than legs 1-12; pretarsus of legs 1–14 with a more expanded fundus, larger posterior accessory claw (approx. 1/3rd of fundus) and a slightly thinner and shorter anterior accessory claw (= spine, sensu
Sternites: subtrapeziform in shape, anterior part wider; lateral sides straight in all but ultimate sternite, where they are slightly convex; sternite surface smooth, shining, covered with a few sparse setae, mainly at lateral margins.
Female gonopods: densely setose, with 2+2 long and pointed spurs slightly bent outwards, and a single blunt claw; outer spur 1.4-1.5 times longer than the inner one, approx. 4 times longer than broad at base; 3-4 dorsal setae on article 1; 12 on article 2.
Plectrotaxy: as in Table
Plectrotaxy of E. cavernicolus Komerički & Stoev sp. n., female paratype.
Leg | Ventral Cx | Ventral Tr | Ventral Pf | Ventral F | Ventral T | Dorsal Cx | Dorsal Tr | Dorsal Pf | Dorsal F | Dorsal T
1 | | | amp | amp | amp | | | amp | a-p | a
2 | | | amp | amp | amp | | | amp | a-p | a-p
3 | | | amp | amp | amp | | | amp | a-p | a-p
4 | | | amp | amp | amp | | | amp | a-p | a-p
5 | | | amp | amp | amp | | | amp | a-p | a-p
6 | | | amp | amp | amp | | | amp | a-p | a-p
7 | | | amp | amp | amp | | | amp | a-p | a-p
8 | | | amp | amp | amp | | | amp | a-p | a-p
9 | | | amp | amp | amp | | | amp | a-p | a-p
10 | | | amp | amp | amp | | | amp | a-p | a-p
11 | | | amp | amp | amp | (a) | | amp | a-p | a-p
12 | | m | amp | amp | amp | (a) | | amp | a-p | a-p
13 | | m | amp | amp | amp | a | | amp | a-p | a-p
14 | am | m | amp | amp | a | a | | amp | a-p | p
15 | am | m | amp | am | a | a | | amp | p | -
Length: 19-22 mm; ocelli: 1+10–1+11; antennae composed of 65-68 articles; coxosternal teeth: 7+7. Tergites: TT8, 10 and 11 slightly emarginated; posterior margin of TT2, 4, 6, 7 straight. Legs: seriate setae missing on the tarsi 1 and 2 of leg 15, present in one short row only on posterior part of tarsus 2. Female gonopods: with 2+2 elongated sharply pointed spurs slightly bent outwards and a single blunt claw; 3-4 dorsal setae on article 1; 8 on article 2. Sometimes, a small, pointed spine occurs posteriorly in the middle of the first genital segment; so far, it has been detected only in two adult females [
The species can be readily distinguished from all other congeners by the following set of molecular and morphological characters: interspecific genetic distance in COI from the closest neighbour, E. leostygis: 14.5-15.4%; antennae moderately long (approx. 70% body length), comprised of 67-71 articles; 11-15 ocelli; 6+6-8+8 coxosternal teeth; tergites 9, 11, 13 with posterior triangular projections; intermediate tergite subpentagonal, posterior margin deeply emarginated, middle part of posterior third of tergite densely covered with setae; laterally, on both sides of the central setose area, there are two specific seta-free regions; pretarsus 15 without accessory spines; leg 15 long (approx. 70-75% body length), prefemur of male leg 15 with a large, apically rounded proximal knob protruding mediad, latter slightly bent dorsad and bearing a cluster of long setae on tip; distal part of prefemur with a well-defined circular protuberance covered with setae; posterior margin of male first genital sternite deeply emarginated, nearly as deep as half of the sternite’s length.
Cavernicolus means “living in caves or caverns”, to emphasise that the species inhabits caves.
Eupolybothrus cavernicolus Komerički & Stoev sp. n. is so far known only from the caves Miljacka II and Miljacka IV (= Špilja kod mlina na Miljacki), situated near the village of Kistanje, Krka National Park, Knin District, Croatia (Fig.
Associated fauna: Gastropoda: Oxychilus cellarius (O.F. Müller, 1774), Hauffenia jadertina Kuščer, 1933, Hadziella sketi Bole, 1961; Araneae: Episinus cavernicola (Kulczynski, 1897), Nesticus eremita Simon, 1879, Tegenaria domestica (Clerck, 1757), Metellina merianae (Scopoli, 1763), Histopona sp.; Pseudoscorpiones: Chthonius tetrachelatus (Preyssler, 1790), Chthonius litoralis Hadži, 1933, Neobisium carsicum Hadži, 1933, Pselaphochernes litoralis Beier, 1956; Opiliones: Nelima troglodytes Roewer, 1910; Acari: Parasitus sp.; Isopoda: Monolistra pretneri Sket, 1964, Sphaeromides virei mediodalmatina Sket, 1964, Alpioniscus balthasari (Frankenberger, 1937), Cyphopleon kratochvili (Frankenberger, 1939); Amphipoda: Niphargus sp.; Decapoda: Troglocaris sp.; Chilopoda: Eupolybothrus tridentinus, Harpolithobius sp., Lithobius sp., Cryptops sp.; Diplopoda: Brachydesmus subterraneus Heller, 1858; Collembola: Troglopedetes pallidus Absolon, 1907, Heteromurus nitidus (Templeton, 1835), Pseudosinella heteromurina (Stach, 1929), Lepidocyrtus sp.; Diplura: Plusiocampa (Stygiocampa) dalmatica Conde, 1959, Japygidae gen. spp.; Coleoptera: Laemostenus cavicola mülleri (Schaum, 1860), Atheta spelaea (Erichson, 1839); Orthoptera: Dolichopoda araneiformis (Burmeister, 1838), Troglophilus ovuliformis Karny, 1907, Gryllomorpha dalmatina Ocskay, 1832; Psocoptera: Psyllipsocus sp.; Lepidoptera: Apopestes spectrum (Esper, 1787); Amphibia: Proteus anguinus Laurenti, 1768; Chiroptera: a colony of bats, Myotis capaccinii (Bonaparte, 1837) (
Lithobius (Polybothrus) leostygis Verhoeff, 1899 -
As the morphology of E. leostygis is still insufficiently known, we provide here scanning electron microscope images (Figs
Identification key to the species of Eupolybothrus (Schizopolybothrus) based on adult males
1 | Six poorly defined, feebly pigmented ocelli [original description] | E. leostygis |
– | 10-25 pigmented ocelli | 2 |
2 | Leg 15 with a large knob on prefemur (Figs | 3 |
– | Leg 15 without such knob (Fig. | E. tabularum |
3 | Prefemoral knob apically incised forming two rounded and densely setose processes (Fig. | E. excellens |
– | Prefemoral knob simple (Fig. | 4 |
4 | Prefemoral knob with a cluster of setae (Fig. | E. cavernicolus sp. n. |
– | Prefemoral knob without such cluster of setae (Fig. | 5 |
5 | Antennae with 50-60 antennal articles | 6 |
– | Antennae with 70-83 antennal articles | 7 |
6 | Prefemoral knob poorly developed (Fig. | E. caesar |
– | Prefemoral knob large (Fig. | E. spiniger |
7 | Coxosternal teeth: 10+10-11+11; 1 ventral spine on tibia of leg 15 [original description] | E. stygis |
– | Coxosternal teeth: 8+8-9+9; 2 ventral spines on the tibia of leg 15 | 8 |
8 | Antennae with 74 antennal articles, ocelli 1+14 [original description] | E. acherontis |
– | Antennae with more than 81-83 articles, ocelli 1+18-1+19 [original description] | E. a. wardaranus |
The ABGD approach clustered the 37 Eupolybothrus specimens into 12 groups (Fig.
Interspecific genetic distances (K2P, in %) of Eupolybothrus species. Ranges are given from minimum to maximum values.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ||
1 | E. gloriastygis BOLD:AAY5019 | ||||||||||||
2 | E. leostygis BOLD:AAY5071 | 16.7 - 17.8 | |||||||||||
3 | E. obrovensis BOLD:AAY5641 | 16.2 - 17.0 | 18.5 - 19.4 | ||||||||||
4 | E. cavernicolus BOLD:AAY4900 | 17.6 - 18.0 | 14.5 - 15.4 | 20.8 - 21.2 | |||||||||
5 | E. litoralis | 14.7 - 15.1 | 17.1 - 17.5 | 17.1 - 17.3 | 18.0 - 18.1 | ||||||||
6 | E. fasciatus | 16.3 - 16.8 | 18.7 - 19.2 | 17.5 - 17.7 | 21.9 - 22.1 | 13.7 | |||||||
7 | E. tridentinus GER1 BOLD:AAV7132 | 17.7 - 18.0 | 16.7 - 17.3 | 18.3 - 18.5 | 17.4 - 17.7 | 18 | 18.3 | ||||||
8 | E. tridentinus GER2 BOLD:AAV7131 | 17.4 - 17.8 | 18.6 - 19.1 | 19.4 - 19.7 | 18.1 - 18.4 | 15.7 | 17.5 | 10.7 | |||||
9 | E. transsylvanicus BOLD:AAJ0488 | 20.4 - 21.3 | 20.7 - 21.6 | 21.4 - 22.1 | 20.6 - 20.7 | 16.0 - 16.4 | 20.4 - 20.8 | 18.1 | 19.7 - 20.1 | ||||
10 | E. kahfi BOLD:AAY2955 | 21.9 - 22.5 | 18.9 - 20.1 | 21.6 - 21.8 | 20.0 - 20.2 | 21 | 21.7 | 22.3 | 21.5 | 23.2 - 23.6 | |||
11 | E. nudicornis BOLD:AAN2808 BOLD:AAN2810 BOLD:AAN2811 | 20.1 - 23.2 | 19.4 - 21.8 | 21.1 - 24.1 | 21.2 - 22.7 | 20.1 - 21.7 | 21.7 - 22.6 | 20.7 - 22.4 | 19.4 - 21.0 | 21.4 - 22.3 | 17.2 - 18.8 | ||
12 | E. grossipes BOLD:AAY7960 | 19.2 - 19.6 | 21.0 - 21.9 | 20.9 - 21.1 | 24.2 - 24.5 | 16.6 | 15.3 | 20.9 | 18.9 | 20.3 | 22.1 | 20.7 - 22.1 |
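The distance ranges in the table above can be queried programmatically, for example to confirm which species is the nearest neighbour of the new species. The sketch below uses the row and column for E. cavernicolus exactly as given in the table. The dictionary layout and the helper names `parse_range` and `nearest_neighbour` are hypothetical and serve only as an illustration.

```python
# K2P distance ranges (%) between E. cavernicolus and the other clusters,
# copied from row 4 and column 4 of the table above.
CAVERNICOLUS_DISTANCES = {
    "E. gloriastygis": "17.6 - 18.0",
    "E. leostygis": "14.5 - 15.4",
    "E. obrovensis": "20.8 - 21.2",
    "E. litoralis": "18.0 - 18.1",
    "E. fasciatus": "21.9 - 22.1",
    "E. tridentinus GER1": "17.4 - 17.7",
    "E. tridentinus GER2": "18.1 - 18.4",
    "E. transsylvanicus": "20.6 - 20.7",
    "E. kahfi": "20.0 - 20.2",
    "E. nudicornis": "21.2 - 22.7",
    "E. grossipes": "24.2 - 24.5",
}


def parse_range(cell: str) -> tuple[float, float]:
    """Turn a table cell such as '14.5 - 15.4' or '13.7' into (min, max)."""
    parts = [float(p) for p in cell.split("-")]
    return (parts[0], parts[-1])


def nearest_neighbour(distances: dict[str, str]) -> tuple[str, float, float]:
    """Return (species, min, max) for the smallest minimum distance."""
    parsed = {name: parse_range(cell) for name, cell in distances.items()}
    name = min(parsed, key=lambda n: parsed[n][0])
    return (name, parsed[name][0], parsed[name][1])


if __name__ == "__main__":
    print(nearest_neighbour(CAVERNICOLUS_DISTANCES))
```

For these values the helper returns E. leostygis with a range of 14.5-15.4%, in agreement with the closest neighbour named in the diagnosis.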
Delineation of Eupolybothrus species – neighbor-joining tree based on K2P distances. Visualised are the clusters obtained from the reversed Statistical Parsimony (SP) method and the Automatic Barcoding Gap Discovery (ABGD) procedure. Bootstrap support values for the identified lineages are given above. The intraspecific genetic variability is given for each cluster. Source data are available in Suppl. material
The raw data were first filtered by removing inadequate reads with: 1) adapter contamination; 2) ≥10 Ns; 3) ≥50 base pairs of low quality (quality value <65). The resulting 2 Gb of clean data were processed into subsequent assemblies using SOAPdenovo_trans (
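The three read-filtering criteria listed above can be expressed as a simple predicate. The following is a minimal sketch, not the pipeline actually used in this study. The adapter sequence is a hypothetical placeholder, adapter contamination is reduced to an exact substring match, and quality values are assumed to be supplied as integers on the same scale as the threshold of 65 given in the text.

```python
# Hypothetical placeholder; the real adapter sequences are not given in the text.
HYPOTHETICAL_ADAPTER = "TTTTGGGGCCCCAAAA"


def keep_read(seq, quals, adapter=HYPOTHETICAL_ADAPTER,
              max_n=10, max_low_quality=50, min_quality=65):
    """Return True if a read passes all three filters described in the text.

    seq   -- read sequence as a string
    quals -- per-base quality values (integers, same scale as min_quality)
    """
    # 1) adapter contamination (simplified here to an exact substring match)
    if adapter and adapter in seq.upper():
        return False
    # 2) reads with 10 or more undetermined bases (N)
    if seq.upper().count("N") >= max_n:
        return False
    # 3) reads with 50 or more low-quality bases (quality value < 65)
    low_quality = sum(1 for q in quals if q < min_quality)
    if low_quality >= max_low_quality:
        return False
    return True


if __name__ == "__main__":
    read = "ACGT" * 25  # a 100 bp toy read
    print(keep_read(read, [70] * 100))              # passes all filters
    print(keep_read(read, [60] * 50 + [70] * 50))   # too many low-quality bases
```

In practice, adapter detection usually tolerates mismatches and partial overlaps, so the substring test here should be read only as a stand-in for that step.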
According to the division of the subgenera of Eupolybothrus of
Jeekel’s division of the genus (
The new generation imaging technologies, such as magnetic resonance imaging (MRI) and micro-computed tomography (micro-CT) are opening new horizons in biology (
The ‘cybertype’ notion is herewith tested for the first time with the newly described taxon (Fig.
Whereas a lack of reference genomes in non-model organisms has hampered genetic and phylogenomic studies, transcriptomes may present a time and cost-effective substitute to whole genome sequencing for these types of studies and an efficient way to produce massive amounts of gene sequence data. While transcriptomic studies of centipede species, e.g. Alipes grandidieri Lucas, 1864 (Chilopoda; Scolopendromorpha), exist in the literature (
In addition to making data publicly available, it is crucial to provide rich metadata to enable data interoperability and reuse. As there is only one transcriptome available, it is not possible to include additional ‘factor’ information. However, by including sequence reads, experimental design, protocols and processed data we were able to produce the maximum amount of (4*) MINSEQE compliant metadata (
For volumetric data created by techniques such as micro-CT and micro-MRI, no community standards exist yet. The DICOM standard (Digital Imaging and Communications in Medicine, http://dicom.nema.org/) used by the medical community is not tailored for taxonomic purposes, thus its usefulness for this research field still has to be investigated (
In combination with the Darwin Core files describing the specimen data, we thus fully annotate and document the ‘cybertype’ of Eupolybothrus cavernicolus Komerički & Stoev sp. n. The generation of large molecular and morphological data pools that originate from type specimens increases the applicability of taxonomic data in other scientific disciplines such as comparative morphology, evolutionary biology, medicine and ecology. The new holistic approach raises important questions and points to new directions for the development of biodiversity data management, in particular the lack of mechanisms for cross-linking molecular and morphological data, the lack of global metadata standards for micro-CT and transcriptomic data, and the absence of reliable data repositories for micro-CT image libraries.
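To make the role of the Darwin Core files more concrete, the sketch below writes a small CSV of type specimen records using standard Darwin Core term names as column headers. It is an illustration only and does not reproduce the files deposited with this paper: the `occurrenceID` values are hypothetical placeholders, the choice of columns is an assumption, and the function name `type_specimens_csv` is invented for the example.

```python
import csv
import io

# Column headers are standard Darwin Core terms; the selection is an assumption.
DWC_FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
              "typeStatus", "country"]

# Illustrative records; the occurrenceID values are hypothetical placeholders.
TYPE_SPECIMENS = [
    {
        "occurrenceID": "example:holotype",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Eupolybothrus cavernicolus Komerički & Stoev",
        "typeStatus": "holotype",
        "country": "Croatia",
    },
    {
        "occurrenceID": "example:paratype-1",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Eupolybothrus cavernicolus Komerički & Stoev",
        "typeStatus": "paratype",
        "country": "Croatia",
    },
]


def type_specimens_csv(records):
    """Serialise specimen records to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    print(type_specimens_csv(TYPE_SPECIMENS))
```

A shared identifier column of this kind is what would allow sequence records and image stacks to be linked back to the same specimen, provided that each repository stores the same identifier.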
Also, as a pilot project, we annotate all currently valid Eupolybothrus (Schizopolybothrus) species with their original descriptions that were extracted from the original publications through applying optical character recognition (OCR) and additionally tagged by using Golden Gate software (
To create reliable links between the published sequence IDs and BOLD, an online dataset DS-EUPCAV was generated in the BOLD system, through which the respective Barcode Index Numbers (BINs) of the specimens barcoded for this study may be tracked (for the BIN concept see
This study demonstrates a holistic approach to the description of a new taxon, extending the conventional written description and two-dimensional illustrations with an array of different information types. While this novel approach contributes to the different aspects of the species' identity, its main aim is to provide an integrated approach to handling and publishing large data sets associated with a taxon. The generation of large molecular and morphological data pools that originate from type specimens increases the applicability of taxonomic data in other scientific disciplines such as comparative morphology, evolutionary biology, medicine, ecology, and others.
The concept of a “cybertype” is discussed in the study, but at the same time new questions arise, pertaining to the definition of such a “cybertype” and they will have to be addressed by the taxonomy community. Several different kinds of data belonging to the “cybertype” concept are treated in this study, from free text to sequence data, and from images to volumetric data. Questions have to be addressed such as whether a cybertype should only be restricted to morphological data, what data can be used to constitute a cybertype and whether a cybertype can be composite (i.e. consisting of several data types) or even distributed (different parts of the data residing on different physical servers). Further problems to be addressed are the lack of appropriate mechanisms for cross-linking molecular and morphological data, as well as the absence of global metadata standards and reliable data repositories for micro-CT image libraries. The metadata descriptors for micro-CT files used by the Morphosource and GigaDB repositories are a good starting point for that, as is the use of ISA-TAB to integrate everything together. Whatever the answers to these questions, there is one mandatory requirement for data that we can already identify: discoverability and accessibility.
With complex taxon descriptions such as the present one, we are entering new dimensions of data volumes that have to be managed properly to realise their true value. The deposition of large data pools in appropriate repositories is not yet straightforward, and such initiatives have started to emerge only recently. It is our task to ensure from the beginning that they do not develop into isolated data worlds but that they support community standards, describing the datasets in a way that they can be retrieved and cross-linked. Currently, even in modern taxon descriptions, different pieces of data are only linked through a central locus: the published article. In a future, data-centric world of taxonomy, articles published through next generation journal workflows will become an even more important node in a linked network of data elements describing the taxon. These data elements have to be defined and made accessible through persistent identifiers – not unlike the traditional practice for physical specimens that are accessible through their museum accession number. In combination with rich metadata standards, taxonomy will thus open itself up to the semantic Web with its possibilities for intelligent, complex queries.
In this study, we have taken a first step towards that direction. All data have been deposited in publicly accessible repositories, such as GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses used ensure their accessibility and re-usability. GigaDB in this example provides direct links between the genomic and micro-CT data, through a Darwin Core CSV dataset describing the type specimens, as well as capturing all of the metadata in the interoperable ISA-TAB format. Molecular data and images are annotated with rich metadata to ensure discoverability and reuse. Techniques such as micro-CT are, however, still in their infancy, and no standardised metadata schemas exist yet – a gap that needs urgently to be addressed by the community if we are to avoid a proliferation of isolated datasets.
Taxonomy is at a turning point in its history. New technologies allow for creation of new types of information at high speed and in gigantic volumes, but without clear rules for communication standards, we will not be able to exploit their full potential. We need to focus our efforts on linking these bits and pieces together, by documenting them, by standardising them and by making them retrievable. If such an infrastructure is in place, unforeseen analytical powers can be unleashed upon these data, creating a revolution in our abilities to understand and model the biosphere.
This project was developed in collaboration between several research institutions and driven by Pensoft Publishers, BGI-Shenzhen and GigaScience. We would like to thank Philippe Rocca-Serra and the ISA-Team for help in producing the ISA-TAB metadata. We are very grateful to Biodiversity Data Journal editor Bob Mesibov (Queen Victoria Museum and Art Gallery, Tasmania, Australia), and the referee Greg Edgecombe (NHM, London) for their constructive comments and useful suggestions that greatly improved the manuscript. Special thanks to Henrik Enghoff who facilitated AK and PS’ respective visits to the Natural History Museum of Denmark, financially supported by the European Commission’s (FP 6) Integrated Infrastructure Program SYNTHESYS (DK-TAF). All specimens were collected during cave fauna research projects conducted by the Croatian Biospeleological Society and funded by The Krka National Park. AK thanks all colleagues from the CBSS who assisted her in collecting the specimens. Stylianos Simaiakis (Natural History Museum of Crete) kindly provided material from the type locality of E. litoralis for DNA barcoding. Pensoft has received financial support by the EU FP7 projects ViBRANT (Virtual Biodiversity Research and Access Network for Taxonomy, www.vbrant.eu, Contract no. RI-261532) and pro-iBiosphere (Coordination & Policy Development in Preparation for a European Open Biodiversity Knowledge Management System, Addressing Acquisition, Curation, Synthesis, Interoperability & Dissemination, Contract no. RI-312848, www.pro-ibiosphere.eu). The BGI and GigaScience teams have received support from China National Genebank (CNGB). The DNA barcodes were obtained through the International Barcode of Life Project supported by grants from NSERC and from the government of Canada through Genome Canada and the Ontario Genomics Institute.
The archive contains the following data: 1) fasta-Alignment as the basis for all analyses (.FASTA), 2) mega-file for the calculation of the genetic distances and the NJ tree (.MDSX), 3) NJ-tree in Newick format (.NWK), 4) graph of the TCS Software for the Statistical Parsimony method (.GRAPH)