Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae): the first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data

Abstract We demonstrate how a classical taxonomic description of a new species can be enhanced by applying new generation molecular methods, and novel computing and imaging technologies. A cave-dwelling centipede, Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae), found in a remote karst region in Knin, Croatia, is the first eukaryotic species for which, in addition to the traditional morphological description, we provide a fully sequenced transcriptome, a DNA barcode, detailed anatomical X-ray microtomography (micro-CT) scans, and a movie of the living specimen to document important traits of its ex-situ behaviour. By employing micro-CT scanning in a new species for the first time, we create a high-resolution morphological and anatomical dataset that allows virtual reconstructions of the specimen and subsequent interactive manipulation to test the recently introduced ‘cybertype’ notion. In addition, the transcriptome was recorded with a total of 67,785 scaffolds, having an average length of 812 bp and N50 of 1,448 bp (see GigaDB). Subsequent annotation of 22,866 scaffolds was conducted by tracing homologs against currently available databases, including Nr, SwissProt and COG. This pilot project illustrates a workflow of producing, storing, publishing and disseminating large data sets associated with a description of a new taxon. All data have been deposited in publicly accessible repositories, such as GigaScience GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses used ensure their accessibility and re-usability.


Introduction
While 13,494 new animal species were discovered by taxonomists in 2012 (Index of Organism Names), animal diversity on the planet continues to decline with unprecedented speed (Balmford et al. 2003). Changes and intensification of land use, habitat destruction, human population growth, pollution, exploitation of marine resources and climate change are among the major factors that lead to biodiversity impoverishment, and for the first time in human history, the rate of species extinction may exceed that of species discovery (Wheeler et al. 2012). The rapid pace of extermination has forced taxonomists to speed up the process of biodiversity investigation. The 'turbo-taxonomy' approach, combining molecular data, concise morphological descriptions, and digital imaging, has recently been introduced (Butcher et al. 2012, Riedel et al. 2013a) as one solution for the global loss of taxonomic expertise, part of the problem generally referred to as 'taxonomic impediment' (Wägele et al. 2011). Accelerated 'pipeline' descriptions of 178 new species of parasitic wasps (Butcher et al. 2012) and 101 new species of Trigonopterus weevils (Riedel et al. 2013b) were recently used to exemplify the concept.
Here, we present a more holistic approach to taxonomic descriptions. It is exemplified through a new cave centipede, Eupolybothrus cavernicolus Komerički & Stoev sp. n., recently discovered by biospeleologists in Croatia. To the best of our knowledge, this is the first time the description of a new eukaryotic species has been enhanced with rich genomic and morphological data, including a fully sequenced transcriptome, DNA barcodes, detailed X-ray micro-computed tomography scanning (micro-CT), and a video of a living specimen showing behavioural features. In this increasingly data-driven era, a further aim of this study is to set a new standard for handling, management and publishing of various data types. It is essential that data are easily accessible to researchers in every field of science, and able to be integrated from many sources, to tackle complex and novel scientific hypotheses. Rapid advances and increasing throughput of technologies such as phenotyping, genome-scale sequencing and meta-barcoding are now producing huge volumes of data, but there has been a lag in efforts to curate, present, harmonise and integrate these data to make them more accessible and re-usable for the community. Furthermore, by employing micro-CT scanning we test for the first time in a new taxon the recently introduced 'cybertype' notion (Faulwetter et al. 2013) of high-resolution virtual morphological and anatomical data libraries allowing reconstruction and interactive manipulation of type specimens.
To respond to the increasing interest in exposing and publishing biodiversity data (see e.g., Penev et al. 2011, Costello et al. 2013, Drew et al. 2013) and following the recent developments in open access data publishing (Smith et al. 2013), we also propose a novel workflow in the Biodiversity Data Journal of producing, storing, evaluating, publishing and disseminating complex data sets. The large-scale data handling, management and storage was provided by the GigaScience GigaDB database, with transcriptomic and annotation data made publicly available to the most stringent metadata standards in INSDC (NCBI/EMBL/DDBJ) databases, GigaDB and the relevant datatype specific repositories.

The study group
The subfamily Ethopolyinae Chamberlin, 1915 is known to comprise some of the largest lithobiomorphs in the world, with several species reaching 45-50 mm in length. At present, the subfamily includes four more or less well defined genera: Bothropolys Wood, 1862 with around 40 species from North America and East Asia; Archethopolys Chamberlin, 1925 with three species from the southwestern USA; Zygethopolys Chamberlin, 1925 with four species from western Canada and the USA; and Eupolybothrus Verhoeff, 1907 with 23 valid and 15 doubtful species and subspecies assigned to seven subgenera ranging from Southern Europe and North Africa to the Near and Middle East, including the largest Mediterranean islands Corsica, Sardinia, Sicily, Crete and Cyprus (Zapparoli and Edgecombe 2006). The genus Eupolybothrus exhibits the highest species diversity in the Italian and Balkan peninsulas (Zapparoli 2003), where a number of cave-dwelling species have restricted distribution ranges. A further 66 species-level taxa proposed in Eupolybothrus are currently considered to be junior synonyms, although their taxonomic status might change in the light of future taxonomic and molecular studies. The exact placement of genus Ethopolys Chamberlin, 1912, with twelve species in two subgenera from western Canada and the USA, is uncertain, being treated in contemporary literature as either a synonym of Bothropolys (Zapparoli and Edgecombe 2006) or a valid genus (Mercurio 2010).
While some species of Eupolybothrus and the genus itself have been treated recently in several publications (see e.g., Eason 1970, Zapparoli 1984, Zapparoli 1995, Zapparoli 1998, Zapparoli and Edgecombe 2006, Iorio 2008, Stoev et al. 2010), the other three genera, with few exceptions (e.g., Matic 1974, Ma et al. 2008, Ma et al. 2009, Ma 2012), have remained out of the scope of contemporary studies. Nevertheless, Eupolybothrus is also far from being fully revised, as a number of problems are still in need of modern scrutiny. These mainly concern: 1) a high number of vaguely described and/or poorly known species and subspecies, mostly from the Balkans and Anatolia, known only from their original description; 2) an outdated subgeneric classification that lacks any phylogenetic framework; and 3) a high number of cryptic taxa in the E. nudicornis (Gervais, 1837), E. litoralis (L. Koch, 1867) and E. tridentinus (Fanzago, 1874) species-groups, as recently revealed by application of DNA barcoding (Porco et al. 2011). Further, Stoev et al. (2010) found high interspecific divergence values (20.8% mean value) between two closely related Eupolybothrus species in another barcoding study with mitochondrial Cytochrome C Oxidase subunit I (COI). Two other studies (Giribet 2003, Spelda et al. 2011) contributed genomic data by analysing DNA barcodes for E. fasciatus and E. tridentinus from Italy and Germany, respectively. The present study is part of an ongoing revision of the subfamily Ethopolyinae (Stoev et al. 2010, Porco et al. 2011).

Collected material and morphological study
The present study is based on eight specimens of Eupolybothrus cavernicolus Komerički & Stoev sp. n. belonging to the Croatian Biospeleological Society (CBSS), the National Museum of Natural History, Sofia (NMNHS) and the Natural History Museum of Denmark (ZMUC). The specimens were preserved in ethanol (70 or 96%) or RNAlater (Qiagen, USA). The morphological study of the new species was performed at NMNHS and CBSS with a Zeiss microscope. For scanning electron microscopy (performed at ZMUC), parts of the specimens were cleaned by ultrasonication, transferred to 96% ethanol and then to acetone, air-dried, mounted on adhesive electrical tape attached to aluminium stubs, coated with platinum/palladium and studied in a JEOL JSM-6335F scanning electron microscope. Images were edited in Adobe Lightroom 4.3 and Adobe Photoshop CS 5. All morphological images have been deposited in Morphbank. Terminology for external anatomy follows Bonato et al. (2010).

DNA barcode sequencing
DNA extraction was conducted in the Canadian Centre for DNA Barcoding, Guelph, on complete animals or part of the leg of the specimens preserved in 96% ethanol. Standard protocols of the Canadian Centre for DNA Barcoding were used for both DNA extraction and amplification. All specimen data are stored in the Barcode of Life Data System (BOLD) online database and are available also in the dataset DS-EUPCAV (http://dx.doi.org/10.5883/DS-EUPCAV), where they are linked to the respective Barcode Index Numbers clusters. This dataset contains sequences from ten species: E. cavernicolus Komerički & Stoev sp. n., E. leostygis (Verhoeff, 1899), E. obrovensis (Verhoeff, 1930), E. grossipes (C.L. Koch, 1847), E. gloriastygis (Absolon, 1916), E. nudicornis, E. litoralis, E. kahfi, E. transsylvanicus (Latzel, 1882) and E. tridentinus. In addition, all sequences were registered in GenBank (accession numbers KF715038-KF715064, HM065042-HM065044, HQ941581-HQ941585, JN269950, JN269951, JQ350447, JQ350449); one sequence of E. fasciatus (Newport, 1845) was recovered from GenBank (accession number AY214420). Two sequences from two Lithobius species were included as outgroups: L. austriacus (Verhoeff, 1937) (MYFAB442-11) and L. crassipes L. Koch, 1862 (MYFAB443-11). The final dataset comprises 39 sequences. Molecular delimitation of species was achieved by the implementation of the Automatic Barcoding Gap Discovery (ABGD) procedure as described in Puillandre et al. (2012) and by the reversed Statistical Parsimony (SP) approach as suggested by Hart and Sunday (2007). A Neighbor-Joining (NJ) tree was built for visualization.
For the ABGD method, we tested various model combinations to cross-check the obtained results: relative gap width (X) ranging from 0.05 to 1.5, minimal intraspecific distance (Pmin) of 0.001 and maximal intraspecific distance (Pmax) ranging from 0.02 to 0.11. Pmin and Pmax refer to the genetic distance area where the barcoding gap should be detected, whereas X defines the width of the gap. Distance calculation was based on the Kimura 2-parameter (K2P) model and a transition/transversion ratio of 2.0. The method was performed in 100 steps. Statistical Parsimony networks for the delineation of species were reconstructed on the basis of 95% statistical confidence (i.e. connection probability) using the program TCS 1.21 (Clement et al. 2000). The NJ topology was calculated in MEGA 5.0 (Tamura et al. 2011) using the K2P model under the pairwise-deletion option and 1000 bootstrap replicates. Intra- and interspecific genetic K2P distances were calculated in MEGA 5.0 as well.
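To make the distance measure concrete, the following is a minimal Python sketch of a Kimura 2-parameter distance between two aligned sequences. It is an illustration only, not the MEGA or ABGD implementation used in this study: the function name `k2p_distance` is hypothetical, and "pairwise deletion" is interpreted here as skipping any alignment column in which either sequence has a gap or an ambiguous base.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration. Columns where either sequence
    holds anything other than A, C, G or T are skipped (our reading of
    pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this column
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); math.log raises
    # ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Applied to every pair of barcode sequences, such a function yields the matrix of intra- and interspecific distances in which a barcoding gap is sought.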

Transcriptome sequencing
One entire adult male specimen of Eupolybothrus cavernicolus Komerički & Stoev sp. n. was crushed and preserved in liquid RNAlater (Qiagen, USA) immediately after being captured. To extract total RNA, TRIzol reagent (Invitrogen, USA) was used according to the manufacturer's instructions. Messenger RNA (mRNA) was isolated from total RNA using a Dynabeads mRNA Purification Kit (Invitrogen, USA). The mRNA was fragmented and transcribed into first-strand cDNA using SuperScript™ II Reverse Transcriptase (Invitrogen, USA) and N6 primer (IDT). RNase H (Invitrogen, USA) and DNA polymerase I (Invitrogen, Shanghai China; New England BioLabs) were subsequently applied to synthesize the second strand of the cDNA. The double-stranded cDNA then underwent end-repair, a single 'A' base addition, adapter ligation and size selection, and was indexed and PCR-amplified to construct a library. The extracted cDNA was utilised for library construction with an insert size of 250 bp. The library was then sequenced on the Illumina HiSeq2000 sequencing platform (Illumina, Inc., San Diego, California, USA) at BGI-Shenzhen using a 150 bp paired-end strategy to generate a total of 2.5 Gb of raw reads. Illumina HCS1.5.15.1 + RTA1.13.48.0 were applied to generate a "bcl" file, which was then downloaded to local computers. Next, the "bcl" file was converted to qseq format using BclConverter-1.9.0-11-03-08. Finally, we separated individual sample data from multiplexed machine runs based on the specific barcode primer sequences, and converted the file format to fastq.

Micro-CT scanning
The micro-CT scanning of one adult female specimen was performed at Bruker microCT, Kontich, Belgium, using a SkyScan 1172 system with the following settings: 40 kV, 0.43° rotation step, acquiring 839 projection images from 360° with a pixel size of 8 µm. Prior to scanning, the sample was dehydrated in graded ethanol (50%, 70%, 90%, 100%) for 2 hours in total, then transferred to HMDS (hexamethyldisilazane) for 2 hours, and air-dried. Reconstruction was done with the SkyScan software NRecon, using a modified Feldkamp algorithm, adjusting for beam hardening and applying ring artefact correction, resulting in 3865 cross sections in .bmp format, with an image size of 2000x2000 pixels. The video of 3D volume renderings was created with CTVox, using the flight recorder function, and saved as an AVI (Audio Video Interleave) file. The obtained data were rendered through a transfer function that sets the opacity of voxels according to their grey value and assigns a color to each grey value. The image stack is stored in GigaDB under a Creative Commons CC0 public domain waiver. CTVox, a viewing rather than an analysis program, was the only software used to inspect the reconstructed images.

Description
Description of holotype: Body length: approx. 30 mm (measured from anterior margin of cephalic plate to posterior margin of telson); leg 15: 22.6 mm long, or 75% length of body.
Color: uniformly yellow-brownish to chestnut, margins of cephalic plate slightly darker than inner parts (Fig. 1).
Head: cephalic plate broader than long (4.0 x 3.6 mm, respectively), as wide as T1 (Fig. 2a); surface smooth, with several minute scattered pits, setae generally absent, except for a few emerging from the marginal ridge (above ocelli) and on the median sulcus. Cephalic median sulcus contributing to biconvex anterior margin, marginal ridge with a median thickening; posterior margin straight or slightly concave; transverse suture situated at about 1/3 of anterior edge; posterior limbs of transverse suture visible, connecting basal antennal article with anterior part of ocellar area. Ocelli: 1+14 blackish, irregular in shape, in 3-4 rows, outermost first seriate ocellus largest, ocelli of the middle two rows medium-sized, those of inferior row smallest (Fig. 2b). Tömösváry's organ: moderately large (as large as a medium ocellus), oval, situated on subtriangular sclerotisation below the inferiormost row of seriate ocelli (Fig. 2b). Clypeus: with a cluster of 25-30 setae situated on the apex and near the lateral margin (Fig. 3a). Antennae: right antenna composed of 71 articles, left antenna damaged after the 61st article; slightly surpassing posterior margin of T11 (right) or T9 (left) when folded backwards, basal 2 articles enlarged, less pilose; posterior 30 articles visibly longer than broad, ultimate article approx. 1.3 times longer than penultimate one (Fig. 3b). Forcipules: coxosternite subpentagonal (Fig. 4a), shoulders almost absent (steep), lateral margins straight; anterior margin set off as a rim by furrow; coxosternal teeth 8+8, median diastema well-developed, V-shaped, steep and narrow, porodont arising from a pit below the dental rim, situated lateral to the lateralmost tooth; base of porodont thinner than adjacent tooth, coxosternite sparsely setose anteriorly; setae moderately large, irregularly dispersed (Fig. 4b).
Figure 1. Habitus of Eupolybothrus cavernicolus Komerički & Stoev sp. n., male paratype, ex situ.
Forcipular trochanteroprefemur, femur and tibia and proximal part of forcipular tarsungulum with several setae. Distal part of forcipular tarsungulum about 3 times longer than proximal part (Fig. 4a).
covered with setae; laterally, on both sides of the central setose area there are two specific seta-free regions (Fig. 6a, sfa). All tergites smooth, setae present only on their lateral margins.
Legs: leg 15 longest; leg 14 approx. 25% longer than legs 1-12, leg 13 only slightly longer than legs 1-12; pretarsus of legs 1-14 with expanded fundus, larger posterior accessory claw (approx. 1/3 of fundus) and a slightly thinner and shorter anterior accessory claw (= spine, sensu Bonato et al. 2010) (Fig. 6b); pectinal (seriate) setae missing on tarsi 1 and 2 of leg 15, present in one short row on tarsus 2 of leg 14, and in one row on tarsus 1 and two rows on tarsus 2 of legs 1-13 (Fig. 7a); pretarsus of leg 15 without accessory spines (Fig. 7b). Length of podomeres of leg 15: coxa 1.5 mm, prefemur 3.7 mm, femur 4.0 mm, tibia 5.2 mm, tarsus 1 5.0 mm, tarsus 2 3.0 mm, pretarsus 0.25 mm. Prefemur of leg 15 with a large apically rounded proximal knob (Fig. 8) protruding mediad, latter slightly bent dorsad and bearing a peculiar cluster of long setae on tip (Fig. 9a); posterior edge with well defined circular protuberance at mid-distance between spines a and p dorsally, covered with long setae (Fig. 9b), rest of prefemur covered with sparse setae. Dorsal spine p on prefemur (but also in other podomeres and other legs) with characteristic bi- and tripartite tip (Fig. 10a). Legs 1-14 without particular modifications.
Coxal pores: generally round, arranged in 4-5 irregular rows, pores of inner rows largest, size decreasing outwards; pores separated from each other by a distance more than, or equal to their own diameter; number of pores on leg-
Sternites: all sternites smooth, subtrapeziform, with few sparse setae, mainly at lateral margins; posterior margins straight.

Genitalia: posterior margin of male first genital sternite deeply concave, up to half of its length, posterior margin densely covered with long setae, the rest of sternite sparsely covered with shorter setae; gonopod small, hidden behind the edge of first genital sternite, with 4-5 short setae (Fig. 11).
Plectrotaxy: as in Table 1.
Head: cephalic plate broader than long (3.9 x 3.5 mm, respectively), as wide as anterior part of T1; surface smooth, with several pits scattered throughout the head and giving rise to trichoid setae. Cephalic median sulcus contributing to biconvex anterior margin, marginal ridge with a median thickening; posterior margin slightly concave; transverse suture situated at about 1/3 of anterior edge; posterior limbs of transverse suture visible, connecting basal antennal article with anterior part of ocellar area. Ocelli: 18 blackish, subequal in size, in 3-4 rows. Tömösváry's organ: moderately large (as large as or slightly larger than a medium ocellus), oval, situated slightly above the cephalic edge below the inferiormost row of ocelli. Clypeus: with a cluster of about 25 trichoid setae situated on the apex. Antennae: approx. 22 mm long, composed of 67 articles, reaching the middle of T10 when folded backwards, basal 2 articles enlarged, less setose; posterior 30 articles visibly longer than broad, ultimate article approx. 1.3 times longer than penultimate one. Forcipules: coxosternite subpentagonal, shoulders almost absent, lateral margins straight; anterior margin set off as a rim by furrow; coxosternal teeth 7+7, median diastema well-developed, V-shaped, subparallel and narrow, porodont arising from a pit below the dental rim, situated lateral to the lateralmost tooth; base of porodont thinner than adjacent tooth, coxosternite sparsely setose anteriorly; setae moderately large, irregularly dispersed. Medial side of forcipular trochanteroprefemur, femur and tibia and proximal part of forcipular tarsungulum setose. Distal part of forcipular tarsungulum about 3 times longer than proximal part.
Tergites: T1 wider than long, subtrapeziform, wider anteriorly, posterior margin slightly concave; TT3 and 5 more elongated than T1, posterior margin slightly concave medially, posterior angles rounded; T2 almost entirely covered by T1, only posteriormost part surpassing the margin of T1; posterior margin of TT4 and 6 straight, posterior angles abruptly rounded; T7 rectangular, posterior margin straight, posterior angles abruptly rounded; T8 approx. 1.4 times longer than T7, posterior margin of T8 slightly concave medially, angles abruptly rounded; TT9, 11, 13 with well-developed posterior triangular projections; TT10 and 12 subequal in size, approx. 1.2 times longer than T8, posterior margin slightly emarginated; posterior margin of T14 slightly emarginated, surface smooth, posterior-most part covered with just a few trichoid setae (much more setose in male, see Fig. 6a); intermediate tergite subpentagonal, posterior margin deeply emarginated, surface smooth, lateral edges bent upwards, a few trichoid setae emerging from the posterior and lateral edges; areas covered with spines and setae, as well as the specific seta-free areas present in male (Fig. 6a, sfa), absent.
Legs: leg 15 longest, leg 14 approx. 25% longer than legs 1-12, leg 13 only slightly longer than legs 1-12; pretarsus of legs 1-14 with a more expanded fundus, larger posterior accessory claw (approx. 1/3 of fundus) and a slightly thinner and shorter anterior accessory claw (= spine, sensu Bonato et al. 2010); pectinal (seriate) setae missing on tarsi 1 and 2 of leg 15, present in one short row on tarsus 2 of leg 14, and in one row on tarsus 1 and two rows on tarsus 2 of legs 1-13; pretarsus of leg 15 without accessory spines. Leg 15 slender and elongate, without particular modifications. Bifurcated spines present irregularly on most podomeres. Coxal pores: generally round, forming 4-5 irregular rows, pores of inner rows largest, size decreasing outwards; pores separated from each other mostly by a distance more than or equal to their own diameter.
Sternites: subtrapeziform in shape, anterior part wider; lateral sides straight in all but ultimate sternite, where they are slightly convex; sternite surface smooth, shining, covered with a few sparse setae, mainly at lateral margins.

Diagnosis
The species can be readily distinguished from all other congeners by the following set of molecular and morphological characters: interspecific genetic distance in COI from the closest neighbour, E. leostygis: 14.5-15.4%; antennae moderately long (approx. 70% body length), comprised of 67-71 articles; 11-15 ocelli; 6+6-8+8 coxosternal teeth; tergites 9, 11, 13 with posterior triangular projections; intermediate tergite subpentagonal, posterior margin deeply emarginated, middle part of posterior third of tergite densely covered with setae; laterally, on both sides of the central setose area, there are two specific seta-free regions; pretarsus 15 without accessory spines; leg 15 long (approx. 70-75% body length), prefemur of male leg 15 with a large, apically rounded proximal knob protruding mediad, latter slightly bent dorsad and bearing a cluster of long setae on tip; distal part of prefemur with a well-defined circular protuberance covered with setae; posterior margin of male first genital sternite deeply emarginated, nearly as deep as half of the sternite's length.

Etymology
Cavernicolus means "living in caves or caverns", to emphasise that the species inhabits caves.

Description of the type locality
Eupolybothrus cavernicolus Komerički & Stoev sp. n. is so far known only from the caves Miljacka II and Miljacka IV (= Špilja kod mlina na Miljacki), situated near the village of Kistanje, Krka National Park, Knin District, Croatia (Fig. 12). The two caves are situated close to each other and are formed in Middle Eocene to Early Oligocene conglomerate and marbly limestone. Miljacka II is the longest cave in the Krka National Park, with a large, spacious entrance and a total length of over 2800 m (Fig. 13).

Transcriptome analysis and annotation
The raw data were first filtered by removing inadequate reads with: 1) adapter contamination; 2) >10 Ns; 3) >50 base pairs of low quality (quality value <65). The resulting 2 Gb of clean data were processed into subsequent assemblies using SOAPdenovo_trans (Xie et al. 2013) under default parameters. The abundance information was provided directly by SOAPdenovo_trans, and played no role in the subsequent analysis steps. A total of 67,785 scaffolds were produced with an average length of 812 bp and N50 of 1,448 bp (see GigaDB). Subsequent annotation was conducted by tracing homologs against currently available databases, including Nr, SwissProt and COG. Using this method, 22,866 scaffolds were functionally annotated (Fig. 20a, b, c). Annotated genes were then translated to peptide sequences via CDS prediction according to their blast results using GeneWise (Birney et al. 2004) (see GigaDB). Using orthoDB (http://cegg.unige.ch/orthodb6) (Waterhouse et al. 2012), 2,188 one-to-one orthologs were extracted from four selected arthropod genomes: Drosophila melanogaster Meigen, 1830, Daphnia pulex (Linnaeus, 1758), Ixodes scapularis Say, 1821 and Strigamia maritima (Leach, 1817). HaMstR (Ebersberger et al. 2009) was applied to search for corresponding orthologous genes in our transcriptome data, delivering 1,668 predicted orthologs of both nucleotide and protein sequences (see GigaDB).
Figure 19. Delineation of Eupolybothrus species: Neighbor-Joining tree based on K2P distances. Visualised are the clusters obtained from the reversed Statistical Parsimony (SP) method and the Automatic Barcoding Gap Discovery (ABGD) procedure. Bootstrap support values for the identified lineages are given above. The intraspecific genetic variability is given for each cluster. Source data are available in Suppl. material 1.
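The assembly statistics quoted for the transcriptome (number of scaffolds, average length, N50) can be illustrated with a short Python sketch. This is not part of the SOAPdenovo_trans pipeline; the function name `n50` is hypothetical, and the common definition is assumed: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Hypothetical helper for illustration: scaffolds are visited from
    longest to shortest, and the length at which the running total first
    reaches half of the assembly size is returned.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one scaffold length is required")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


def mean_length(lengths):
    """Average scaffold length in bp."""
    lengths = list(lengths)
    return sum(lengths) / len(lengths)
```

Because a few long scaffolds contribute most of the assembled bases, N50 is typically larger than the mean length, as with the values of 1,448 bp and 812 bp reported here.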

Taxonomic affinities
According to the division of the subgenera of Eupolybothrus of Jeekel (1967), E. cavernicolus Komerički & Stoev sp. n. falls into subgenus Schizopolybothrus Verhoeff, 1934, characterized by the presence of triangular projections on tergites 9, 11, 13, a VCm spine on leg 15, one or more VCa spines and a single claw on the pretarsus of leg 15. The same author further distinguishes three species groups in the subgenus based on the morphology of male gonopods and presence/absence of modifications on leg 15:
• Group I, characterized by short male gonopods and presence of a large knob on male prefemur 15, currently including E. caesar (Verhoeff, 1899), from Bosnia-Herzegovina, Albania, mainland Greece (incl. Ionian Is.) and Macedonia (FYROM); E. spiniger (Latzel, 1888), from Bosnia-Herzegovina; E. acherontis (Verhoeff, 1900), from Bosnia-Herzegovina (E. a. acherontis) and Macedonia (FYROM) (E. a. wardaranus (Verhoeff, 1937)); E. stygis (Folkmanova, 1940), from Bosnia-Herzegovina; and E. leostygis, from Croatia and Bosnia-Herzegovina (see Kos 1992, Stoev 1997, Zapparoli 2002). Here also belongs a new cave-dwelling species from Velebit, Croatia, recently discovered by AK and the CBSS team, whose description is currently in progress. While E. caesar and E. leostygis have recently been validated and re-described (see Eason 1983, Zapparoli 1984, Zapparoli 1994), the status of the other four taxa remains uncertain (see e.g., Stoev 2000, Stoev 2001, Stoev et al. 2010).
• Group II, lacking any specific modifications on male legs while gonopods are also short, encompassing E. tabularum (Verhoeff, 1937) from the Western Alps and E. excellens (Silvestri, 1894) from the Ligurian Apennines.
• Group III, characterized by the long gonopods and dorsal furrow on male prefemur 15, with E. zeus (Verhoeff, 1901)
Jeekel's division of the genus (Jeekel 1967) is quite artificial and does not reflect real evolutionary relationships as it is merely based on a few morphological traits. Some species were certainly misplaced in these groupings, as for example E. excellens, whose males show noticeable modifications on leg 15 (see Fig. 17b). Two other species, E. zeus and E. sissii, were even excluded from Schizopolybothrus (cf. Zapparoli 1994, Zapparoli 2002). Showing a prominent prefemoral knob on male leg 15 and having relatively short gonopods, E. cavernicolus Komerički & Stoev sp. n. unquestionably belongs in Group I, as defined by Jeekel (1967).
Figure 20. Gene annotation. Original data available from GigaScience GigaDB.
a: E-value, identity and species distribution statistics of the sequences that can find homologs on Nr database
b: COG functional classification of the transcripts
c: GO categories of the transcripts
The new species can be readily distinguished from other members of Eupolybothrus (Schizopolybothrus) by the presence of a large proximal knob surmounted by a characteristic cluster of setae, and distal setose protuberance of male prefemur 15. In addition, the species presents a different arrangement of spiniform setae on the intermediate tergite.

Micro-computed tomography and 'cybertype' notion
The new generation imaging technologies, such as magnetic resonance imaging (MRI) and micro-computed tomography (micro-CT), are opening new horizons in biology (Mietchen et al. 2008, Ziegler et al. 2008). Micro-CT is becoming widely used in comparative, developmental and functional biology (see e.g., Metscher 2009a, Metscher 2009b, Wojcieszek et al. 2012), paleontology (Błażejowski et al. 2011, Edgecombe et al. 2012), molecular biology (Metscher and Müller 2011) and taxonomy (Faulwetter et al. 2013, Michalik et al. 2013). By employing micro-CT scans in taxonomy, important morphological and anatomical characters can be examined in their natural position without damage to the original specimen. This allows researchers to re-assess character shape and functionality or even discover new diagnostic characters (Ziegler et al. 2010, Zimmermann et al. 2011, Faulwetter et al. 2013). To make type material continuously and simultaneously available to taxonomists and to improve access to high-quality morphological data, Faulwetter et al. (2013) suggested the creation of high-resolution virtual morphological and anatomical data libraries allowing reconstruction and interactive manipulation of type specimens, the so-called 'cybertypes'.
The 'cybertype' notion is herewith tested for the first time with the newly described taxon (Fig. 21), for which a rich image library has been created to allow its subsequent recognition, virtual manipulation and reuse. This image library, from which the 3D model is created, has been deposited in the GigaScience database, GigaDB, as a zip and a gzipped tar archive containing BMP images. The 3D model was converted into an AVI file, using the flight recorder of CTVox, and disseminated, along with the video of the living specimen (Fig. 22), through YouTube. According to Faulwetter et al. (2013), a 'cybertype' should be linked to the original type material and be retrievable and freely accessible. We comply with these requirements by a) including a set of Darwin Core files along with the deposited volumetric data which describe the attributes and deposition of the physical type material and b) using a CCZero license and rich metadata to make the 'cybertype' retrievable and reusable. Furthermore, through the same set of Darwin Core files, the morphological data are also linked to the transcriptomic data at GigaDB, effectively extending the 'cybertype' concept and providing direct links to other data describing type material of the same species.

Data management and release
Whereas a lack of reference genomes in non-model organisms has hampered genetic and phylogenomic studies, transcriptomes may present a time- and cost-effective substitute to whole genome sequencing for these types of studies and an efficient way to produce massive amounts of gene sequence data. While transcriptomic studies of centipede species, e.g. Alipes grandidieri Lucas, 1864 (Chilopoda; Scolopendromorpha), exist in the literature (Riesgo et al. 2012), centipede genome data in public and accessible repositories are still scarce and difficult to find. To address this deficiency, and to produce a model of an accessible resource for the community, all of the transcriptomic data have been made available under the highest metadata standards, both in relevant community specific databases (raw data in the SRA [SRA project accession: ERP003841] and transcriptomic data in ArrayExpress [accession E-MTAB-1859]), as well as in GigaDB.
GigaDB collects together all of the genomic and morphological data, and utilises the large computing infrastructure of the BGI and Aspera data transfer capabilities, which allow it to host and deliver much larger and more heterogeneous datasets than other repositories (Sneddon et al. 2012). Datasets are also issued with DOIs, which are discoverable through the DataCite metadata search engine and Thomson Reuters Data Citation Index, and can be integrated into a publication or independently cited.
In addition to making data publicly available, it is crucial to provide rich metadata to enable data interoperability and reuse. As there is only one transcriptome available, it is not possible to include additional 'factor' information. However, by including sequence reads, experimental design, protocols and processed data we were able to produce the maximum amount of (4*) MINSEQE compliant metadata (Brazma 2009). To maximise its interoperability, the metadata are also available from GigaDB in ISA-TAB format (Sansone et al. 2012).
For volumetric data created by techniques such as micro-CT and micro-MRI, no community standards exist yet. The DICOM standard (Digital Imaging and Communications in Medicine, http://dicom.nema.org/) used by the medical community is not tailored for taxonomic purposes, thus its usefulness for this research field still has to be investigated (Faulwetter et al. 2013). However, even in the absence of widely accepted standards, we provide rich metadata for the micro-CT data, based on the metadata descriptors at Morphosource (http://morphosource.org). The same set of descriptors has been used by GigaDB, where we also applied the ISA-TAB format in order to ensure reusability and interoperability of the data (Sansone et al. 2012), describing all parameters and settings used to create the data. The micro-CT data package deposited at GigaDB thus contains:
• Micro-CT image stack, available in two formats:
  • Several ZIP files, each containing 500 BMP images, the scanning log documentation file and the Darwin Core type specimen data.
  • A single gzipped TAR archive of all 3876 BMP images, the scanning log documentation file and the Darwin Core type specimen data.
• Documentation of the scanning and reconstruction process through ISA-TAB metadata provided by GigaDB and the inclusion of the scanning log file with the 'cybertype'.
• Specimen data in Darwin Core format, linked to the location of the physical material and to the transcriptomic data through Darwin Core comma-separated value (CSV) files:
  • Eupolybothrus_cavernicolus_sequenced_vaucher_paratype.csv
  • Eupolybothrus_cavernicolus_micro-CT_vaucher_paratype.csv
  • Eupolybothrus_cavernicolus_all_types.csv
• ISA-TAB metadata that ensure retrievability and interoperability.
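The role of the Darwin Core CSV files as a link between the physical type material, the micro-CT data and the transcriptomic data can be illustrated with a short sketch. It assumes, purely for illustration, that the files share an `occurrenceID` column and use the Darwin Core terms `typeStatus`, `scientificName`, `associatedMedia` and `associatedSequences`; the actual column layout of the deposited files is not described here, and all row values below (`specimen-1` etc.) are invented placeholders rather than real specimen records.

```python
import csv
import io

# Hypothetical rows; the deposited CSV files may use other columns and values.
ALL_TYPES_CSV = """occurrenceID,typeStatus,scientificName
specimen-1,holotype,Eupolybothrus cavernicolus
specimen-2,paratype,Eupolybothrus cavernicolus
specimen-3,paratype,Eupolybothrus cavernicolus
"""

MICRO_CT_CSV = """occurrenceID,associatedMedia
specimen-2,micro-CT image stack
"""

SEQUENCED_CSV = """occurrenceID,associatedSequences
specimen-3,transcriptome reads
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def link_records(all_types, *linked_tables):
    """Merge rows from the linked tables into the type list by occurrenceID."""
    merged = {row["occurrenceID"]: dict(row) for row in all_types}
    for table in linked_tables:
        for row in table:
            record = merged.get(row["occurrenceID"])
            if record is not None:  # ignore rows without a matching specimen
                record.update(row)
    return merged


linked = link_records(
    read_rows(ALL_TYPES_CSV),
    read_rows(MICRO_CT_CSV),
    read_rows(SEQUENCED_CSV),
)
for occurrence_id, record in linked.items():
    print(occurrence_id, record)
```

Keying every table on one shared specimen identifier is what lets a single record point at both the volumetric and the molecular data; any stable identifier present in all files would serve the same purpose.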
In combination with the Darwin Core files describing the specimen data, we thus fully annotate and document the 'cybertype' of Eupolybothrus cavernicolus Komerički & Stoev sp. n. The generation of large molecular and morphological data pools that originate from type specimens increases the applicability of taxonomic data in other scientific disciplines such as comparative morphology, evolutionary biology, medicine and ecology. The new holistic approach raises important questions and points to new directions for the development of biodiversity data management, notably the lack of mechanisms for cross-linking molecular and morphological data, the lack of global metadata standards for micro-CT and transcriptomic data, and the absence of reliable data repositories for micro-CT image libraries.
Also, as a pilot project, we annotate all currently valid Eupolybothrus (Schizopolybothrus) species with their original descriptions, which were extracted from the original publications by applying optical character recognition (OCR) and additionally tagged using the GoldenGATE software (Sautter et al. 2007). All species treatments are deposited at Plazi. This represents part of a more ambitious project aiming at digitization of all species descriptions and important taxonomic treatments of Eupolybothrus species that is currently being carried out in the framework of the pro-iBiosphere project (http://www.proibiosphere.eu).
To create reliable links between the published sequence IDs and BOLD, an online dataset DS-EUPCAV was generated in the BOLD system, through which the respective Barcode Index Numbers (BINs) of the specimens barcoded for this study may be tracked (for the BIN concept see Ratnasingham and Hebert 2013). All COI sequences were registered in GenBank, following a newly launched metadata standard in the GenBank taxonomy database that flags sequences of type specimens.

Conclusions
This study demonstrates a holistic approach to the description of a new taxon, extending the conventional written description and two-dimensional illustrations with an array of different information types. While this novel approach contributes to the different aspects of the species' identity, its main aim is to provide an integrated approach to handling and publishing large data sets associated with a taxon. The generation of large molecular and morphological data pools that originate from type specimens increases the applicability of taxonomic data in other scientific disciplines such as comparative morphology, evolutionary biology, medicine, ecology, and others.
The concept of a "cybertype" is discussed in the study, but at the same time new questions arise, pertaining to the definition of such a "cybertype" and they will have to be addressed by the taxonomy community. Several different kinds of data belonging to the "cybertype" concept are treated in this study, from free text to sequence data, and from images to volumetric data. Questions have to be addressed such as whether a cybertype should only be restricted to morphological data, what data can be used to constitute a cybertype and whether a cybertype can be composite (i.e. consisting of several data types) or even distributed (different parts of the data residing on different physical servers). Further problems to be addressed are the lack of appropriate mechanisms for cross-linking molecular and morphological data, as well as the absence of global metadata standards and reliable data repositories for micro-CT image libraries. The metadata descriptors for micro-CT files used by the Morphosource and GigaDB repositories are a good starting point for that, as is the use of ISA-TAB to integrate everything together. Whatever the answers to these questions, there is one mandatory requirement for data that we can already identify: discoverability and accessibility.
With complex taxon descriptions such as the present one, we are entering new dimensions of data volumes that have to be managed properly to realise their true value. The deposition of large data pools in appropriate repositories is not yet straightforward, and such initiatives have started to emerge only recently. It is our task to ensure from the beginning that they do not develop into isolated data worlds but that they support community standards, describing the datasets in a way that they can be retrieved and cross-linked. Currently, even in modern taxon descriptions, different pieces of data are only linked through a central locus: the published article. In a future, data-centric world of taxonomy, articles published through next generation journal workflows will become an even more important node in a linked network of data elements describing the taxon. These data elements have to be defined and made accessible through persistent identifiers, not unlike the traditional practice for physical specimens that are accessible through their museum accession number. In combination with rich metadata standards, taxonomy will thus open itself up to the semantic Web with its possibilities for intelligent, complex queries.
In this study, we have taken a first step in that direction. All data have been deposited in publicly accessible repositories, such as GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses used ensure their accessibility and re-usability. GigaDB in this example provides direct links between the genomic and micro-CT data, through a Darwin Core CSV dataset describing the type specimens, as well as capturing all of the metadata in the interoperable ISA-TAB format. Molecular data and images are annotated with rich metadata to ensure discoverability and reuse. Techniques such as micro-CT are, however, still in their infancy, and no standardised metadata schemas exist yet, a gap that urgently needs to be addressed by the community if we are to avoid a proliferation of isolated datasets.
Taxonomy is at a turning point in its history. New technologies allow for creation of new types of information at high speed and in gigantic volumes, but without clear rules for communication standards, we will not be able to exploit their full potential. We need to focus our efforts on linking these bits and pieces together, by documenting them, by standardising them and by making them retrievable. If such an infrastructure is in place, unforeseen analytical powers can be unleashed upon these data, creating a revolution in our abilities to understand and model the biosphere.