Species diversity and DNA barcode library of freshwater Molluscs of South Caucasus

Abstract This study provides the first attempt to investigate the molecular diversity of South Caucasian freshwater molluscs (Mollusca, Gastropoda) and lay down the first bricks to build up a DNA-barcode library. In total, 289 COI barcode sequences were obtained from 33 morpho-species belonging to 24 molluscan genera and 10 families that represent nearly 30% of known freshwater molluscan diversity of the South Caucasus region. DNA barcodes were analysed by means of the Barcode Index Number (BIN) and the other tools available in BOLD Systems. Results showed that the knowledge of freshwater molluscs diversity in the South Caucasus is far from comprehensive. For the studied 33 morpho-species, 289 barcodes were clustered into 40 BINs, from which unique BINs were defined for 12 species and five species were characterised with more than a single BIN. From the studied taxa, 60% were characterised larger than 2.2% sequence divergence indicating high genetic variation or cryptic diversity. Within our limited taxonomic coverage, we found one new species for the Republic of Georgia (Galbaschirazensis) and at least three undescribed species belonging to the genera Stagnicola, Segmentina and Anisus. Uniqueness and high molecular diversity of the studied species emphasise the need for further intensive morphological and molecular investigations of the South Caucasian freshwater molluscan fauna.


Introduction
Under increasing anthropogenic pressure, the conservation of freshwater biodiversity and maintaining freshwater ecosystem functioning remain two of the most critical challenges for the 21st century's world , Hoffmann et al. 2010. A sufficient knowledge of the species diversity and distribution of freshwater taxa is crucial for understanding the needs and implementation of conservation measures to save species and maintain ecosystem integrity (Collier et al. 2016). Freshwater molluscs constitute a diverse and functionally important component of freshwater communities (Runck 2007, Strong et al. 2007) inhabiting a wide range of freshwater habitats (Dillon 2000, Strong et al. 2007) and, at the same time, are the most vulnerable taxa amongst freshwater inhabitants (Cuttelod et al. 2011). Accurate biodiversity information on freshwater molluscs is often missing, especially in species-rich and economically poorly devolved parts of the world, hindering effective management and conservation activities. A good example is the Caucasus biodiversity hot-spot where, in spite of the recent advancements (e.g. Vinarski et al. 2014, Grego et al. 2020, Chertoprud et al. 2020, Chertoprud et al. 2021, Neiber et al. 2021, the knowledge about the diversity and distribution of freshwater molluscs is still far from being comprehensive . Most probably this is due to the absence of local taxonomic expertise during the last 50 years. Recent developments of DNA barcoding technology helped significantly to revive and advance the biodiversity inventory and monitoring at an unprecedented rate (Waugh 2007, Trivedi et al. 2016. DNA barcoding proved to be an effective tool in helping taxonomists to distinguish taxa and even confidently solve the taxonomic problems, especially when traditional (morphology -based) methods alone are failing (Hebert et al. 2003, Hajibabaei et al. 2006, Goldstein and DeSalle 2011, Sheth and Thaker 2017. Perhaps more importantly, DNA barcoding triggers even non-taxonomists and the young generation to put effort into biodiversity investigation (Packer et al. 2009, Ellis et al. 2010, Ebach 2011. For instance, in Georgia, a number of research projects have been conducted very recently investigating the freshwater biodiversity, including or exclusively being based on DNA barcoding approaches conducted by experienced and amateur scientists (Grego et al. 2020, Epitashvili et al. 2020). In addition, DNA barcoding (and in particular environmental DNA or eDNA meta-barcoding) is a promising tool in fast, non-invasive and cost-effective means for biodiversity inventory/monitoring (Thomsen et al. 2012, Carew et al. 2013, Thomsen and Willerslev 2015. However, in order to make DNA barcoding approaches useful tools, it is essential to build barcode reference libraries against which newly-obtained barcodes can be compared , Weigand et al. 2019. A barcode reference library is basically a data infrastructure that requires a routine input from both taxonomic and molecular experts. Currently, the largest reference library is available in BOLD systems (http://www.boldsystems.org) which is, on the other hand, less effective when dealing with taxa from poorly-investigated areas (Weigand et al. 2019). For instance, for the Caucasus region, barcode information is lacking for a great deal of taxa, including freshwater molluscs. In the present publication, we provide a first stage of an ongoing project that aims to build a DNA barcode reference library for South Caucasian freshwater molluscs within the framework of the Caucasus Barcode of Life initiative (https://ggbc.eu). In particular, the aim of the given study was to: (1) generate COI barcode sequences for a part of the freshwater molluscan taxa known for the region, (2) investigate within vs. between species sequence variation, (3) identify gaps in species-level taxonomic knowledge of freshwater molluscs and (4) develop subsequent research agenda.

Sample Collection
Sample collection campaigns were carried out from 2015 to 2021 across the various regions of Georgia (and also, to a lesser extent, in Armenia and Azerbaijan during 2019) (Fig. 1). The territory of Georgia is very rich with natural lotic and lentic water bodies and is, thus, hard to sample exhaustively. To provide a representative sampling scheme, we planned field collection trips for every municipal region of Georgia and, during each collection trip, we sampled as many independent water bodies as possible. For each geographic locality, we tried to do exhaustive sampling by checking all kinds of available habitats including river banks, springs (including subterranean), channels, lake littorals, mires and temporal water bodies, as well as subterranean habitats (caves and springs). Specimens were collected by sieving substrates from different types of microhabitats and also directly from the surfaces of water plants and fallen leaves, stones and sink logs. In addition and whenever possible, bottoms of lotic/lentic habitats were inspected with glass bottom viewing boxes for mussels of the family Unionidae. In case of Armenia and Azerbaijan, only a single (though long distance) per-country field collecting trip was arranged with the same field collecting principles. Samples were immediately preserved in 96% ethanol after collection. Sorting and taxonomic identification of individuals was conducted using the keys of Jackiewicz (1998), Glöer (2002), Soldatenko and Starobogatov (2004), Welter-Schultes (2012), Piechocki and Wawrzyniak-Wydrowska (2016), Glöer (2019) and .
One to ten specimens per morphologically defined species were selected for barcoding. In cases of genera -Radix and Ancylus for which the systematics of Caucasian taxa is not yet well understood, we took a larger number of specimens for each morpho-species. All selected specimens were first photographed according to BOLD standards (Milton et al. 2013) and, in the case of larger specimens, only a part of tissue was separated for DNA extraction, while, for small-bodied species (such as, for instance, Ancylus and most of Sphaeriidae), the soft body of the complete individuals was submitted for DNA extraction.
Here we must note that the family Hydrobiidae is the single exception that was not studied within the framework of the given article. The reason is that these prosobranch molluscs, which were known with only a few species until very recently (i.e. 5 years ago), proved to be highly diverse in the Caucasus region (Grego et al. 2020, Chertoprud et al. 2020, Chertoprud et al. 2021, and are currently under intensive taxonomic investigation. Due to a large, yet undescribed species diversity, we omit them from the current article.
Collected materials/vouchers are deposited in the collection of the Institute of Zoology of Ilia State University, Tbilisi under the respective CaBOL identification numbers given in Suppl. material 1.

DNA processing
Genomic DNA was extracted from tissue samples using the Quick-DNA™ Miniprep Plus Kit (Zymo Research) (for 25 mg tissue), Quick-DNA™ Miniprep Plus Kit (Zymo Research) (for 5 mg tissue) DNeasy Blood & Tissue Kits (Qiagen, Germany) according to the manufacturer's instructions and the protocol proposed by Sokolov (2000) with slight modifications (Sauer and Hausdorf 2009). Partial sequences of cytochrome oxidase c subunits I (COI) were amplified by polymerase chain reaction (PCR) using the primer pair LCO1490-JJ and HCO2198-JJ (Astrin and Stüben 2008). Thermal conditions included denaturation at 95°C for 1 min, followed by the first cycle set (15 cycles): 94°C for 30 sec, annealing at 55°C for 1 min (−1°C per cycle) and extension at 72°C for 1:30 min. Second cycles set (25 cycles): 94°C for 35 sec, 45°C for 1 min, 72°C for 1:30 min, followed by 1 cycle at 72°C for 3 min and final extension step at 72°C for 5 min. In addition, shorter COI sequences were amplified using the Folmer et al. (1994) forward (LCO1490) and Kuhn's reverse (LCO1491) primers (cited in Cordellier and Pfenninger 2008). PCR cycling Map of collection localities for freshwater molluscs in the present study. The red dots correspond to the localities from where one or more specimens/species were submitted to barcoding, while the yellow dots correspond to localities from where the specimens are still waiting for genetic investigation.
conditions were adopted from Wethington and Lydeard (2007) and were comprised of an initial denaturation step: 94°C for 3 min, followed by 30 cycles at 94°C for 40 sec, annealing temperature at 48°C for 1 min, 72°C for 1 min and final extension step at 72° for 10 min. Resultant amplicons were visualised on 1% agarose gels using 3 μl of PCR product. The remaining PCR products were then completed using Big Dye Terminator v.3.1 (Applied Biosystems, Foster City, CA, USA) and run on an automated sequencer. Some of the PCR products were sequenced at Macrogen Europe Laboratory (Amsterdam, The Netherlands). Both DNA strands of the PCR product were sequenced.

Data analyses
Sequences were edited in Geneious Pro v.7 (Drummond et al. 2011) to ensure the absence of indels and stop codons. Quality sequences (i.e. less than 1% base-pair ambiguity) were submitted to BOLD Systems (http://www.boldsystems.org) under the project acronym "GEOFM" including the specimen images, trace files and the rest of the metadata (Suppl. material 1). In addition, we ran a BOLD search for molluscan barcodes originating from the South Caucasus region and which were added to the "GEOFM" project under a dataset named "DS-FMOL" for part of the analyses.
Barcode Index Numbers (BIN) (Ratnasingham and Hebert 2013) were then automatically assigned to each newly-derived sequence by BOLD Systems v.4. That is a two-stage analysis where, at the first stage, an initial assignment of sequence to an Operational Taxonomic Unit (OTU) takes place, based on Refined Single Linkage Clustering (RESL) with a threshold of 2.2% sequence differences. In the second stage, graphical analyses (Markov clustering) are applied to OTUs. Which, in case of the existence of a clearly defined internal structure within OTU, can result in its split into two or more OTUs in spite of smaller (i.e. less than 2.2%) sequence divergence between OTUs (Ratnasingham and Hebert 2013). RESL algorithm and ABGD (Automatic Barcode Gap Discovery -Puillandre et al. 2011) were further employed to generate OTUs and cluster histograms via BOLD Systems.

Results
In total, 289 COI barcode sequences were obtained and uploaded in the "GEOFM" BOLD project, representing 33 species from 24 molluscan genera from 10 families. Prior to the present study, there were 47 freshwater molluscs COI barcode sequences available in the BOLD Systems (from the study area) including 11 sequences from an unpublished project within the "DNAqua-Net" COST Action The average fragment length of COI barcodes in the "DS-FMOL" dataset (combining "GEOFM" project plus pre-existing barcodes) was 534 bp (min: 409 bp and max: 658 bp). Nucleotide base frequencies were: A-25.4%, G-18.4, C-14.4%, T-41.8%) -similar to reported frequencies for molluscs (e.g. Weigand et al. 2011), while GC content equal to 32.8% was lowest compared to results from other molluscan studies (35.8% and 36.9% from Kumar et al. (2015) and Layton et al. (2014), respectively).
The families Planorbidae and Lymnaeidae are represented by the highest number of barcodes (116 and 99, respectively). The two families Unionidae and Neritidae are represented each with 19 and 12 barcodes, respectively. The two families Cyrenidae and Sphaeriidae are represented by an equal number of barcodes (each with 11 barcodes). The two families Physidae and Viviparidae are represented each with 10 and 5 barcodes, respectively and the family Melanopsidae and Acroloxidae by the lowest number of barcodes (three barcodes each). The most common genus was Ancylus, for which 93 barcodes (two species) were generated, followed by Radix and Unio (73 and 16 barcodes, respectively and three species for each of them). The 18 genera were represented by a single species, two genera with two species and a single genus by the four species. Of all species obtained, two species Ancylus sp. 2 and Radix auricularia were represented the highest number of barcodes (each with 89 and 52 , respectively), followed by Radix euphratica, Lymnaea stagnalis, Theodoxus fluviatilis, Corbicula fluminalis, Unio crassus and Physella acuta (each with 21, 14, 12, 11, 13 and 10 barcodes, respectively). Most of the species are represented with less than 10 barcodes, including six species, with a single barcode (Fig. 2).
The BIN and RESL analyses resulted in 41 BINs united into 40 OTUs. In addition, 13 OTUs were also formed for 23 sequences (all belonging to Hydrobiidae and mined from GenBank) for which no BINs had been defined due to the small barcode size (less than 500 bp (Ratnasingham and Hebert 2013)). From the 41 BINs, 32 (78%) were concordant and nine (22%) were represented with singletons. Sequences (107) of 18 BINs (42%) are only known from the study area at the time of publishing.
In most cases, morphologically determined specimens (comprising 28 species) were matched with a single OTU/BIN cluster with intraspecific (or within BIN) sequence divergence of less than 2.2%. More than one BIN were found in five species-level taxa -Planorbis planorbis (2 BINs), Physella acuta (2 BINs), Lymnaea stagnalis (2 BINs), Radix auricularia (2 BINs) and Radix euphratica (4 BINs) (  Ranking of species according to the number of barcodes.

Discussion
South Caucasian freshwater molluscs (and all invertebrates in general) are still poorly known ). The check-list of freshwater molluscs species for the South Caucasus or any separate country within it is more than 50 years old and completely outdated (Zhadin 1952, Javelidze 1973, Akramowski 1976 Grego et al. 2020). Nearly all morphologically identified species were further validated with barcode data, while several species turned out to be mismatches with the BOLD taxonomy. This latter category includes pond-snail species of the family Lymnaeidae, ramshorn snails (family Planorbidae) and freshwater clams (family Sphaeriidae). While the aim of this article is not to deal with the systematics and taxonomy of species, in the following, we will revise each of the studied taxa and outline gaps in the knowledge deemed for further in-depth study.
Pond snails of the family Lymnaeidae are distributed worldwide (Correa et al. 2011). They are of major medical and veterinary importance since they act as vectors of parasites (Bargues et al. 2006, Medeiros et al. 2014). The morphological and anatomical plasticity amongst and within lymnaeid representatives remains challenging (Bargues and Mas-Coma 1997, Jackiewicz 1998, Pfenninger et al. 2006, Aksenova et al. 2018; however, recent large scale multi-marker molecular genetics and morpho-anatomical investigations refined species-level taxonomy at least for a part of taxa within this family (Aksenova et al. 2018 Subfamily Lymnaeinae includes three representative genera in South Caucasus each with a single species. Galba is characterised by high phenotypic plasticity and extremely uniform anatomical traits, which are often the reasons for species misidentification (Samadi et al. 2000, Standley et al. 2013. Three of our specimens of Galba truncatula formed BIN BOLD:ABA2623 which represents the cluster of G. truncatula specimens from all over its distribution area. Distance to its NN BIN (BOLD:AAI7214) is 4.03% and is also named as G. truncatula. The single specimen (Samegrelo region, western Georgia) in our dataset (also morphologically identified as G. truncatula) clustered under BIN BOLD:AAY4012 comprising specimens of Galba schirazensis. The NN (with 7.84% divergence) BIN is BOLD:ADR2784 includes the specimens of Galba truncatula from Japan. A cryptic species -G. schirazensis was discovered relatively recently in different geographical regions throughout Europe, America and the Middle East, including Iran (Bargues et al. 2011). According to Kruglov (2005), G. schirazensis is already known from Azerbaijan -from a Caspian Sea Basin. For Georgia, it is a new country record. The specimens of G. schirazensis were collected from the western part of Georgia (Black Sea Basin), Orulu Village in Zugdidi District, Samegrelo -Zemo Svaneti Region (42.398926N, 41.739213E).
In this location, specimens were found amongst vegetation in a permanent stream. The water was shallow and slow running (Fig. 3).
From the genus Lymnaea a single species -L. stagnalis is known. Our specimens of L. stagnalis formed two BINs. Eight specimens matched with BIN BOLD:AEN6037, for which only a single barcode was available from Ukraine. The NN BIN is BOLD:ACQ0092 with 2.43% divergence, includes specimens also belonging to L. stagnalis. The remaining six specimens formed the unique BIN (BOLD:AEM9638) with the NN BIN -BOLD:ACQ2679 (with 2.12% divergence) comprising specimens of L. stagnalis. Thus, in South Caucasus, at least two haplotypes of L. stagnalis occur, both in a mountainous Javakheti region (southern Georgia). The last genera in this subfamily is Stagnicola which is also represented with a single species (S. palustris) in South Caucasus. Only two specimens of Stagnicola were represented in our dataset forming the unique BIN -BOLD:AEN6388 which were diverged by 4.83% from NN BIN BOLD:ACV7473, representing the specimens of S. turricula from Poland. Most probably the genus Stagnicola in Georgia (and in South Caucasus) is not a S. palustris or the genus is represented with more than one species in the region. Thus, additional sampling and taxonomic investigation are required.
The Ramshorn snails of the family Planorbidae is the most diverse group of freshwater pulmonates inhabiting a wide range of freshwater habitats (Jørgensen et al. 2004). Understanding of relationships within the Planorbidae remains confused due to the extreme variability of anatomical and shell morphological traits (Baker 1945, Hubendick 1978 The taxonomy of the genus of Ancylus is far from being resolved. For the Caucasus region, six species are indicated (Akramowski 1976, Soldatenko andStarobogatov 2004). For the present study, 104 specimens collected throughout Georgia and Armenia (that were initially identified as four morpho-species of A. benoitianus, A. capuloides, A. major and Ancylus sp. according to Soldatenko and Starobogatov 2004) were classified into two BINs. In particular, 12 specimens (Ancylus sp. 1) were defined under BIN BOLD:AEN7656 with 4.58% divergence to NN BOLD:AAD2028 and 92 specimens (Ancylus sp. 2) were defined under BIN BOLD:AAD2028 with 3.3% divergence to NN BOLD:ACZ3241. It is worth noting that neither of the above-mentioned BINs are properly named. The Caucasian Ancylus is characterised with a large number of lineages similar to those revealed in the Balkans (Albrecht et al. 2006) or in Germany (Weiss et al. 2018), thus reflecting the taxonomy of Soldatenko and Starobogatov (2004). However, overall genetic (and morpho-anatomical) differentiation might not be enough to delimit the species. Nonetheless, it is evident that Caucasian Ancylus is represented with a rather unique complex of lineages deserving further in-depth integrative taxonomic investigation.  (Piechocki 1989). However, according to Korniushin (2001) and Petkevičiūtė et al. (2018), there are several stable morphological and anatomical characteristics and, even more importantly, substantial genetic evidence that these two species are sister taxa. Due to observed genetic differences of our specimens to the corneum/nucleus group, it is worthwhile to investigate the South Caucasian Sphaerium representatives in more detail including multilocus phylogeny and morphology to solve its taxonomic affinities.
Another genus of clams with a complicated genetic structure is Euglesa. Specimens submitted to a barcoding pipeline were morphologically identified as either E. casertana (five specimens) or E. subtruncata (two specimens). The only specimen of putative E. subtruncata was validated under BIN: BOLD:ACQ3092, while the rest of the specimens formed unique genetic clusters with no clear systematic position. As an example, BIN BOLD:ACQ7011 contains specimens from Greece, Albania, Germany and one specimen from Georgia with a maximum intra-specific divergence of 1.71%. The closest NN BIN BOLD:AAG0350 (an unnamed clade) diverged with 1.92%. The remaining five specimens all turned out to belong to a yet unknown species under BINs BOLD:AEN6788 (5.13% divergence to NN) and BOLD:AEN0712 (3.8% divergence to NN BIN). Similar to Sphaerium, this genus is also difficult to classify, based on shell morphology alone due to limitations in taxonomically meaningful characters (Korniushin and Glaubrecht 2006, Voode 2017, Rassam et al. 2021. Accordingly, a more detailed study is necessary to solve species-level taxonomy and even to validate the taxonomic value of currently-used identification (morphological) characters for the species-level classification of Euglesa.
One more specious family in the study area is the bivalve family Unionidae that includes at least five valid species occurring in South Caucasus, including Unio crassus, U. tumidus, U. pictorum, Anodonta cygnea and A. anatina ( Graf 2007). In the present study, we sequenced representative specimens for all five species that perfectly matched with the conspecific barcodes from the BOLD system (Table 1). Similarly, specimens of other seven freshwater mollusc families, represented by a single species in the South Caucasus including Acroloxus lacustris (Acroloxidae), Physella acuta (Physidae), Bithynia tentaculata (Bithyniidae), Viviparus costae (Viviparidae), Melanopsis mingrelica (Melanopsidae), Theodoxus fluviatilis (Neritidae) and one bivalve species Corbicula fluminalis (Cyrenidae) also formed unambiguous barcode clusters matching the conspecific sequences originated outside the study area.

Conclusions
Our results clearly showed the insufficiency of the current knowledge of freshwater molluscs diversity in the South Caucasus region. In spite of the limited taxon coverage, nearly half of the studied taxa turned out to be in need of substantial taxonomic investigation/revision. In particular, nearly all genera with more than one known species are represented with regionally unique radiation and the species level taxonomy is inadequate. The South Caucasus region is considered a Plio-Pleistocene refugium and occurrence of unique or endemic lineages are not a surprise. However, a good understanding of its biodiversity is necessary to apply effective monitoring and conservation measures. In addition, the knowledge of the origin and phylogeography of most of the South Caucasian freshwater molluscs are generally missing (but see rare exception by Sands et al. 2019). Thus, obtained barcode data could pave the way to make further progress in this direction. A group of freshwater molluscs that were not investigated in the current project includes the representatives of the family Hydrobiidae -minute prosobranch snails. Only recently, this group turned out to be very species-rich in the South Caucasus (particularly in Georgia) (see, for instance, Grego et al. 2020). Although the systematics of this family in the South Caucasus is being studied by means of integrative approaches, still no quality barcodes are available for any of the species. Thus, diverse Hydrobiidae and some other freshwater mollusc families, for which only a sample of representatives have been studied until now, need to be further investigated in order to develop a useful barcode library. This particularly concerns the integrative taxonomic investigations to solve taxonomic ambiguities and clarify species-level diversity in the region.