Hidden in our pockets: building of a DNA barcode library unveils the first record of Myotis alcathoe for Portugal

Abstract

Background
The advent and boom of DNA barcoding technologies have provided a powerful tool for the fields of ecology and systematics. Here, we present the InBIO Barcoding Initiative Database: Portuguese Bats (Chiroptera) dataset, containing DNA sequences of 63 specimens representing the 25 bat species currently known for continental Portugal. To build it, we sequenced tissue samples obtained in a wide array of projects spanning the last two decades.

New information
We added four new Barcode Index Numbers (BINs) to the existing Chiroptera barcodes on BOLD: two belonging to Myotis escalerai, one to Plecotus auritus and one to Rhinolophus hipposideros. Surprisingly, one of the samples initially identified in the field as Myotis mystacinus turned out to be Myotis alcathoe, which represents the first record of this species for Portugal. The presence of Nyctalus noctula in Portugal was also genetically confirmed for the first time. This case study shows the power and value of DNA barcoding initiatives to unravel new data that may be hidden in biological collections.


Introduction
The barcoding of life is a booming initiative to catalogue worldwide biodiversity (Ratnasingham and Hebert 2007), in which tens of thousands of species have already been referenced through the sequencing of a fragment of the cytochrome c oxidase subunit I (COI) gene. These libraries have already been used in a wide array of studies, ranging from diet assessment (e.g. Mata et al. 2016) to the detection of rare species (e.g. Wilcox et al. 2013) and the characterisation of community composition in habitats or ecosystems (e.g. Baselga et al. 2013, Clare et al. 2007). Another potential use of DNA barcoding is the taxonomic validation of biological collections (Puillandre et al. 2012). Moreover, DNA barcoding initiatives have revealed a large number of new species and previously hidden cryptic diversity (e.g. Lara et al. 2010, Saitoh et al. 2015, Corley et al. 2017, Corley and Ferreira 2017). These examples demonstrate the power of this technique in unveiling hidden patterns of diversity. In Portugal, the InBIO Barcoding Initiative was recently launched under the scope of the EnvMetaGen ERA-Chair, aiming to contribute to building a DNA barcoding library for the country's terrestrial and freshwater biodiversity (Ferreira et al. 2018). Building on this initiative, we assembled a COI reference library for all 25 bat species known to occur in mainland Portugal (Rainho et al. 2013), belonging to four families (Vespertilionidae, Rhinolophidae, Miniopteridae and Molossidae). During this process, a bat species new to the country was discovered, Myotis alcathoe (Helversen et al. 2001), raising the number of bat species known for mainland Portugal to 26.

General description
Purpose: This dataset aims to provide a first contribution towards an authoritative DNA barcode reference library for Portuguese bats. Such a library should facilitate the DNA-based identification of species in both traditional molecular studies and DNA metabarcoding studies, and constitute a valuable resource for taxonomic and ecological studies.

Additional information:
We obtained the full barcode sequence (COI, 658 bp) for 63 specimens (Fig. 1, Table 1). Sequences are distributed across 26 Barcode Index Numbers (BINs), a system in which closely related sequences are clustered into operational taxonomic units (OTUs). Of these, four BINs are new: two of Myotis escalerai (ADS3148, ADT1511), one of Plecotus auritus (ADU1131) and one of Rhinolophus hipposideros (ADV3826). For five specimens, the field identification did not match the molecular data, most likely as a result of morphological misidentification: two Eptesicus isabellinus matched E. serotinus haplotypes, one Myotis emarginatus matched M. escalerai, one M. mystacinus matched M. alcathoe and one Pipistrellus pygmaeus matched P. pipistrellus. The analysed specimens of Myotis myotis and M. blythii shared the same COI haplotypes, probably due to a known widespread introgression of mtDNA (Afonso et al. 2017). Two distinct haplogroups of Myotis escalerai are also apparent, probably corresponding to the 'West' and 'North Central East' cytochrome b haplogroups described by Razgour et al. (2015). Phylogenetic trees confirmed the presence of the 23 bat species previously known from mainland Portugal, plus the M. myotis/blythii complex (Fig. 1), as well as the occurrence of the previously unrecorded M. alcathoe. This individual was collected in a protected area in north-western Portugal (Peneda-Gerês National Park; Figs 2, 3, 4). Our results also provide the first genetic confirmation of the presence of Nyctalus noctula in Portugal, whose occurrence in the country is known only from isolated and sporadic observations (this study; Barros 2012, Rainho et al. 2013, Bogdanowicz et al. 2012).
Thus, although we cannot fully rule out the possibility that the female individual we found is a hybrid, these authors found an overall low level of introgression, as well as an asymmetric introgression pattern mediated mainly by males, making it unlikely that our bat was a M. mystacinus. Further genetic analyses using nuclear markers would be needed to fully validate the identity of the species, but the congruence of morphological characters (lighter, brown fur) and ecological ones (foraging in a very cluttered riparian gallery) further supports the identity of our bat as M. alcathoe.
M. alcathoe is associated with dense riparian environments and is known to occur from northern Spain to Sweden and Turkey. Its known distribution is highly scattered and most likely full of knowledge gaps, mainly due to under-sampling of its habitats and to misidentifications during fieldwork (Bashta et al. 2011). The closest known record of the species lies more than 150 km from the Portuguese sampling location (Hermida et al. 2012), so our record is also the westernmost known for the species. Records in Spain are mainly associated with streams within mature oak-dominated woodlands (Agirre-Mendi et al. 2004). The species is classified as Data Deficient by the IUCN, with the destruction and degradation of riparian forest and woodland identified as the main threats, owing to the loss of roosts and foraging areas (Helversen et al. 2001). Notably, the species is classified as "Endangered" in Catalonia due to its rarity and the pressures on riparian forests (Coronado et al. 2017). Portuguese populations are likely to face similar threats. The species may be restricted to the northern forests of Portugal, but only dedicated surveys will make it possible to characterise its distribution in the country and evaluate its population status.
Our take-home message is that the screening of current and older collections, either museum or private, may hold surprises that will help complete existing species lists. With the ever-decreasing costs of barcoding techniques, many researchers are expected to be able to afford this approach, and barcoding will most likely become an essential tool for managing collections. Additionally, vouchering of specimens, especially from regions with large knowledge gaps such as tropical Africa and Southeast Asia, might help future studies on pathogen discovery, integrative taxonomy, climate change, environmental pollution and other topics that were not the initial focus of the sampling.
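The BIN system mentioned under Additional information clusters closely related COI sequences into OTUs. The following is a minimal sketch of that idea, assuming plain single-linkage clustering of uncorrected p-distances at a 2.2% threshold (the initial threshold reported for BOLD's RESL algorithm). It is a simplified stand-in, not BOLD's actual BIN assignment, and the haplotype names and toy sequences are hypothetical.

```python
from itertools import combinations

# Assumption: 2.2% is the initial single-linkage threshold reported for BOLD's
# RESL algorithm; RESL's later refinement steps are NOT reproduced here.
THRESHOLD = 0.022


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def single_linkage_clusters(seqs: dict[str, str], threshold: float = THRESHOLD) -> list[set[str]]:
    """Merge sequences linked by chains of pairwise distances <= threshold."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def mutate(seq: str, positions: range) -> str:
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# Hypothetical 100 bp "haplotypes", not data from this study.
base = "ACGT" * 25
toy = {
    "hap_A": base,
    "hap_B": mutate(base, range(0, 1)),    # 1% from hap_A
    "hap_C": mutate(base, range(10, 20)),  # 10% from hap_A
}
for cluster in single_linkage_clusters(toy):
    print(sorted(cluster))
```

In this toy case hap_A and hap_B fall into one cluster and hap_C forms its own. Single linkage is the simplest way to illustrate threshold-based OTU clustering, but it can chain distinct lineages together, which is one reason the real BIN algorithm adds refinement steps.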

Project description
Title: The name "The InBIO Barcoding Initiative Database: Portuguese Bats (Chiroptera)" refers to the data release of DNA barcodes and distribution data of bats within the InBIO Barcoding Initiative.
Design description: Chiropteran specimens were collected in the field, morphologically identified and DNA barcoded.
Sampling description: Bat samples were collected under the scope of several projects spanning from 2005 to 2018 (Rebelo and Jones 2010, Santos et al. 2014). All bats were captured during mist-netting sessions or using harp traps at roost exits. A non-lethal 3 mm wing punch was collected from several individuals and stored in 96% ethanol. Taxonomic identification of individuals during fieldwork followed widely used identification keys for European bats (Dietz and Helversen 2004, Dietz et al. 2009).
Up to five specimens of each species were sequenced in the laboratory. DNA was extracted from wing punches using the E.Z.N.A. Tissue Kit (Omega Bio-tek). Two partially overlapping fragments of the COI gene were amplified using the primer pairs FwhF1 x Ind_C_R (325 bp; Vamos et al. 2017, Shokralla et al. 2015) and BF2 x BR2 (423 bp; Elbrecht and Leese 2017), modified to contain Illumina adaptors. PCR products were subjected to a second amplification to attach indexing barcodes and P5/P7 adaptors, followed by bead clean-up, NanoDrop quantification and normalisation. The final pool was quantified by qPCR and sequenced on a MiSeq platform using a v2 2x250 kit (~5000 reads/fragment/sample). Bioinformatic analysis of raw reads was performed with ObiTools (Boyer et al. 2015) and, briefly, consisted of pairwise alignment of reads, removal of primer sequences, collapsing of similar reads into haplotypes and removal of rare variants (low read count). Geneious 10.2.3 (http://www.geneious.com, Kearse et al. 2012) was used for final sequence assembly, while double-checking for the occurrence of possible nuclear copies. Species identity was confirmed using the BOLD System Identification Platform (http://www.boldsystems.org). For each species, two representative sequences available in BOLD were retrieved and aligned with ours to build a phylogenetic tree. Haplotype alignments were analysed using the Maximum Likelihood (ML) method, and ML trees were built in RAxML (Stamatakis 2014) with 1,000 bootstrap replicates, searching for the best-scoring ML tree.
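Two read-processing steps described above, collapsing reads into haplotypes and removing rare variants, along with the later joining of the two overlapping COI fragments into a single barcode, can be illustrated with a short sketch. This is not ObiTools or Geneious code. The function names and thresholds (`min_count`, `min_fraction`, `min_overlap`) are hypothetical, only identical reads are collapsed, and the overlap must match exactly, whereas real assemblies tolerate sequencing errors.

```python
from collections import Counter


def collapse_reads(reads: list[str], min_count: int = 2, min_fraction: float = 0.05) -> dict[str, int]:
    """Collapse identical reads into haplotypes and drop rare variants.

    Only exact duplicates are merged here; the thresholds are hypothetical.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    kept = {hap: n for hap, n in counts.items() if n >= min_count and n / total >= min_fraction}
    # Most abundant haplotype first.
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments on their longest exact suffix/prefix overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not overlap sufficiently")


# Hypothetical toy reads, not data from this study.
reads = ["AAAACCCC"] * 90 + ["AAAACCCG"] * 8 + ["AAAACCGG"] * 2
print(collapse_reads(reads))  # the 2-read variant (2% of reads) is dropped
print(merge_overlapping("AAAACCCCGGGGTTTT", "GGGGTTTTACGTACGT", min_overlap=4))
```

Filtering on both an absolute count and a fraction of total reads is one common way to discard PCR and sequencing errors while keeping genuine haplotypes. The values actually used with ObiTools in this study are not stated here.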
Quality control: All DNA barcode sequences were compared against the BOLD database, and the top 99 hits were inspected to detect possible issues due to contamination or misidentification. Prior to GBIF submission, data were checked for errors and inconsistencies with OpenRefine 3.2 (http://openrefine.org).
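The inspection of top BOLD hits can be expressed as a few simple checks. This sketch assumes the hits have already been exported as (species, percent similarity) pairs. It does not query BOLD, and the `Hit` class, the `review_hits` function, the 98% threshold and the placeholder species names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    similarity: float  # percent identity to the query, 0-100


def review_hits(field_id: str, hits: list[Hit], match_threshold: float = 98.0) -> list[str]:
    """Return human-readable warnings for one specimen; an empty list means no flags."""
    if not hits:
        return ["no hits returned"]
    flags = []
    ranked = sorted(hits, key=lambda h: h.similarity, reverse=True)
    top = ranked[0]
    # A weak best match may indicate a poor-quality sequence or a taxon missing from the library.
    if top.similarity < match_threshold:
        flags.append(f"best match only {top.similarity:.1f}%")
    # Several species above the threshold may indicate contamination or shared haplotypes.
    close = {h.species for h in ranked if h.similarity >= match_threshold}
    if len(close) > 1:
        flags.append("high-similarity hits span several species: " + ", ".join(sorted(close)))
    # Disagreement with the field identification may indicate a misidentification.
    if top.species != field_id:
        flags.append(f"top hit {top.species} differs from field ID {field_id}")
    return flags


# Hypothetical example: a specimen identified in the field as "Species A".
print(review_hits("Species A", [Hit("Species B", 99.8), Hit("Species B", 99.5), Hit("Species A", 91.2)]))
```

Flags like these only prioritise records for manual inspection. They cannot tell a field misidentification apart from introgression, such as the shared M. myotis/M. blythii haplotypes reported above.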
Step description: Samples were collected from bats captured using mist nets or harp traps at roost exits and identified morphologically by experts. A non-lethal 3 mm wing punch was collected from each individual and stored in 96% ethanol, from which DNA was extracted and the COI DNA barcode fragment was sequenced. Prior to GBIF submission, data were checked for errors and inconsistencies with OpenRefine 3.2 (http://openrefine.org).

Taxonomic coverage
Description: This dataset is composed entirely of data relating to 63 Chiroptera records.
Overall, 26 species are represented in the dataset (100% of those occurring in continental Portugal and 83.8% of those occurring in Iberia). These species belong to four families, mostly Vespertilionidae (20 species, 76.9%), with additional representatives from Rhinolophidae (four species) and a single species each from Miniopteridae and Molossidae. Vespertilionidae also account for 84.1% of all collected samples, followed by Rhinolophidae (9.5%) and Miniopteridae (4.8%), while a single sample belongs to the Molossidae.

formats (data as dwc, xml or tsv and sequences as fasta files). Alternatively, BOLD users can log in and access the dataset via the Workbench platform of BOLD. All records are also searchable within BOLD using the database's search function. At the time of writing, the dataset is included as Suppl. materials 1, 2, 3 in the form of two text files with record information as downloaded from BOLD, one text file with the collecting and identification data in Darwin Core Standard format (downloaded from GBIF) and a fasta file containing all sequences as downloaded from BOLD. Note that, because the BOLD database is not compliant with the Darwin Core Standard, the Darwin Core formatted file (dwc) that can be downloaded from BOLD is not strictly Darwin Core formatted. For a properly Darwin Core formatted file, see http://ipt.gbif.pt/ipt/resource?r=ibi_chiroptera&v=1.1 (Suppl. material 2).
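The family-level percentages given in the Taxonomic coverage description can be reproduced from simple counts. The species counts come directly from the text. The per-family sample counts (53, 6, 3 and 1) are an assumption: they are back-calculated from the reported percentages and the total of 63 samples rather than stated explicitly.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of the total for each family, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {family: round(100 * n / total, 1) for family, n in counts.items()}


# Species per family, as stated in the text (26 species in total).
species_per_family = {"Vespertilionidae": 20, "Rhinolophidae": 4, "Miniopteridae": 1, "Molossidae": 1}
# Samples per family: inferred from the reported percentages and 63 samples, not stated in the text.
samples_per_family = {"Vespertilionidae": 53, "Rhinolophidae": 6, "Miniopteridae": 3, "Molossidae": 1}

print(percentages(species_per_family))
print(percentages(samples_per_family))
```

With these counts, the sample shares come out at 84.1%, 9.5%, 4.8% and 1.6%, matching the reported figures. This suggests the inferred counts are consistent with the dataset's 63 records.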