Biodiversity Data Journal:
Research Article
Corresponding author: Ywee Chieh Tay (yweechieh@gmail.com)
Academic editor: Anne Thessen
Received: 25 Sep 2019 | Accepted: 21 Nov 2019 | Published: 10 Dec 2019
© 2019 Yin Cheong Aden Ip, Ywee Chieh Tay, Su Xuan Gan, Hui Ping Ang, Karenne Tun, Loke Ming Chou, Danwei Huang, Rudolf Meier
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ip YCA, Tay YC, Gan SX, Ang HP, Tun K, Chou LM, Huang D, Meier R (2019) From marine park to future genomic observatory? Enhancing marine biodiversity assessments using a biocode approach. Biodiversity Data Journal 7: e46833. https://doi.org/10.3897/BDJ.7.e46833
Few tropical marine sites have been thoroughly characterised for their animal species, even though animals constitute the largest proportion of multicellular diversity. A number of focused biodiversity sampling programmes have amassed immense collections to address this shortfall, but obstacles remain due to the lack of identification tools and the large proportion of undescribed species globally. These problems can be partially addressed with DNA barcodes (“biocodes”), which have the potential to facilitate the estimation of species diversity and to identify animals to named species via barcode databases. Here, we present the first results of what is intended to be a sustained, systematic study of the marine fauna of Singapore’s first marine park, reporting more than 365 animal species, delimited using DNA barcodes and/or morphology and represented by 931 specimens (367 zooplankton and 564 macrofauna, including 36 fish). Due to the lack of morphological and molecular identification tools, only a small proportion could be identified to species based solely on either morphology (24.5%) or barcodes (24.6%). Estimating species numbers for some taxa was difficult because of the lack of sufficiently clear barcoding gaps. The specimens were imaged and added to “Biodiversity of Singapore” (http://singapore.biodiversity.online), which now contains images for > 13,000 species occurring in the country.
DNA barcoding, marine park, genomic observatory, COI, biocodes
In recent decades, it has become clear that biodiversity loss is an increasingly serious problem and many species are expected to become extinct before discovery and description (
Molecular techniques have dramatically increased the rate of species discovery and facilitated species identification for those species that have been barcoded. DNA barcoding was initially proposed as a means to identify animal species, although it is now increasingly used for species discovery (
For highly biodiverse regions such as Southeast Asia, these global reference databases remain particularly incomplete and poorly curated (
Singapore is situated just outside the southwestern corner of the Coral Triangle biodiversity hotspot and, as elsewhere in the region, its marine biodiversity remains relatively poorly understood. To address this shortfall, we describe the first results of a programme that aims to build a comprehensive animal species identification database for Singapore’s first marine park, the Sisters’ Islands Marine Park (SIMP; Fig.
Map depicting the intertidal and subtidal sampling sites in the Sisters’ Islands Marine Park (SIMP), Singapore.
Dotted lines define the SIMP’s boundaries. The number of sampling events per site is indicated within the sampling event icons in the inset map. Adapted from http://commons.wikimedia.org/wiki/File:Singapore_Outline.svg.
The work performed here will help consolidate sampling records, and the molecular data obtained from the SIMP will form an important baseline for monitoring Singapore’s marine species. It will also provide a better and more complete understanding of marine biodiversity in Singapore, with further utility throughout Southeast Asia, where work of this nature is still in its infancy and which is inadequately represented in global databases (
We first compiled existing records and published DNA barcodes relevant to the SIMP through a literature keyword search for marine fauna found in the park. Records for SIMP species came predominantly from two survey projects aimed at documenting and/or discovering local biodiversity (without DNA barcodes): (i) a large-scale ‘Comprehensive Marine Biodiversity Survey’ of Singapore (
Samples were collected from all four islands of the SIMP:
Collections were authorised by the National Parks Board (permit number NP/RP15-088) and were carried out at the accessible intertidal reef, sandy beach, seawall and much of the shallow subtidal reef areas (Fig.
Intertidal specimens were collected using hand tools and nets during low spring tides (0.0 m to 0.2 m above chart datum). The same tools were used for subtidal sampling via SCUBA diving to depths of up to 15 m. Searches covered the areas around, under and inside potential hideouts. Any metazoans encountered during these visual surveys that were not already in our collection were collected. Fish were collected using two ‘bubu’ traps, each measuring 0.072 m³, which were deployed twice, for one day each time, during the sampling period. Up to three individuals of each species were collected, avoiding gravid females and juveniles to reduce sampling impact on natural populations.
Samples were provisionally imaged in situ using a Canon PowerShot G10 (Canon Inc., Japan) or Olympus Stylus Tough TG-4 compact camera (Olympus Corporation, Japan). In the laboratory, invertebrate specimens were relaxed in 7.5% (w/v) MgCl₂ buffered in seawater (
For each large soft-bodied specimen, a small piece of tissue (20–40 mm³) was excised, while for each arthropod, one or two legs from the same side of the body were detached for DNA extraction. The tissues were digested overnight at 55°C in 900 μl CTAB (hexadecyltrimethylammonium bromide) with 0.4 mg proteinase K, after which DNA was purified by phase separation with phenol:chloroform:isoamyl alcohol (25:24:1).
The COI gene region was amplified using different primer pairs described in
Most of the macrofaunal samples were subjected to Sanger barcoding. Each 12.5-μl reaction contained 0.5 μM of each primer (uniquely tagged primers for 46 samples only; untagged for the rest), 0.5 μg BSA (bovine serum albumin), 2 μl template DNA and 1× (v/v) GoTaq®/BioReady rTaq DNA polymerase and reagent mastermix, according to the manufacturer’s recommendations. The thermal cycling profile for reaction mix (1), using the Folmer primer pair, was 94°C for 60 s; 35 cycles of denaturation at 94°C for 45 s, annealing at 48°C for 45 s and extension at 72°C for 90 s; and a final extension at 72°C for 3 min. The thermal cycling profile for reaction mixes (2) and (3), using the Lobo primers, included a step-up annealing profile: 94°C for 60 s; 5 cycles of 94°C for 30 s, 48°C for 120 s and 72°C for 60 s; 35 cycles of 94°C for 30 s, 54°C for 120 s and 72°C for 60 s; and 72°C for 5 min.
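Writing the two cycling programmes above as data makes their structure explicit and their length easy to check. The Python sketch below is illustrative only: the names `Step`, `Stage` and `hold_time_s` are hypothetical, and the total counts hold times alone, ignoring ramp rates and lid heating, so real run times on a given thermocycler will be somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A single hold: temperature in °C and duration in seconds."""
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    """A block of steps repeated `cycles` times."""
    steps: tuple[Step, ...]
    cycles: int = 1


def hold_time_s(profile: list[Stage]) -> int:
    """Total hold time in seconds (ramping between temperatures is ignored)."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in profile)


# Reaction mix (1), Folmer primer pair.
FOLMER = [
    Stage((Step(94, 60),)),
    Stage((Step(94, 45), Step(48, 45), Step(72, 90)), cycles=35),
    Stage((Step(72, 180),)),
]

# Reaction mixes (2) and (3), Lobo primers with step-up annealing.
LOBO_STEP_UP = [
    Stage((Step(94, 60),)),
    Stage((Step(94, 30), Step(48, 120), Step(72, 60)), cycles=5),
    Stage((Step(94, 30), Step(54, 120), Step(72, 60)), cycles=35),
    Stage((Step(72, 300),)),
]

if __name__ == "__main__":
    for name, profile in [("Folmer", FOLMER), ("Lobo step-up", LOBO_STEP_UP)]:
        total = hold_time_s(profile)
        print(f"{name}: {total} s (~{total / 60:.0f} min of hold time)")
```

Under these assumptions, the Folmer programme comes to 6,540 s (109 min) of hold time and the Lobo step-up programme to 8,760 s (146 min).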
Successful PCR amplicons were purified using SureClean Plus (Bioline Inc., London, UK) and prepared for Sanger sequencing using the BigDye Terminator Cycle Sequencing Kit v. 1.1 and PureSEQ (Aline Biosciences), on an Applied Biosystems 3730XL DNA Analyzer (Thermo Fisher Scientific, U.S.A.), following the manufacturer’s instructions. COI barcodes, obtained via Sanger sequencing, were assembled and edited using Geneious R11 v11.0.2 (Biomatters Limited) (
Sampling was performed at sites 1, 3 and 4 listed in the macrofaunal field collection section. A vertical plankton tow with a 100-μm mesh net was used to collect micro- and mesozooplankton (
Samples were concentrated through a 100-μm sieve, preserved in 70% ethanol and stored at -30°C prior to sample sorting and imaging. Sorting and imaging were performed under a dissecting microscope (Leica S8 APO with Canon EOS 750D mounted; 1–8× magnification), using soft fine forceps. Specimen identification followed
For larger arthropods, one or two legs from the same side of the body were detached for DNA extraction. For specimens < 5 mm in size, whole individuals were either used for phenol-chloroform extraction or were incubated in 20 μl of 2×-diluted QuickExtract™ DNA extraction solution (Epicentre, BuccalAmp™), following the manufacturer’s instructions.
Forty-six macrofaunal samples, along with all zooplankton samples, were sequenced using high-throughput sequencing (HTS; Suppl. material
DNA barcoding via HTS ("HTS barcoding";
Four criteria (C1–C4) were used to select barcodes that we considered reliable. C1: zooplankton morphotypes and barcode identities were congruent and samples had a good match (≥ 97%; giving species-level identity) to global databases. C2: the barcode had a poor match (> 85%; giving an identity at the lowest taxonomic level supported), but the match was consistent with the morphological sort. C3: to accommodate mistakes that may have been made during the initial sort of zooplankton, we kept sequences for specimens even when the morphotypes and barcode identities were incongruent, as long as the BLAST match to an existing species in GenBank was high (≥ 97%) and the specimen images were consistent with the BLAST matches. C4: specimens that failed to yield a barcode due to the violation of filtering thresholds were re-evaluated and retained when all of the following criteria were fulfilled:
All sequence data were aligned using MUSCLE 3.8.425 (
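Criteria C1 to C3 above amount to a simple decision rule over three pieces of evidence per specimen. The following Python sketch encodes them under stated assumptions: the class and field names are hypothetical, C2’s “poor match” is read as an identity above 85% but below the 97% species-level cut-off, and C4 is omitted because its sub-criteria are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

SPECIES_LEVEL = 97.0      # % identity treated as a species-level match (from the text)
POOR_MATCH_FLOOR = 85.0   # % identity above which a poor match may be kept (C2)


@dataclass(frozen=True)
class BarcodeEvidence:
    """Hypothetical per-specimen summary used to apply the criteria."""
    identity: float             # best BLAST match to GenBank, % identity
    congruent_with_sort: bool   # barcode identity agrees with the morphotype
    images_support_match: bool  # specimen images consistent with the match


def reliability_criterion(e: BarcodeEvidence) -> Optional[str]:
    """Return the first of C1-C3 that a barcode satisfies, or None."""
    if e.congruent_with_sort and e.identity >= SPECIES_LEVEL:
        return "C1"
    if e.congruent_with_sort and POOR_MATCH_FLOOR < e.identity < SPECIES_LEVEL:
        return "C2"
    if (not e.congruent_with_sort and e.identity >= SPECIES_LEVEL
            and e.images_support_match):
        return "C3"
    return None  # not retained by C1-C3 (C4 is not modelled here)


print(reliability_criterion(BarcodeEvidence(98.5, False, True)))  # C3
```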
Sampling at Sisters’ Islands Marine Park (SIMP) yielded 931 specimens, comprising 564 macrofauna (benthic and fish) and 367 zooplankton specimens (Figs
Representative images of all sampled benthic and fish phyla except Acoelomorpha.
Sample numbers and corresponding museum codes (ZRC) are indicated, where available.
Representative images of major zooplankton morphotypes sampled.
Summary of specimen collection, DNA barcoding and species identification successes across phyla.
Overall, COI amplification success was 68.0% across all phyla (633 out of 931 samples). A total of 297 of the sample barcodes (46.9%) were ≥ 658 bp in length (long; average length 677 bp), while 336 samples (53.1%) had sequence lengths between 229 and 657 bp (short; average length 350 bp) (Suppl. materials
Amplification and sequencing success were variable across different phyla and primer pair combinations. Molluscs were generally easy to amplify, while echinoderms were challenging and required more PCR optimisation. Specifically, primer pairs in reaction mix (1) yielded approximately 50% amplification success, reaction mix (2) gave approximately 80% success and reaction mix (3) yielded the highest amplification success at ≥ 95%.
PCR amplification success for zooplankton samples was 70.6% (259 of 367 samples), and the 259 tagged amplicons were sequenced using high-throughput sequencing (HTS) barcoding. Due to uneven amplicon pooling, data were retrieved for only 191 of these amplicons, for which 411,201 reads were demultiplexed; after sequence quality filtering, coverage ranged from 19 to 4,048 reads per barcode. Overall, sequencing success was moderate, with 174 of 259 samples (67.2%) passing all filtering criteria. Of those failing the criteria, nine specimens were retained under criterion C4.
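Demultiplexing assigns each HTS read to a specimen via its tag combination, after which a barcode is called from that specimen’s reads. The sketch below illustrates the idea only and is not the pipeline used in this study: it assumes that each merged read begins with the forward tag and ends with the reverse tag in read orientation, uses a hypothetical 8-bp tag length and a hypothetical minimum coverage of 10 reads, and calls a column-wise majority-rule consensus from the reads of the most common length.

```python
from collections import Counter, defaultdict

TAG_LEN = 8        # hypothetical tag length in bp
MIN_COVERAGE = 10  # hypothetical minimum number of reads per specimen


def demultiplex(reads, tag_to_specimen):
    """Group tag-trimmed reads by specimen using the (forward, reverse) tag pair.

    Assumes each read starts with the forward tag and ends with the reverse tag,
    both in read orientation. Reads with unknown tag pairs are discarded.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) <= 2 * TAG_LEN:
            continue  # too short to hold two tags plus any insert
        specimen = tag_to_specimen.get((read[:TAG_LEN], read[-TAG_LEN:]))
        if specimen is not None:
            bins[specimen].append(read[TAG_LEN:-TAG_LEN])
    return bins


def majority_consensus(seqs):
    """Column-wise majority rule over the reads of the most common length."""
    length, _ = Counter(len(s) for s in seqs).most_common(1)[0]
    same_length = [s for s in seqs if len(s) == length]
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*same_length))


def call_barcodes(reads, tag_to_specimen, min_coverage=MIN_COVERAGE):
    """Return {specimen: consensus} for specimens with enough reads."""
    barcodes = {}
    for specimen, seqs in demultiplex(reads, tag_to_specimen).items():
        if len(seqs) >= min_coverage:
            barcodes[specimen] = majority_consensus(seqs)
    return barcodes


tags = {("AAAAAAAA", "CCCCCCCC"): "specimen_1"}
reads = (["AAAAAAAA" + "ACGT" + "CCCCCCCC"] * 10
         + ["AAAAAAAA" + "ACGA" + "CCCCCCCC"] * 2)
print(call_barcodes(reads, tags))  # {'specimen_1': 'ACGT'}
```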
Well-studied and morphologically distinct groups such as corals, sea anemones, echinoderms, molluscs and crustaceans were easily recognised, but most specimens could not be identified to species without DNA barcode-assisted identification (75.3% unidentifiable; only 230 specimens were identifiable to 155 species by morphology). DNA barcodes obtained for 633 specimens clustered into 351–395 species (i.e. MOTUs), depending on the clustering criterion; of these, 83 specimens (48 species) were identifiable only via their DNA barcodes. This amounts to an increase of approximately 36% in the number of specimens that could be identified to species level.
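The approximate 36% increase follows from dividing the barcode-only identifications by the morphological ones, as the short Python calculation below shows; the counts are taken directly from the text and the variable names are illustrative.

```python
# Counts reported in the text.
total_specimens = 931
identified_by_morphology = 230   # identified to species by morphology
identified_only_by_barcode = 83  # identified to species only via DNA barcodes

unidentified_pct = 100 * (total_specimens - identified_by_morphology) / total_specimens
gain_pct = 100 * identified_only_by_barcode / identified_by_morphology

print(f"{unidentified_pct:.1f}% not identifiable to species by morphology")  # 75.3%
print(f"{gain_pct:.1f}% more specimens identified with barcodes")            # 36.1%
```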
The final set of 633 COI specimen barcodes obtained clustered into 351 molecular operational taxonomic units (MOTUs, i.e. putative species). This was based on a species delimitation threshold of 3%, which was defined by assessing the percentage pairwise differences across all sampled taxon groups (Suppl. material
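A threshold-based delimitation like the 3% cut-off above can be sketched as follows in Python. This is an assumption-laden illustration rather than the exact procedure used here: it computes uncorrected p-distances over positions where both aligned sequences have an unambiguous base (so short barcodes are compared only over their overlap with long ones), treats distances up to and including the threshold as linking two sequences, and forms single-linkage clusters with a union-find structure. The function names are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared


def cluster(seqs: dict[str, str], threshold: float = 0.03) -> list[set[str]]:
    """Single-linkage MOTUs: sequences within `threshold` of each other are joined."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

With single linkage, a cluster can contain pairs more than 3% apart if they are connected through intermediate sequences, which is one way in which MOTU counts can depend on the clustering method chosen.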
In recent years, the process of species discovery has been enhanced with DNA barcoding approaches (
The morphological study of small animals and zooplankton is particularly time-consuming because large numbers of specimens are usually collected (e.g.
Morphological identification can be challenging even for charismatic animals, due to the presence of cryptic species. Our analyses revealed at least two pairs of morphologically indistinguishable species with high COI sequence divergence. These possibly sympatric cryptic species include two Ligia isopods with a 22% pairwise distance and two Peronia slugs (Mollusca: Gastropoda: Onchidiidae) with a 5.4% pairwise distance. In the latter case,
Indeed, DNA barcodes can help with species delimitation and cryptic species detection. DNA barcodes can also provide abundance and distribution information, but they tend to be of limited value for their original purpose, i.e. species identification, as only 36.2% of the barcodes obtained here had species-level matches. A substantial number of our sequences matched GenBank sequences at < 90% identity and yielded only very tentative genus- or family-level identities. Even well-studied and common taxa such as molluscs, arthropods and fishes (e.g. Hypselodoris, Dendrodoris, Ashtoret, Grapsus and Pomacentrus) lacked barcodes in GenBank. Furthermore, in some taxa, the genus-level identities were of questionable accuracy. For example, amongst Alpheus shrimps, up to 20 specimens were placed in an incorrect lineage with < 90% identity, the closest match being a Caridea sp. at 82% to 88% sequence similarity (Suppl. material
The inadequacy of the barcode databases was particularly problematic for understudied groups such as annelids, platyhelminths, poriferans and zooplankton, such as chaetognaths. Amongst the 17 barcoded platyhelminth flatworm samples, for instance, all GenBank matches were < 88% in sequence identity and accurate only to the phylum level for 14 samples, while seven samples were assigned to the incorrect genus (see also
Overall, our study confirms that a substantial number of the sequences in the global databases are misidentified and that one should carefully distinguish between using the barcode sequences themselves (for example, to obtain distributional data) and relying on the identifications attached to them in the database. This is particularly important for understudied taxa (
For more than a decade, the COI locus has been popularised for barcoding a wide range of metazoan species (
The large number of species from many divergent lineages examined here would typically require a wide range of taxonomic expertise to sort the specimens into putative species based on morphological data. This expertise was not readily available, so we used molecular tools for rapid and cost-effective species delimitation (
Our work here is only the beginning of further molecular ecological work in this biodiverse region. It follows recent, successful, large-scale biodiversity sampling exercises, such as the Moorea Biocode Project (
To understand why this is advantageous, we note that survey windows at the SIMP are limited in the intertidal areas by the tidal regime and in the subtidal by strong currents, so rapid and non-intrusive sampling methods such as environmental DNA (eDNA) would enable more regular surveys (
The collection and barcoding of marine animals at the Sisters’ Islands and the surrounding islands began more than two years ago, when these locations were designated as Singapore’s first marine park. This is part of a larger initiative to make Singapore’s biodiversity identifiable and to provide molecular identification tools for future work. Despite only 34 collection trips over two years, on foot and via SCUBA, across a large 40-ha area and using only simple hand tools, nets and traps, our study sampled more than 365 species across a wide range of marine animals. A more systematic sampling approach, covering a larger area and using grabs, trawls, dredges and various nets, will uncover greater diversity and more taxa, including infaunal and meiofaunal groups.
Being able to quantify and identify species diversity is important for many reasons, including the provision of a community baseline against which future surveys can be compared (
We thank Chay Hoon Toh, Diego Pitta de Araujo, Jia Jin Marc Chang, Ria Tan and Yong Kit Samuel Chan for their help with specimen collections; Chay Hoon Toh, Nicholas Wei Liang Yap, Rene Ong, Siong Kiat Tan, Drs. Koh Siang Tan and Zeehan Jaafar and Professors Greg Rouse and Peter Ng for advice on species identification; Amrita Srivathsan, Arina Adom, David Jian Xiong Tan, Theodore Tze Ming Lee, Saravanan Nadarajan and Sze Min Charlene Mary-Anne Ng for bioinformatic, logistical and/or laboratory support; Kok Sheng Loh for in-situ imaging support; Jonathan Kit Lan Ho, Jun Bin Loo and Jia Jin Marc Chang for data clean-up and ex-situ imaging support; and Benjamin John Wainwright for valuable comments that helped improve the manuscript.
This research was supported by the National Parks Board (R-347-000-242-490) and SEABIG, NUS (R-154-000-648-646 and R-154-000-648-733).
National University of Singapore
No ethical violations or security breaches were made during the course of this study.
R.M., Y.C.T., K.T., D.H. and L.M.C. conceived the idea. R.M., Y.C.T. and D.H. designed the experiments. Y.C.A.I., Y.C.T., S.X.G., D.H. and H.P.A. conducted the fieldwork, while Y.C.A.I., Y.C.T., S.X.G. and D.H. conducted the experiments and analysed the data. Y.C.A.I., D.H. and Y.C.T. wrote the main manuscript and all authors actively revised and approved the manuscript.
The authors declare no conflicts of interest.
Existing published records documenting marine fauna at the Sisters' Islands Marine Park (SIMP), based on a keyword search of the literature. The search was performed by entering the following keywords in order: “Sisters' Islands” or “Sisters' Islands Marine Park” or “Pulau Subar Laut” or “Pulau Subar Darat” or “Pulau Sakijang Bendera” or “Tanjong Hakim”.
Compiled literature of species records for which COI barcodes are available on GenBank. Species names from the compiled literature on SIMP biodiversity were searched on GenBank, returning 109 species with COI sequences generated elsewhere; these were compiled separately as a reference sequence database.
Information on macrofaunal specimen image availability, taxonomic information, genetic identity match to both GenBank and BOLD system databases, collection information, barcode availability, barcode length (long ≈ 658 bp; short ≈ 313 bp), LKCNHM Zoological Reference Collection (ZRC) catalogue numbers and GenBank numbers of all collected specimens.
Information on zooplankton specimen image availability, taxonomic information, genetic identity match to both GenBank and BOLD system databases, collection information, barcode availability, barcode length (long ≈ 658 bp; short ≈ 313 bp), LKCNHM Zoological Reference Collection (ZRC) catalogue numbers and GenBank numbers of all collected specimens.
Cluster dendrogram based on percentage pairwise differences in COI for all 632 specimens with COI barcodes. Values at the nodes represent the percentage pairwise difference between two specimens. Taxon names on the branches represent taxonomic identities, based on morphological identification.
Cluster dendrogram based on percentage pairwise differences in COI for 304 specimens (excluding IP0303) with species-level identification. Values at the nodes represent the percentage pairwise difference between two specimens. Taxon names on the branches represent taxonomic identities, based on morphological identification.