NGS-based biodiversity and community structure analysis of meiofaunal eukaryotes in shell sand from Hållö island, Smögen, and soft mud from Gullmarn Fjord, Sweden

NGS-based biodiversity and community structure analysis of meiofaunal eukaryotes in shell sand from Hallo island, Smogen, and soft mud from Gullmarn Fjord, Sweden


Introduction
Microscopic interstitial marine organisms, also termed 'meiofauna', are often defined as animals that pass a 1mm mesh but are retained on a 45 µm sieve (Higgins 1988). Meiofauna are an important component of sedimentary and benthic habitats due to their small size, abundance and rapid turnover rates. Moreover, meiofaunal surveys represent a useful tool for environmental impact assessments, underlying the urgent need for reliable, reproducible and rapid analytical methods. The breadth of taxonomic groups present in marine sediments makes meiofauna an ideal tool for detecting the effects of ecological impacts on marine biodiversity (Moreno et al. 2008). However, traditional morphology based taxonomy assignment methods are labour intensive and time consuming, leading us to explore recently developed metabarcoding methods for whole community analysis. Metabarcoding has previously been used to characterize plankton assemblages (Lindeque et al. 2013, de Vargas et al. 2015, marine benthic meiofaunal assemblages , Fonseca et al. 2014, Brannock and Halanych 2015, Cowart et al. 2015, meiofaunal communities colonizing autonomous reef monitoring structures (Leray and Knowlton 2015) or fish gut contents (Leray et al. 2013). The vast majority of studies have employed Roche 454 due to its long read lengths compared to other technologies (Table 1; Shokralla et al. 2012), but Illumina MiSeq is now able to provide similarly long reads using paired-end sequencing (2x300 base pairs). As summarized in Table 1, there is no standardized method for metabarcoding of marine fauna, and a variety of sample extraction methods, sequencing platforms, molecular markers, bioinformatics pipelines and OTU clustering thresholds have been used to date, making these studies difficult to compare (Table 1) In this study we used samples from muddy and sandy marine sediments to examine how results of metabarcoding based surveys of meiofaunal communities are impacted by three different meiofaunal extraction methods and three different primer pairs for COI and 18S. In order to validate the reliability of the metabarcoding approach, we compare the results obtained with traditional morphology-based taxonomic assignment for two test groups, Xenacoelomorpha and Nematoda, the latter previously shown to be the dominant taxon in meiofaunal communities in terms of number of OTUs ).

Sampling
Samples were collected in two ecologically distinct locations along the west coast of Sweden in August 2014.

Meiofaunal extraction
Hållö island. Hållö island samples were extracted in the lab using two different variations of the flotation (decanting and sieving) technique.

Flotation (freshwater):
Freshwater was used to induce an osmotic shock in meiofaunal organisms and force them to detach from heavy sediment particles. 200 mL of sediment were placed in a large volume of fresh water and thoroughly mixed to suspend meiofauna and lighter sediment particles. The supernatant was sieved through a 1000 µm sieve to separate the macrofaunal fraction, which was then discarded. The filtered sample was sieved again through a 45 µm sieve to collect meiofauna and discard fine organic particles. This procedure was repeated three times. Meiofauna was then rinsed with seawater from the sieve into large falcon tubes. Twelve sediment samples were processed, ten of them were fixed immediately in 96% ethanol for molecular analysis and stored at -20°C. The other two samples were first screened for live representatives of Xenacoelomorpha, and later preserved in 4% formaldehyde for morphology-based identification of nematodes.
Flotation (MgCl2 solution): A 7.2% solution of MgCl2 was used to anesthetize meiofauna. As above, twelve samples were processed in total, ten of them were decanted through 125 µm sieve and fixed immediately in 96% ethanol for molecular analysis and stored at -20°C, while two samples were decanted through a 125 µm sieve which was subsequently placed in a petri dish with seawater. After 30 minutes, the petri dish as well as the inside of the sieve were searched for Xenacoelomorpha using a stereo microscope. Afterwards they were preserved in 4% formaldehyde for morphology-based identification of nematodes.
Gullmarn Fjord. Meiofauna was extracted from the Gullmarn Fjord samples using two different methods: flotation and siphoning.

Flotation (freshwater):
Freshwater was used to induce an osmotic shock in meiofaunal organisms. 2.4 L of sediment were placed in a large volume of freshwater, thoroughly mixed to suspend meiofauna and lighter sediment particles. The supernatant was sieved through a 1000 µm sieve in order to separate macrofauna, which was then discarded. The filtered sample was then sieved three times through a 70µm sieve to collect meiofauna and discard fine organic particles. Meiofauna was then rinsed with seawater from the sieve into a large container and equally divided between 12 falcon tubes. Six samples were fixed in 96% ethanol for molecular analysis and stored at -20°C. Six samples were screened for live representatives of Xenacoelomorpha, and preserved in 4% formaldehyde for morphology-based identification of nematodes.
Siphoning: A total volume of 12 L of sediment was processed as follows: an approximately 5 cm thick layer of mud was placed in a container and covered with 20 cm of seawater. The sediment was allowed to settle for 20 hours. Half of the sediment area was then siphoned through a 125 µm sieve, the residue in the sieve was immediately fixed in 96% ethanol, large macrofauna was manually removed, and the entire volume was split equally into six samples and placed at -20°C for subsequent molecular analysis. The remaining half of the area was similarly siphoned through a 125 µm sieve, the sieve contents were stored in sea water, large macrofauna manually removed, the entire volume split into six samples, which were screened for live representatives of Xenacoelomorpha, and preserved in 4% formaldehyde for morphology-based identification of nematodes.

Morphology-based identification
Xenacoelomorpha. Four samples from Hållö and 12 samples from Gullmarn Fjord were used for morphology-based assessment of the diversity of Xenacoelomorpha. All samples were stored in seawater and searched for Xenacoelomorpha with a stereo microscope. All specimens found were immediately identified to the lowest taxonomic rank possible using a compound microscope equipped with DIC.
Nematoda. Two samples from each location/extraction method were used to assess nematode diversity using morphology-based identification. Samples from Hållö (flotation with fresh water and MgCl2) and Gullmarn Fjord (siphoning) were processed whole and samples from Gullmarn Fjord extracted using flotation with fresh water were subsampled by taking 1/10 of the entire sample. Formaldehyde-preserved samples were transferred to glycerin using Seinhorst's rapid method as modified by De Grisse (1969). Permanent nematode mounts on glass slides were prepared using the paraffin wax ring method. It is common practice to estimate the diversity of marine nematodes by counting a predetermined number (usually 100 or 200) of randomly picked nematodes per sample (Vincx 1996), which may not provide sufficiently detailed results for samples with high diversity. Therefore, all nematode specimens were counted and identified for each analyzed sample. All nematode specimens were identified to genus, and, when possible, to species level.

DNA extraction, library preparation and sequencing
DNA extraction. 30 samples were processed for total DNA extraction, twelve from the Gullmarn Fjord and eighteen from Hållö island, using 10g of sediment and the PowerMax Soil DNA Isolation Kit (MO BIO Laboratories), according to manufacturer's instructions.
Primer design. Illumina MiSeq reagent v3. produces paired-end reads of 300bp in length, allowing a maximum marker length of 500bp when taking into account a 50 bp overlap. Universal COI primers available for the Metazoa amplify a 658bp region (Folmer et al. 1994), which is too long for most NGS applications.
Accordingly, primers amplifiying a 313 bp fragment of the mitochondrial cytochrome oxidase 1 (COI) gene were used, as described in Bourlat et al. 2016  Amplicon PCR. PCR amplifications of the COI and 18S regions were set up as follows. For a 50µl reaction volume, we used 5µl Pfu polymerase buffer (10x), 1µl dNTP mix (final concentration of each dNTP 200µM), 0.5 µl of each primer at 50 pm/µl, 2 µl DNA template (~10 ng), 0.5µl Pfu DNA polymerase (Promega) and 40.5µl of nuclease free water. Each DNA sample was amplified with the 3 primer pairs described above (COI Leray, COI Lobo and 18S). PCR cycling conditions were 2 min at 95°C (1 cycle); 1 min at 95°C, 45 s at 57°C , 2 min at 72°C (35 cycles); 10 min at 72°C (1 cycle). The PCR was checked on a 2% agarose gel. 20µl of each PCR reaction were then purified with Agencourt AMPure XP paramagnetic beads (Beckman Coulter), allowing size selection of PCR fragments by using different PCR product to bead ratios (Bourlat et al. 2016).
Index PCR. For dual indexing we used the Nextera XT index kit (96 indices, 384 samples, Illumina) according manufacturers' instructions. Dual indexing allows an increase in the multiplex level of sequencing per lane, so that more samples can be sequenced on the same flow cell (Fadrosh et al. 2014). It also eliminates cross-contamination between samples and the occurrence of mixed clusters on the flow cell (Kircher et al. 2012). The index PCR was set up as 50µl reactions using 5µl of cleaned up PCR amplicons, 5µl of ® ®
Number of reads per marker and per sequencing run

Paired-end joining
Demultiplexed MiSeq paired-end reads were joined using the Qiime script multiple_join_paired_ends.py using the fastq-join tool ( https://code.google.com/p/ea-utils/ wiki/FastqJoin). Data from three sequencing runs were merged producing a total of 24 132 875 raw paired-end reads, 15 883 274 reads for the COI dataset and 8 249 601 reads for the 18S dataset (Table 3). The number of reads remaining after various bioinformatic data processing steps is presented in Table 4. After paired-end joining, 48% of sequences were lost leading to a total of 12 543 198 reads, due to an observed decrease in sequence quality at the end of the reads, resulting in a bad overlap between the paired-ends. This loss is much more important for the longer 18S region (2 131 102 reads after joining, corresponding to a 74% loss) than for the COI region (10 412 096 reads after joining, corresponding to a 34,5% loss). Primer trimming and quality filtering

Marker / Step Raw data
Dual indexes and Illumina overhangs were removed by the sequencing platform. COI and 18s primer sequences were removed using a custom python script designed for this study ( https://github.com/Quiterie90/Primer_Removal). The script retains and trims reads that have the exact sequence of the forward and reverse primers at the beginning and at the end of the reads respectively, while other reads not meeting these criteria are discarded. The script takes into account the presence of ambiguous bases in the primer sequence (such as W, R, S, Y, M, K, H, D, B and V). In the case that an unassigned base (N) is found in the primer sequence, the read is also discarded. The primer-trimming step resulted in 9 171 378 reads remaining corresponding to a 27% loss. As the script is quite stringent, it quality filters reads by removing incomplete reads or chimeras. At this step 1 071 871 reads remained after trimming for the 18S dataset corresponding to a 50% loss and 8 099 507 reads remained after trimming for the COI dataset corresponding to a 22% loss. A quality filtering step was then carried out using the Qiime script multiple_split_ libraries_fastq.py to remove reads with a Q Score inferior to 30 (corresponding to a base call accuracy of at least 99,9%). A total of 2% of sequences were lost after the qualityfiltering step leading to 8 992 523 reads remaining. 5% of the reads were lost in the 18S dataset corresponding to a final 1 015 874 reads and 1,5% of the reads were lost in the COI dataset corresponding to a final 7 976 649 reads. Table 4.
Number of reads remaining after each bioinformatic step NGS-based biodiversity and community structure analysis of meiofaunal eukaryotes ...
For clustering sequences into Operational Taxonomic Units (OTUs) we used CROP, a Bayesian clustering algorithm that delineates OTUs based on the natural distribution of the data, using a Gaussian mixture model (Hao et al. 2011). The program allows the user to define a lower and upper bound variance to cluster the sequences, instead of a fixed sequence similarity value. According to a benchmarking study by Leray et  Parameters used in CROP for the analysis were as follows: CROP -i -b 160 000 -z 470 -l 3 -u 4 -o CROP -i -b 18 000 -z 470 -l 1.5-u 2.5 -o The 7 954 017 COI sequences and the 890 370 18S sequences were clustered into 2805 and 1472 representative OTUs respectively, 213 of which were identified to species for COI and 243 of which were identified to species for 18S, using a 97% sequence similarity threshold (

Taxonomic assignment
As Qiime is normally used for metagenomic analyses of prokaryotes, default databases are not suited for taxonomic assignment of Metazoa. Custom databases consisting in a taxonomy file associated with a reference sequence file can be created, or alternatively, a preformatted database such as the Silva database (http://www.arb-silva.de/no_cache/ download/archive/qiime/) can be used. For the COI region, a custom database of 1 947 954 sequences was created consisting of the BOLD database (http:// www.boldsystems.org/ downloaded on October 8 2015), combined with own reference databases of Nemertea, Xenacoelomorpha and Oligochaeta and barcodes of Swedish Echinodermata, Mollusca, Cnidaria and Arthropoda from the Swedish Barcode of Life database (SweBol). For the 18S rRNA region, a custom database of 732 419 reference sequences was created using the Silva database release 111 (http://www.arb-silva.de/ no_cache/download/archive/qiime/) and own barcodes for Acoela and Oligochaeta. Corresponding tab-delimited taxonomy files were created including a sequence ID and taxonomic lineage information (Phylum, Class, Order, Family, Genus and Species) derived from BOLD, Swebol, Silva and WoRMS (http://www.marinespecies.org/).
Taxonomic assignments were carried out using both 80% and 97% sequence similarity thresholds, to obtain identifications at phylum and species levels respectively (Giongo et al. Percentages of metazoan phyla uncovered in the samples using COI and 18S molecular surveys. Blue bars correspond to the cumulated frequencies of OTUs assigned to a specific phylum using the COI gene and red bars correspond to the cumulated frequencies of OTUs assigned to a specific phylum using the 18S gene. Taxonomic assignment is based on a 97% sequence similarity threshold. Note on the taxonomic assignement of Nematodes: The output from the Qiime analysis included 145 18S OTUs assigned to the phylum Nematoda. Three of them (HE1.SSU866120, HE6.SSU382930 and HF6.SSU331569) Suppl. material 1 were incorrectly placed among the nematodes due to errors in the reference database they derived from -they group among Arthropod taxa by the Megablast search and were excluded for that reason. Another OTU (TS6.SSU559982) is placed among Phoronida by the Megablast search and was also excluded. Two more sequences that were assigned to Nematoda appear to have long insertions within conserved regions (HE6.SSU358113 and TF5.SSU411806). Both of them were found only in one sample each, further supporting the idea that they are derived from erroneous amplification product, and were removed from any further analysis.

Diversity analyses
Alpha and beta diversity analyses were carried out with and without unassigned OTUs for both COI and 18S datasets. Unassigned OTUs were removed using the Qiime script filter_otus_from_otu_table.py. Alpha diversity (species richness) was calculated using the nonparametric Chao1 index using rarefied datasets to correct bias in species number due to unequal sample size. One of the samples in the COI dataset was removed prior to rarefaction analysis due to low sequence number (1122 sequences including unassigned OTUs and 280 sequences excluding unassigned OTUs at 97% sequence similarity) using the Qiime script filter_sample_from_otu_table.py. Rarefaction, alpha diversity calculation and generation of plots were performed using the Qiime scripts i) multiple_rarefactions.py, ii) alpha_diversity.py, iii) collate_alpha.py and iv) make_rarefaction_plots.py. Rarefaction was done to a depth corresponding to the total number of sequences in the smallest dataset (20405 sequences including unassigned OTUs and 5442 sequences excluding unassigned OTUs at 97% sequence similarity for COI, and 7561 sequences including unassigned OTUs and 5399 sequences excluding unassigned OTUs at 97% sequence similarity for 18S). Alpha diversities were compared between locations and extraction methods for both datasets and COI primer sets using the Qiime script compare_alpha_diversity.py. The script performs Monte-Carlo permutations to determine pvalues.
Beta diversity was calculated using the abundance-based Bray-Curtis index for both COI and 18S datasets. The Qiime script beta_diversity_through_plots.py was used to compute beta diversity distance matrices from the rarefied samples and generate Principal Coordinate Analysis (PCoA) plots. Beta diversity was compared according to location, extraction method and primer pair both with and without the unassigned OTUs using the Qiime script compare_categories.py. The script uses R and the vegan and ape libraries to compute statistical tests. We performed ANOSIM (ANalysis Of SIMilarity) tests, which are nonparametric, through 999 permutations. This method tests whether two or more groups of samples are significantly different by taking as null hypothesis that there is no difference between the two or more groups studied.
Alpha and beta diversities were calculated including and excluding the unassigned OTUs and results obtained were similar. Here we present plots including the unassigned OTUs (Figs 5, 6).

Data resources
The data underpinning the analysis reported in this paper are deposited at the GenBank SRA under project number PRJNA388326 (https://www.ncbi.nlm.nih.gov/bioproject/ PRJNA388326). The large numbers of unassigned OTUs reflect the incompleteness of the databases used for COI and 18S. When unassigned OTUs are disregarded, differences between the taxonomic ocverage of the markers can be observed (Fig. 2, B and D). COI is the 'standard' animal barcode and is thus mostly useful for diversity surveys within the Metazoa (Hebert et al. 2003). 18S has on the other hand much larger taxonomic coverage and can be used for biodiversity profiles of whole eukaryotic communities, at higher taxonomic scales.

Results and discussion
Of all OTUs classified as Metazoa, a detailed breakdown per phylum is presented in Table  5 and Fig. 3

Meiofaunal community composition differs according to location
Taxonomic community composition at both locations surveyed is illustrated in Fig. 4. The bar plots in Fig. 4 take into account the read counts for each OTU, whereas Table 5 and Fig. 3 do not take these into account.
In Fig. 4, clear differentiation in biodiversity between the two habitat types (soft mud versus coarse shell sand) can be observed, as expected. Echinodermata (such as Ophiurida, Echinoidea and Asteroidea), Mollusca (Bivalvia, Gastropoda), Annelida and Arthropoda are represented by higher numbers of reads in samples from the muddy sediments in the Gullmarn fjord samples (grain size 100 μm approx.).
In coarse shell sand in shallow areas, such as in the Hållö island samples, Annelida and Arthropoda are represented by higher numbers of reads, followed by Chordata (cephalohordata such as Branchiostoma sp., ascidians and various fish species such as Gobius sp., Ctenolabrus rupestris, Solea solea) with in addition a larger diversity of small taxa such as Bryozoa, Gnathosthomulida, Gastrotricha, Tardigrada, Rotifera, Sipuncula and Phoronida, reflecting the high diversity of insterstitial taxa found in sandy sediments.

Sample diversity and composition analyses
A greater number of phyla were uncovered in the Hållö Island samples than in the Gullmarn Fjord samples ( Fig. 4A and 4B) and this observation was corroborated by the alpha diversity rarefaction plots showing that Hållö Island samples (in red) present a higher diversity than the Gullmarn Fjord samples (in blue) (p-value = 0.001) regardless of the marker used ( Fig. 5A and 5B). Within the same location, choice of extraction method does not have a significant impact on sample diversity (p-value ~ 1) ( Fig. 5C and 5D, Table 6). However, for the 18S dataset, the flotation method seems to be more effective for extraction of nematodes than the siphoning method in the Gullmarn Fjord samples (Fig. 4A  and 4B). Moreover, the beta diversity PCoA results highlight the fact that sample composition is influenced by the choice of extraction method for both COI and 18S datasets (p-value = 0.001) leading to four different clusters (Fig. 6and 6B, Table 6). For the COI dataset, in addition to extraction method as a factor of divergence, choice of primer (COI Leray or COI Lobo) also influences the grouping of the samples (p-value = 0.003 excluding unassigned OTUs and 0.001 including unassigned OTUs), in particular for the Hållö Island samples (Fig. 6C). Moreover, the COI Lobo primer seems to uncover a higher diversity of taxa than the COI Leray primer Fig. 5E) even if the results are considered to be non significant (p-value = 0.585 excluding unassigned OTUs and 0.111 including unassigned OTUs) (

Molecular identifications to species level
Using a sequence similarity search at 97% similarity allowed us to identify 213 COI OTUs and 243 18S OTUs to species level (

Invasive and alien species detected in the samples
Five alien species were detected in in the sample, of which two are considered invasive (in bold; Table 9), and the other three are on alert lists. The two invasive species (Acartia tonsa, a copepod, and Alexandrium ostenfefeldii, a dinoflagellate) could easily be overlooked in routine monitoring programs. Species within the genus Acartia are difficult to distinguish (Jensen 2010) and the invasive species can be confused with other native species. Also A. ostenfeldii is easily misidentified as other Alexandrium species; detailed thecal plate observation is often necessary for proper identification (Balech 1995). This shows the potential of molecular techniques for monitoring invasive species, and points to problems using traditional identification techniques. Many invasive species arrive in an area as spores, larvae or juveniles -all life stages that may be easily overlooked and problematic to identify to species level. Target barcoding of environmental DNA (eDNA) shows a great promise for detecting species without the need of costly sampling schemes. This would also allow for more random sampling in an area, increasing the probability of actually finding a species even when they occur in low numbers.
there are xenacoelomorphs with high tolerance for hypoxia that are not captured by the siphoning method, and thus would not be found in the manually sorted samples, but could be detected by metabarcoding of unprocessed samples. It should be noted that the extraction method used on the Hållö samples does not rely on migration of specimens to the surface.  Table 10.

Gullmarn Fjord
Taxonomic composition and relative abundance (% of the total number of specimens) of Xenacoelomorpha species in Gullmarn Fjord and Hållö sites.

Comparison of metabarcoding versus morphology-based identification of Nematoda
Both study sites are characterized by rich and diverse nematode fauna. The Hållö site had a total of 107 species of nematodes, belonging to 86 genera (Holovachov et al. 2017). Of these, 88 species belonging to 73 genera were found in samples extracted by flotation with a MgCl2 solution, and 101 species belonging to 83 genera were found in samples extracted by flotation with fresh water. The Gullmarn fjord site had a total of 113 nematode species of nematodes, belonging to 77 genera (Holovachov et al. 2017). Of these, 81 species belonging to 62 genera were found in samples extracted by siphoning, and 102 species belonging to 70 genera were found in samples extracted by flotation with fresh water. A certain small number of nematode individuals in each sample were not identified to species/genus/family, either due to their developmental stage or quality of preservation.
The final list of nematode OTUs includes 139 18S sequences. Only two 18S OTUs were positively identified using QIIME to species level using 97% similarity threshold: Viscosia viscosa (TS6.SSU58722) and Chromadora nudicapitata (HF2.SSU192072), six more were assigned to reference sequences identified to genus level only (Suppl. material 1). Only 22 COI sequences were assigned to the phylum Nematoda, and none was identified to species level.
When comparing the results of morphology-based assessment of nematode diversity with metabarcoding using taxonomic assignments to the phylum level in this particular study (with 80% similarity threshold; Suppl. materials 2, 3), the detailed and extensive examination of samples and morphology-based species identification provided more comprehensive estimates of nematode diversity (107 species in Hållö and 113 species in Gullmarn Fjord) than metabarcoding using either one of the molecular markers, independently of the extraction technique or locality (Table 12). Moreover, COI barcodes were much harder to obtain for marine nematodes using either one of the primers (16 OTUs in Hållö and 9 OTUs in Gullmarn Fjord using Lobo primers; 17 OTUs in Hållö and 4 OTUs in Gullmarn Fjord using Leray primers), comparing to 18S (95 OTUs in Hållö and 78 Table 11. Total number of Xenacoelomorpha taxa or OTUs distinguished based on morphology (Table 10), 18S and COI from different sampling sites and extraction methods (placement of OTUs is based on 80% similarity threshold, Suppl. materials 2, 3) OTUs in Gullmarn Fjord site; Table 12). Due to the very limited reference databases available for marine nematodes, very few nematode OTUs can be identified to species or genus level, making it difficult to use metabarcoding data in ecological studies.