Biodiversity Data Journal:
Data Paper (Biosciences)
Corresponding author: Kathleen Rosmarie Stoof-Leichsenring (kathleen.stoof-leichsenring@awi.de)
Academic editor: Quentin Groom
Received: 30 Jul 2020 | Accepted: 26 Sep 2020 | Published: 14 Dec 2020
© 2020 Kathleen Stoof-Leichsenring, Sisi Liu, Weihan Jia, Kai Li, Luidmila Pestryakova, Steffen Mischke, Xianyong Cao, Xingqi Liu, Jian Ni, Stefan Neuhaus, Ulrike Herzschuh
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Stoof-Leichsenring KR, Liu S, Jia W, Li K, Pestryakova LA, Mischke S, Cao X, Liu X, Ni J, Neuhaus S, Herzschuh U (2020) Plant diversity in sedimentary DNA obtained from high-latitude (Siberia) and high-elevation lakes (China). Biodiversity Data Journal 8: e57089. https://doi.org/10.3897/BDJ.8.e57089
Plant diversity in the Arctic and at high altitudes strongly depends on and responds to climatic and environmental variability, and it is now heavily affected by recent climate warming. Past changes in plant diversity in high-Arctic and high-altitude regions are therefore used to infer climatic and environmental changes through time and to inform future predictions. Sedimentary DNA (sedDNA) is an established proxy for detecting local plant diversity in lake sediments, but the relationships between environmental conditions and the preservation of plant sedDNA are still far from fully understood. Studying modern relationships between environmental conditions and plant sedDNA will improve our understanding of the conditions under which sedDNA is well preserved, helping to a.) evaluate suitable localities for sedDNA approaches, b.) provide analogues for preservation conditions and c.) reconstruct plant diversity and climate change. This study investigates modern plant diversity by applying a plant-specific metabarcoding approach to sedDNA from surface sediment samples of 262 lake localities covering a large geographical, climatic and ecological gradient. Latitude ranges between 25°N and 73°N and longitude between 81°E and 161°E, including lowland lakes and elevated lakes up to 5168 m a.s.l. Further, our sampling localities cover a climatic gradient ranging in mean annual temperature between -15°C and +18°C and in mean annual precipitation between 36 and 935 mm. The localities in Siberia span a large vegetational gradient including tundra, open woodland and boreal forest. Lake localities in China include alpine meadow, shrub, forest and steppe, as well as cultivated areas.
We provide a large dataset of genetic plant diversity retrieved from surface sedDNA of lakes in Siberia and China spanning a large environmental gradient. The dataset encompasses sedDNA sequence data from 259 surface lake sediments and three soil samples from Siberian and Chinese lake sites. We used the established chloroplast trnL P6 loop marker for plant diversity assessment. The merged, filtered and assigned dataset includes 15,692,944 read counts, resulting in 623 unique plant DNA sequence types that have a 100% match to either the EMBL database or the specific Arctic plant reference database (arctborbryo). The dataset includes a taxonomic list of identified plants and the results of PCR replicates, extraction blanks (BLANKs) and PCR negative controls (NTCs), which were run alongside the lake samples. This collection of plant metabarcoding data from modern lake sediments is ongoing and additional data will be released in the future.
Arctic, chloroplast DNA, lakes, metabarcoding, plant diversity, sedimentary DNA, Tibet Plateau, trnL P6 loop, vegetation
Arctic and high-elevation ecosystems are highly sensitive to natural and anthropogenically induced climate variability. Anthropogenic warming and land-use change are thought to have shifted vegetation composition and plant richness in these areas over the last centuries and decades (
Plant diversity from sedimentary DNA in Siberian and Chinese lakes
The lakes are located in the Siberian Arctic, in Northern China and on the Tibetan Plateau, spanning low to high elevations. They include large lakes formed during past glacial periods as well as smaller lakes formed by thermokarst.
The lake sites in Siberia were visited during field trips conducted by the Alfred Wegener Institute between 2005 and 2016, and the lake sites in China between 2003 and 2018. This dataset was established to correlate modern genetic plant diversity with modern vegetation mapping and with climate and environmental data. The modern sedDNA data will be used to a.) evaluate suitable localities for sedDNA approaches, b.) provide analogues for preservation conditions and c.) reconstruct plant diversity and climate change, mainly across glacial/interglacial phases in the Late Pleistocene–Holocene and during recent environmental change in the Anthropocene.
The CAS Strategic Priority Research Program supported the sample collection on the Tibetan Plateau in 2018 (CAS, Grant No. XDA20090000). Jian Ni, Xianyong Cao and Kai Li were also supported by Grant No. XDA20090000 and the China Scholarship Council (CSC). Further, fieldwork on the Tibetan Plateau was supported by the Research Program (STEP) of the Chinese Academy of Sciences (CAS; Grant No. 2019QZKK0202), the Strategic Priority Research Program of the Chinese Academy of Sciences, Pan-Third Pole Environment Study for a Green Silk Road (Pan-TPE) (XDA20040000) and the National Natural Science Foundation of China (41877459). The project was also supported by the National Natural Science Foundation of China (NSFC) and the German Research Foundation (DFG; Grant No. 41861134030). The expedition to Sakha in 2007 (07-SA) was funded by a grant from the Ministry of Nature Protection of the Republic of Sakha (Yakutia), “Bioindication assessment of the quality of drinking water of the Anabar River” (No. 8.27, 2005–2007). The Kolyma expedition in 2008 (08-KO) was partly supported by the contractual theme of the Ministry of Nature Protection of the Republic of Sakha (Yakutia). The Tiksi expedition in 2009 (09-Tik) was supported by the Russian-German Biological Monitoring (BioM) research network in the terrestrial Arctic, funded by the German Ministry of Education and Research (BMBF) and supported by the German Research Foundation (DFG). The expeditions to Chatanga in 2011 and 2013 (11-CH, 13-TY) were co-financed by a project of the Ministry of Education of the Russian Federation, “NEFU Development Program: Activity 2.8 Biomonitoring of the tundra ecosystems of the North-East of Russia under the conditions of global climate change and intensification of the anthropogenic process (monitoring, ecology, paleogeography, model and environmental management technologies)”.
The expeditions to the Omoloy region in 2014 (14-OM) and Chukotka in 2016 (16-KP) were funded by the Ministry for Education and Science of the Russian Federation (project 5.184.2014/K).
Sampling localities comprise lakes in Northern Siberia (expeditions 07-SA, 09-Tik, 11-CH, 13-TY and 14-OM), Eastern Siberia (08-KO, 16-KP) and Central Siberia (05-Yak), as well as in Northern and Central China and Tibet (Fig.
Between 2003 and 2018, surface samples from 262 localities in Siberia and China were collected during several expeditions. Samples were taken with a bottom sampler from a boat, mostly in the centre of the lakes (259 samples), or from soil on a lake’s shoreline (three samples).
The first centimetre of the surface sediment (259 samples) and shoreline soil (three samples) was carefully sampled by using gloves and single-use plastic spoons. Samples were transferred into sterile Whirl-Pak® or sterilised Nalgene tubes and were kept dark and cool (+4°C) until further treatment in the laboratories for environmental and ancient DNA at Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research.
The lake localities cover a large geographical, climatic and ecological gradient, with elevations ranging from 0 up to 5168 m a.s.l. Latitude ranges between 25°N and 73°N and longitude between 81°E and 161°E. The climatic gradient ranges in mean annual temperature between -15°C and +18°C and in mean annual precipitation between 36 and 935 mm. The localities in Siberia span a large vegetational gradient including tundra, open woodland and boreal forest. Lake localities in China include alpine meadow, shrub, forest and steppe, as well as cultivated areas.
Latitude 25°N to 73°N; longitude 81°E to 161°E.
The retrieved DNA metabarcoding data comprise 623 unique plant sequence types, mainly of terrestrial and aquatic vascular plants and a few mosses. Plant DNA sequences are identified to different taxonomic levels: about 78% of the sequence types are assigned to species level, 13% to genus level and 9% to higher taxonomic levels (subfamily, family or order).
Lake surface sediments were sampled between 2003 and 2018.
Compilation of environmental data for the 262 investigated localities, which include additional intra-lake localities taken within three large lakes, namely: 16-KP-01-L02 (nine samples), 16-KP-03-L10 (five samples), 16-KP-04-L19 (four samples) (Suppl. material
Column label | Column description |
---|---|
No. | Running number of items in the table |
Sample name | Name of the lake locality |
Latitude (N) | Latitude in decimal degrees (°N) |
Longitude (E) | Longitude in decimal degrees (°E) |
Elevation (m) | Elevation of lake locality in m a.s.l. |
Sample type | Type of material sampled: "Lake" indicates surface sediment from within the lake, "Soil" indicates soil sampled from lake's shoreline |
Geographic region | Geographic region of the lake locality |
Water depth (m) | Lake water depth at sampling site |
pH | pH values of the lake locality |
Water Conductivity (µS/cm) | Water conductivity of the lake locality |
Annual mean precipitation (mm) | Annual mean precipitation of the lake locality |
Annual mean temperature (℃) | Annual mean temperature of the lake locality |
July mean temperature (℃) | July mean temperature of the lake locality |
January mean temperature (℃) | January mean temperature of the lake locality |
Vegetation type | Vegetation type in the lake catchment |
Dominant_Taxon1 | Dominant plant taxon in the lake's catchment |
Taxon2 | Second dominant plant taxon in the lake's catchment |
Taxon3 | Third dominant plant taxon in the lake's catchment |
Taxon4 | Fourth dominant plant taxon in the lake's catchment |
Taxon5 | Fifth dominant plant taxon in the lake's catchment |
Taxon6 | Sixth dominant plant taxon in the lake's catchment |
Taxon7 | Seventh dominant plant taxon in the lake's catchment |
Taxon8 | Eighth dominant plant taxon in the lake's catchment |
Taxon9 | Ninth dominant plant taxon in the lake's catchment |
Taxon10 | Tenth dominant plant taxon in the lake's catchment |
Taxa list of identified plant sequences with either a 100% match to the embl138 or arctborbryo taxonomic database (Suppl. material
Column label | Column description |
---|---|
No. | Running number of items in the table |
ID | Unique identifier for a cluster on the sequencing flow cell |
best_identity_arctborbryo | Best identity with the arctborbryo database |
best_identity_embl138 | Best identity with the embl138 database |
scientific_name_by_arctborbryo | Best-matching taxon name in the arctborbryo database |
scientific_name_by_embl138 | Best-matching taxon name in the embl138 database |
DNA sequence | Nucleotide sequence of the DNA sequence type detected |
best_match_in_arctborbryo | Accession number of the best reference entry in the arctborbryo database |
best_match_in_embl138 | Accession number of the best reference entry in the embl138 database |
count | Total count of sequence type occurring in the filtered DNA sequencing data of all four sequencing runs |
occurrence_in_PCR | Total number of occurrences in the PCRs |
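The column layout above lends itself to simple programmatic filtering. The Python sketch below is illustrative only: it assumes the taxa list has been exported to CSV with the column labels shown above and that identities are stored as fractions (1.0 = 100%). The example rows, sequence IDs and helper names (`full_matches`, `preferred_name`) are hypothetical, and the rule of preferring the arctborbryo name over the embl138 name is an assumption, not the authors' procedure.

```python
import csv
import io

# Hypothetical excerpt of the taxa list; only the column labels follow the table
# above, all values are invented placeholders.
EXAMPLE_CSV = """ID,best_identity_arctborbryo,best_identity_embl138,scientific_name_by_arctborbryo,scientific_name_by_embl138,count
seq_1,1.0,0.98,taxon_a,taxon_a_family,1200
seq_2,0.97,1.0,Poaceae,PACMAD clade,340
seq_3,0.95,0.96,root,root,15
"""


def full_matches(rows):
    """Yield rows with a 100% best identity (1.0) in at least one reference database."""
    for row in rows:
        identities = (
            float(row["best_identity_arctborbryo"]),
            float(row["best_identity_embl138"]),
        )
        if max(identities) == 1.0:
            yield row


def preferred_name(row):
    """Assumed rule: use the arctborbryo name if that match is 100%, else the embl138 name."""
    if float(row["best_identity_arctborbryo"]) == 1.0:
        return row["scientific_name_by_arctborbryo"]
    return row["scientific_name_by_embl138"]


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
kept = list(full_matches(rows))
for row in kept:
    print(row["ID"], preferred_name(row), int(row["count"]))
```

Here the third placeholder row is dropped because neither identity reaches 1.0.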
Geographic coordinates, elevation, lake water depth, pH and electrical conductivity were measured with appropriate devices during the different field surveys: geographic coordinates and elevation with Garmin eTrex devices, lake water depth with a handheld echo sounder (Hondex PS-7), and pH and electrical conductivity with a WTW multi-340i device. Annual mean temperature, mean July and January temperatures and mean annual precipitation were downloaded from WorldClim 2 (www.worldclim.org); these are based on average climate data for the years 1970–2000 at a spatial resolution of 30 arc-seconds (ca. 1 km²). The site-specific climate data were interpolated to the sampling locations using the R packages raster (
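To make the interpolation step concrete, the Python sketch below shows bilinear interpolation of a gridded climate variable at a lake's coordinates. It illustrates the general technique and is not the authors' code: the original step was done in R with the raster package, whose `extract()` function offers a `method = "bilinear"` option, and whether bilinear rather than nearest-cell extraction was actually used is an assumption here. The function name, the 2 × 2 grid, its coordinates and its temperature values are hypothetical.

```python
import math

RES_DEG = 30 / 3600  # 30 arc-seconds expressed in decimal degrees


def bilinear(grid, lon0, lat0, res, lon, lat):
    """Bilinearly interpolate a gridded value at (lon, lat).

    grid[row][col] is the value at the cell centre (lon0 + col * res, lat0 - row * res),
    i.e. row 0 is the northernmost row. The point must lie inside the block of cell
    centres so that all four neighbours exist.
    """
    x = (lon - lon0) / res  # fractional column index
    y = (lat0 - lat) / res  # fractional row index (rows increase southwards)
    col, row = math.floor(x), math.floor(y)
    dx, dy = x - col, y - row
    v00, v01 = grid[row][col], grid[row][col + 1]
    v10, v11 = grid[row + 1][col], grid[row + 1][col + 1]
    north = v00 * (1 - dx) + v01 * dx
    south = v10 * (1 - dx) + v11 * dx
    return north * (1 - dy) + south * dy


# Hypothetical 2 x 2 block of mean annual temperature values (°C)
mat_grid = [[-12.0, -11.0],
            [-13.0, -12.5]]
lon0, lat0 = 120.0, 70.0  # centre of the north-west cell (invented)
value = bilinear(mat_grid, lon0, lat0, RES_DEG, lon0 + RES_DEG / 2, lat0 - RES_DEG / 2)
print(round(value, 3))
```

A point halfway between the four cell centres receives the average of the four values, while a point exactly on a cell centre returns that cell's value.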
DNA extraction of surface sediment samples was carried out in the molecular genetic laboratories for environmental genetics at the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research. The genetic workflow for modern surface sediment samples includes DNA extraction, amplification, purification and pooling, DNA sequencing and bioinformatic analyses. For sediment DNA extractions, about 3–5 g of wet sediment were used (about 8–10 g for the 13-TY samples). All DNA extractions were carried out with the DNeasy PowerMax Soil Kit (Qiagen, Germany) and the PowerMax Soil DNA Isolation Kit (MoBio Laboratories, Inc., USA), following the manufacturer’s instructions with the following modifications: 2 mg/ml proteinase K, 0.5 ml dithiothreitol (DTT) and 1.2 ml C1 buffer were added to each sediment sample in the PowerBead solution, which was then vortexed for 10 min and incubated overnight in a rotating shaker at 56 °C. The final elution step was carried out with 1.6 ml C6 buffer. Each DNA extraction batch contained a maximum of nine samples and one extraction blank. Precautions were taken at all experimental steps to avoid potential contamination (
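The batch rule in the protocol (at most nine samples plus one extraction blank per extraction batch) can be expressed as a small planning helper. The Python sketch below is hypothetical: the function name and the `BLANK_nn` naming scheme are invented for illustration, and it simply fills batches in input order, which may not reflect how samples were actually grouped.

```python
from typing import List

MAX_SAMPLES_PER_BATCH = 9  # protocol limit: at most nine samples per extraction batch


def make_extraction_batches(samples: List[str]) -> List[List[str]]:
    """Split samples into batches of at most nine, appending one extraction blank to each."""
    batches: List[List[str]] = []
    for start in range(0, len(samples), MAX_SAMPLES_PER_BATCH):
        batch = samples[start:start + MAX_SAMPLES_PER_BATCH]
        blank = f"BLANK_{len(batches) + 1:02d}"  # hypothetical blank naming scheme
        batches.append(batch + [blank])
    return batches


example = [f"sample_{i:03d}" for i in range(1, 21)]  # 20 invented sample names
for batch in make_extraction_batches(example):
    print(len(batch), batch[-1])
```

For 262 sample names, the sketch returns 30 batches (262 / 9, rounded up).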
Summary of NGS sequencing runs.
Name of sequencing run | Device | Output | Clusters passing filter (PF) | Sample locations | Sample number¹ |
---|---|---|---|---|---|
HQD-2 | HiSeq | 2 × 125 bp | 51,487,592 | Northern Siberia (13-TY) | 40#§ |
ALRK-3 | NextSeq | 2 × 150 bp | 26,800,628 | North China, Xinjiang, Tibet | 153# |
AGAK-5 | NextSeq | 2 × 150 bp | 87,831,867 | Eastern Siberia (16-KP) | 161 |
ALRK-7 | NextSeq | 2 × 150 bp | 10,202,054 | Northern Siberia (07-SA, 09-Tik, 11-CH, 14-OM), Eastern Siberia (08-KO), Central Siberia (05-Yak), North China, Xinjiang, Tibet | 334# |
¹ Sample number is equivalent to the number of pooled PCR products, which also include PCR replicates of samples and the corresponding BLANKs and NTCs.
# Additional samples were pooled into the run but do not belong to this project.
§ Sample replicates were pooled prior to the final pooling of all PCR products.
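As a consistency check on the run summary, the per-run figures can be tallied. The short Python sketch below only transcribes the table values; the dictionary layout is an illustrative choice. The sample numbers sum to 688, the same total as the number of PCRs reported in the analysis, whereas the summed cluster counts also include clusters from other projects' samples in the runs marked with #.

```python
# Values transcribed from the sequencing-run summary table.
RUNS = {
    "HQD-2": {"clusters_pf": 51_487_592, "pooled_products": 40},
    "ALRK-3": {"clusters_pf": 26_800_628, "pooled_products": 153},
    "AGAK-5": {"clusters_pf": 87_831_867, "pooled_products": 161},
    "ALRK-7": {"clusters_pf": 10_202_054, "pooled_products": 334},
}

total_products = sum(run["pooled_products"] for run in RUNS.values())
total_clusters = sum(run["clusters_pf"] for run in RUNS.values())
print(total_products, total_clusters)
```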
We analysed 262 samples in total (259 lake sediments and three shoreline soils), resulting in 688 PCRs: 553 sample PCRs and 135 PCRs of extraction blanks and NTCs. The resulting sequence data were analysed and taxonomically assigned using the OBITools package (
After merging raw paired-end reads and demultiplexing according to the internal barcode, two sample PCRs (from lakes 16-KP-02-L08 and 16-KP-04-L22), seven BLANKs and five NTCs yielded no reads and were discarded from the dataset. After taxonomic assignment of the remaining 674 PCRs, we identified a total of 15,754,779 reads with a 100% match to either the EMBL or the arctborbryo database. Of these, 99.6% (15,692,944 reads) were assigned to 621 unique plant sequence types, while 0.39% (61,835 reads) were assigned to non-plant taxa, including bacteria, algae and higher eukaryotic taxa (72 unique sequence types in total). Further, we identified 340,346 reads (2.1% of the total dataset) in extraction blanks and NTCs, of which 23,975 reads were of non-plant origin, corresponding to 38.7% of all non-plant reads. Read counts per sample, excluding BLANKs and NTCs, varied widely, from 1 to 718,279.
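The PCR and read bookkeeping in this paragraph involves several subtractions and percentages that can be re-derived from the reported counts. The Python sketch below does this; all inputs are counts stated in the text, and using the 15,754,779 assigned reads as the denominator for the dataset-level percentages is an assumption. Under that assumption, the reported values appear to be truncated rather than rounded to one decimal place (for example 2.16% is reported as 2.1%), and the 23,975 non-plant reads in the controls correspond to about 38.7% of all 61,835 non-plant reads.

```python
# Counts as reported in the text
sample_pcrs, control_pcrs = 553, 135  # sample PCRs; BLANK and NTC PCRs
empty_pcrs = 2 + 7 + 5                # sample PCRs, BLANKs and NTCs without reads

assigned_reads = 15_754_779           # reads with a 100% match to EMBL or arctborbryo
non_plant_reads = 61_835
control_reads = 340_346               # reads found in BLANKs and NTCs
control_non_plant_reads = 23_975


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


remaining_pcrs = sample_pcrs + control_pcrs - empty_pcrs
plant_reads = assigned_reads - non_plant_reads

print("PCRs kept:", remaining_pcrs)
print("plant reads:", plant_reads)
print(f"plant share: {pct(plant_reads, assigned_reads):.2f}%")
print(f"non-plant share: {pct(non_plant_reads, assigned_reads):.2f}%")
print(f"control share: {pct(control_reads, assigned_reads):.2f}%")
print(f"non-plant reads found in controls: {pct(control_non_plant_reads, non_plant_reads):.2f}%")
```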
Compilation of environmental data for the 262 investigated localities, which include additional intra-lake localities within three large lakes, namely: 16-KP-01-L02 (nine samples), 16-KP-03-L10 (five samples), 16-KP-04-L19 (four samples). The table includes the geographic coordinates, elevation, type of sample material, geographic region, water depth (at which samples were taken), pH, water conductivity, mean annual precipitation (MAP), mean annual temperature (MAT), July and January mean temperature, vegetation type and dominant plant taxa (‘dominant_Taxon1’: the most frequent taxon; ‘Taxon2-10’: further taxa listed in descending order of their distribution area in the modern vegetation). If no dominant taxon is listed, the surrounding vegetation is too diverse to determine a dominant taxon. ‘NA’ – data not available.
A taxa list of identified plant sequences with a 100% match to either the embl138 or the arctborbryo taxonomic database. The table contains ‘ID’ – unique identifier for a cluster on the sequencing flow cell, ‘best_identity_arctborbryo’ – best identity with the arctborbryo database, ‘best_identity_embl138’ – best identity with the embl138 database, ‘scientific_name_by_arctborbryo’ – best-matching taxon name in the arctborbryo database, ‘scientific_name_by_embl138’ – best-matching taxon name in the embl138 database, ‘DNA sequence’ – nucleotide sequence of the detected DNA sequence type, ‘best_match_in_arctborbryo’ – accession number of the best reference entry in the arctborbryo database, ‘best_match_in_embl138’ – accession number of the best reference entry in the embl138 database, ‘count’ – total count of the sequence type in the whole sequencing project, and ‘occurrence_in_PCR’ – total number of occurrences in the PCR samples. The scientific name 'PACMAD clade' indicates assignment to the PACMAD clade, a lineage of true grasses (Poaceae). 'root' indicates that the DNA sequence could not be assigned to a reference in the respective database.
A taxa list of identified plant sequences with a 100% match to either the embl138 or the arctborbryo taxonomic database. The table contains ‘ID’ – unique identifier for a cluster on the sequencing flow cell, ‘best_identity_arctborbryo’ – best identity with the arctborbryo database, ‘best_identity_embl138’ – best identity with the embl138 database, ‘scientific_name_by_arctborbryo’ – best-matching taxon name in the arctborbryo database, ‘scientific_name_by_embl138’ – best-matching taxon name in the embl138 database, ‘DNA sequence’ – nucleotide sequence of the detected DNA sequence type, ‘best_match_in_arctborbryo’ – accession number of the best reference entry in the arctborbryo database, ‘best_match_in_embl138’ – accession number of the best reference entry in the embl138 database, and ‘count’ – total count of the sequence type in the whole sequencing project. The scientific name 'PACMAD clade' indicates assignment to the PACMAD clade, a lineage of true grasses (Poaceae). 'root' indicates that the DNA sequence could not be assigned to a reference in the respective database. Samples are sorted by the four sequencing runs, as indicated by the short name of the sequencing run at the beginning of each sample name. Sample batches (samples, BLANKs and NTCs) that belong together are numbered. Sample names that include A or B refer to sediment samples taken from the same bulk surface sample; they share the same environmental data given in Suppl. material 1.
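Because each sample name begins with the short name of its sequencing run, the sample columns of this table can be grouped by run. The Python sketch below assumes that the prefix is spelled exactly like the run names in the sequencing-run summary (HQD-2, ALRK-3, AGAK-5, ALRK-7); the example names are invented and the helper names (`run_of`, `group_by_run`) are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

RUN_NAMES = ("HQD-2", "ALRK-3", "AGAK-5", "ALRK-7")  # run names from the run summary table


def run_of(sample_name: str) -> Optional[str]:
    """Return the sequencing run whose name starts the sample name, or None."""
    for run in RUN_NAMES:
        if sample_name.startswith(run):
            return run
    return None


def group_by_run(names: Iterable[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group sample names by run; names without a known run prefix are returned separately."""
    groups: Dict[str, List[str]] = {run: [] for run in RUN_NAMES}
    unmatched: List[str] = []
    for name in names:
        run = run_of(name)
        if run is None:
            unmatched.append(name)
        else:
            groups[run].append(name)
    return groups, unmatched


# Invented example column names
groups, unmatched = group_by_run(
    ["ALRK-7_example_A", "ALRK-7_example_B", "HQD-2_BLANK_1", "other"]
)
print({run: len(members) for run, members in groups.items()}, unmatched)
```

Matching against the explicit list of run names, rather than splitting on a separator, avoids assuming how the rest of each sample name is delimited.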