Biodiversity Data Journal : General research article
Corresponding author: P. Roxanne Kellar (rkellar@unomaha.edu)
Academic editor: Quentin Groom
Received: 02 Jun 2015 | Accepted: 10 Jul 2015 | Published: 17 Jul 2015
© 2016 Shelly K. Aust, Dakota L. Ahrendsen, P. Roxanne Kellar.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Aust S, Ahrendsen D, Kellar P (2015) Biodiversity assessment among two Nebraska prairies: a comparison between traditional and phylogenetic diversity indices. Biodiversity Data Journal 3: e5403. doi: 10.3897/BDJ.3.e5403
Conservation of the evolutionary diversity among organisms should be included in the selection of priority regions for preservation of Earth’s biodiversity. Traditionally, biodiversity has been determined from an assessment of species richness (S), abundance, evenness, rarity, etc. of organisms but not from variation in species’ evolutionary histories. Phylogenetic diversity (PD) measures evolutionary differences between taxa in a community and is gaining acceptance as a biodiversity assessment tool. However, with the increase in the number of ways to calculate PD, end-users and decision-makers are left wondering how metrics compare and what data are needed to calculate various metrics.
In this study, we used massively parallel sequencing to generate over 65,000 DNA characters from three cellular compartments for over 60 species in the asterid clade of flowering plants. We estimated asterid phylogenies from character datasets of varying nucleotide quantities, and then assessed the effect of varying character datasets on resulting PD metric values. We also compared multiple PD metrics with traditional diversity indices (including S) between two endangered grassland prairies in Nebraska (U.S.A.). Our results revealed that PD metrics varied based on the quantity of genes used to infer the phylogenies; therefore, when comparing PD metrics between sites, it is vital to use comparable datasets. Additionally, various PD metrics and traditional diversity indices characterize biodiversity differently and should be chosen depending on the research question. Our study provides empirical results that reveal the value of measuring PD when considering sites for conservation, and it highlights the usefulness of using PD metrics in combination with other diversity indices when studying community assembly and ecosystem functioning. Ours is just one example of the investigations needed across the tree of life and across varying ecosystems to build a database of phylogenetic diversity assessments, from which a guide through the plethora of PD metrics may be prepared for ecologists and conservation planners.
asterids, community ecology, conservation, grasslands, next-generation sequencing, phylogenetic diversity
Preservation of Earth’s biodiversity is a priority as ecosystems face changes due to anthropogenic actions, which initiate rapid adaptive responses from organisms, affect genetic variation (often depleting it) in extant species, and result in the establishment of new communities (
Since
Biodiversity assessment should start with both knowledge of the species present and their evolutionary histories (
Some of the most common PD metrics are shown in
Summary of definitions, descriptions, software, and functions to calculate 17 phylogenetic diversity metrics, four traditional diversity indices, and the K statistic for the functional trait: specific leaf area.
Metric | Definition | Description | Softwareᵃ | Citation |
PDFaith | original PD metric | the sum of branch lengths between species in a tree | pd | |
PDSES | standardized effect size of PDFaith | standardized effect size of PD vs. a null community | ses.pd | |
MPDᵇ | mean pairwise distance | mean phylogenetic distance connecting species | mpd | |
MNTDᵇ | mean nearest taxon distance | mean phylogenetic distance for each species to its closest relative | mntd | |
NRIᵇ | net relatedness index | MPD vs. a null community | ses.mpd | |
NTIᵇ | nearest taxon index | MNTD vs. a null community | ses.mntd | |
SPDᵇ | sum of phylogenetic distances | sum of phylogenetic distances between pairs of species in a community | mpd × number of species pairs | |
PSV | phylogenetic species variability | related to NRI, but is independent of S | psv | |
PSE | phylogenetic species evenness | variation of PSV but incorporates species abundance | pse | |
PSC | phylogenetic species clustering | related to NTI, quantifies branch tip clustering of species in a tree | psc | |
PSR | phylogenetic species richness | related to S and incorporates phylogenetic relatedness | psr | |
IST | local phylogenetic similarity excess | average among-community diversity/total diversity across all samples | raoD | |
K | measure of phylogenetic signal | a measure of the likeliness of phylogenetically related species to resemble each other | Kcalc | |
S | species richness | total number of species in a sampled site | - | |
ENS | effective number of species | exponential of the Shannon-Wiener index; the number of species randomly generated for each community in order to equal the entropy for that community | EstimateS | |
SJ | Jaccard index; measure of similarity between sites | compares the number of shared species to the total number of species in the combined sites | EstimateS | |
SS | Sørensen index; measure of similarity between sites | applies weight to species common to each site over those found at only one site, and compares the number of shared species to the total number of species in the combined sites | EstimateS | |
a - Metrics were calculated either in R (Version 3.0.1;
b - Metrics with incidence and abundance-weighted versions
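To make the definitions of PDFaith, MPD, MNTD, and SPD in the table concrete, the following is a minimal sketch on a hypothetical four-species tree. The tree, its branch lengths, and all function names are illustrative assumptions, not the study's data or the R functions listed above. PDFaith is computed here over the minimal subtree connecting the community, with an optional flag to include the path to the root; which convention the study used is not stated in this section.

```python
from itertools import combinations

# Hypothetical toy tree (an assumption for illustration only).
# Each entry maps a node to (parent, length of the branch leading to the node).
TREE = {
    "A": ("X", 1.0), "B": ("X", 1.0),
    "C": ("Y", 1.5), "D": ("Y", 1.5),
    "X": ("root", 1.0), "Y": ("root", 0.5),
}

def path_to_root(tip):
    """Edges (named by their child node) from a tip up to the root."""
    edges, node = [], tip
    while node in TREE:
        edges.append(node)
        node = TREE[node][0]
    return edges

def faith_pd(community, include_root=False):
    """Sum of branch lengths on the subtree connecting the community."""
    counts = {}
    for tip in community:
        for edge in path_to_root(tip):
            counts[edge] = counts.get(edge, 0) + 1
    n = len(community)
    # An edge shared by all n species lies above their common ancestor,
    # so it is only counted when the root path is included.
    return sum(TREE[e][1] for e, c in counts.items() if include_root or c < n)

def distance(a, b):
    """Phylogenetic distance: branch lengths on the path joining two tips."""
    pa, pb = path_to_root(a), path_to_root(b)
    shared = set(pa) & set(pb)
    return sum(TREE[e][1] for e in pa + pb if e not in shared)

def mpd(community):
    """Mean pairwise distance over all species pairs (incidence version)."""
    pairs = list(combinations(sorted(community), 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def mntd(community):
    """Mean distance from each species to its nearest relative in the community."""
    nearest = [min(distance(a, b) for b in community if b != a) for a in community]
    return sum(nearest) / len(nearest)

def spd(community):
    """Sum of pairwise distances: MPD times the number of species pairs."""
    n = len(community)
    return mpd(community) * n * (n - 1) / 2

community = {"A", "B", "C"}
print(faith_pd(community), mpd(community), mntd(community), spd(community))
```

Because every quantity is a function of branch lengths, changing the character matrix used to infer the tree changes the values, even when the species list stays the same.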
Until recently, most studies in which PD was examined used simulated data or only one to a few gene sequences downloaded from GenBank (e.g.
Individuals in a community interact based on the traits they possess. Traits can be traced through evolutionary history; therefore, phylogenies can give an indication of how members of a community assemble (
Functional diversity (FD) evaluations highlight complementary or differing patterns of community assembly that influence biodiversity and community function. Phylogenetic diversity and FD assessments are good indicators of the effects of biodiversity on ecosystem function (
In this study, we utilized massively parallel (also known as next-generation) sequencing to generate DNA character data from three cellular compartments (plastids, mitochondria, and nuclei) in plants. These data were used to estimate both robust, total evidence phylogenies with high bootstrap support and single- and dual-gene phylogenies in order to test the effect of data quantity on PD metrics. With these phylogenies, we calculated and compared 17 PD metrics, four traditional diversity indices, and the phylogenetic signal of one plant functional trait among plants in two Nebraska prairies. Our study aimed to answer the following questions: 1) How do datasets of varying character quantities affect PD metrics? and 2) What do the various metrics indicate about biodiversity at these sites?
Study sites – Our research focused on two endangered prairies in Nebraska, U.S.A.: 1) The Nature Conservancy’s Niobrara Valley Preserve (NVP; 23,000 hectares) located in north-central Nebraska (42°47' N, 100°02' W) and 2) Nine-Mile Prairie (NMP; 93 hectares) located northwest of Lincoln, Nebraska (40°52' N, 96°48' W). These sites were selected because remnant prairies have decreased in total geographic area more than any other ecosystem since the early 1800s (
Taxon sampling – Ideally, a biodiversity study should assess all organisms in a community; however, this is not practical due to time and financial limitations. Grasses make up most of the biomass in prairies, but flowering forbs (i.e. herbaceous non-grasses) make up the greatest diversity (
Field work was conducted in 2012 and 2013. Three samples of each asterid or outgroup species found at the sites were collected for herbarium vouchers, and fresh leaf material was collected and dried over silica for DNA extractions. Rare species and small populations (i.e. fewer than 20 individuals) were not collected in order to protect the species’ populations. Using a field sub-sampling of random 1 m × 2 m plots, we estimated the total S at each site with a species accumulation curve. We located plots at all points at which a ‘new’ species occurred plus multiple plots selected at random to ensure full coverage of the sites. We recorded plot locations on a Trimble GPS and mapped them in Arc/GIS. Maximum S was identified when the number of asterid and outgroup species ceased to increase regardless of the number of additional plots examined. For each plot, we recorded the percent cover (abundance) of each species. All species were identified by morphological characters using The Flora of Nebraska (
DNA extraction and sequencing – Total genomic DNA including plastid (cp), mitochondrial (mt), and nuclear (nr) DNA was extracted using the IBI Genomic DNA Mini Kit (IBI Scientific, Peosta, IA, USA) until 12 μg of DNA, measured with a NanoDrop (Thermo Scientific), was obtained. Samples were sent to the University of Nebraska Medical Center or University of Missouri DNA Core for library preparation and Illumina sequencing. Samples were run on Illumina Hi-Seq at 14 samples per lane, paired-end, or 12 samples per lane, single-pass runs. In addition to several new species collected and sequenced for this study, we included 76 cp genes from 23 Asteraceae species published in
Illumina sequence reads were mapped to a reference genome (from the same family or a close relative) downloaded from GenBank (
Phylogenetic analyses – Phylogenetic analyses were conducted with both maximum parsimony (using PAUP* 4.0b10;
Metric calculations – To compare S between sites, we calculated the effective number of species (ENS) by taking the exponential of the Shannon-Wiener index (a non-linear index), which accounts for the entropy in a set of samples (
All PD metrics were calculated in R (Version 3.0.1;
To provide one example of how assessment of functional diversity may be incorporated into this type of study, we measured the phylogenetic signal of specific leaf area (SLA; leaf area:dry mass). SLA indicates the amount of matter a leaf invests in order to produce energy via photosynthesis (
To quantify the phylogenetic signal of this functional trait, SLA was mapped on the phylogeny by assigning the SLA value to the corresponding tree tip (the corresponding extant species). The K statistic (
The data underpinning the analysis reported in this paper are deposited in the Dryad Data Repository at https://doi.org/10.5061/dryad.qj177.
DNA extractions for 40 collections (see
Phylogenetic trees were estimated: 1) rbcL only (
Alignment lengths and tree statistics for all datasets.
Tree/dataset | alignment length (bp) | Pairwise % identity | Tree/dataset length | # Parsimony informative characters | CI | RI |
matK | 1737 | 83.9% | 3605 | 861 | 0.4638 | 0.7697 |
rbcL | 1464 | 93.2% | 1657 | 379 | 0.3744 | 0.7323 |
rbcL + matK | 3192 | 87.9% | 5265 | 1234 | 0.4325 | 0.7546 |
cpmtnucᵃ | 65480 | 92.1% | 70517 | 17823 | 0.4539 | 0.7718 |
a - cpmtnuc: tree inferred from concatenation of 76 plastid genes, six mitochondrial genes, and three nuclear repeat regions
Notes: Consistency index (CI) and retention index (RI) exclude uninformative characters. bp = nucleotide base-pairs; alignments were uploaded to the Dryad Digital Repository
Maximum likelihood (ML) tree (-ln L=46268.63) inferred from the concatenation of 76 plastid, six mitochondrial, and three nuclear ribosomal repeat regions (cpmtnuc;
Four traditional diversity indices and 17 PD metrics were calculated using the cpmtnuc tree (
Seventeen PD metrics calculated from the phylogeny inferred from 76 plastid genes, six mitochondrial genes, and three nuclear repeat regions (cpmtnuc), four traditional diversity indices, and the K statistic for one functional trait. Metrics were calculated for Nine-Mile Prairie (NMP), Niobrara Valley Preserve (NVP) and the three sites within NVP: North (N), South (So), and West (W).
Metric | NMP | South | West | North | NVP |
PDFaith | 0.535 | 0.625 | 0.914 | 0.964 | 1.280 |
PDSES | -1.317 | -0.515 | 0.053 | -0.554 | 0.621 |
MPD | 0.097 | 0.089 | 0.102 | 0.097 | 0.104 |
MPDaw | 0.077* | 0.101 | 0.083 | 0.094 | 0.097* |
MNTD | 0.022* | 0.035 | 0.029 | 0.025 | 0.025 |
MNTDaw | 0.017* | 0.055 | 0.021 | 0.030 | 0.030 |
NRI | 0.592 | 1.264 | -0.036 | 0.876 | -0.610 |
NRIaw | 0.863 | -1.357 | -0.285 | -0.205 | -0.534 |
NTI | 2.039* | 0.401 | 0.596 | 1.091 | 0.295 |
NTIaw | 1.559 | -0.799 | 0.565 | -0.117 | -0.382 |
SPD | 22.322 | 20.468 | 57.376 | 75.523 | 154.517 |
SPDaw | 17.776 | 23.267 | 46.443 | 73.007 | 143.874 |
PSV | 0.441 | 0.358 | 0.416 | 0.396 | 0.422 |
PSE | 0.356 | 0.383 | 0.329 | 0.372 | 0.375 |
PSC | 0.888 | 0.858* | 0.879 | 0.893 | 0.897 |
PSR | 9.706 | 7.868 | 14.154 | 15.829 | 23.195 |
IST | 9M:NVP=0.009 | N:So=0.008 | N:W=0.005 | So:W=0.007 |
IST | 9M:N=0.011 | 9M:W=0.013 | 9M:So=0.020 |
K | 0.154 | 1.171* | 0.058 | 0.028 | 0.041 |
S | 22 | 22 | 34 | 40 | 55 |
ENS | 31.6 | 56.9 | 58.4 | 47.3 | 53.3 |
SJ | 9M:NVP=0.172 | N:So=0.326 | N:W=0.431 | So:W=0.436 |
SJ | 9M:N=0.200 | 9M:W=0.170 | 9M:So=0.075 |
SS | 9M:NVP=0.293 | N:So=0.492 | N:W=0.603 | So:W=0.607 |
SS | 9M:N=0.333 | 9M:W=0.291 | 9M:So=0.140 |
Notes: “*” indicates statistical significance (p < 0.05)
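The ENS row is described in the methods as the exponential of the Shannon-Wiener index. The sketch below shows that calculation for hypothetical abundance vectors. The function names and example numbers are assumptions for illustration, and the code is not a reimplementation of EstimateS.

```python
import math

def shannon_entropy(abundances):
    """Shannon-Wiener index H = -sum(p * ln p) over species with nonzero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

def effective_number_of_species(abundances):
    """ENS = exp(H), the exponential of the Shannon-Wiener index."""
    return math.exp(shannon_entropy(abundances))

# Hypothetical percent-cover values for two four-species communities.
even_cover = [10, 10, 10, 10]
skewed_cover = [37, 1, 1, 1]

print(effective_number_of_species(even_cover))
print(effective_number_of_species(skewed_cover))
```

The two example communities have the same richness, but the one dominated by a single species returns a lower ENS, which is the sense in which the index accounts for entropy rather than only counting species.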
We conducted regression analyses (not shown) to estimate the relationships between S and several PD metrics. A strong positive correlation was seen between S and PDFaith (r² = 0.974), a moderate positive correlation between S and MPD (r² = 0.562), a weak negative correlation between S and MNTD (r² = -0.110), and a strong positive correlation between S and SPD (r² = 0.975). In addition, comparisons between S and PSV (r² = 0.058) and between S and PSE (r² = 0.016) revealed no correlation, S and PSC (r² = 0.4885) showed a weak correlation, and S and PSR (r² = 0.984) showed a strong positive correlation.
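As a rough check on how such values can be obtained, the sketch below computes the coefficient of determination for a simple linear regression of a PD metric on S, using the S, PDFaith, and PSR values tabulated above. It assumes the regressions were run across the five communities in that table (NMP, South, West, North, NVP); the section does not confirm this, and the function name is illustrative.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Values copied from the metrics table, in the order NMP, South, West, North, NVP.
species_richness = [22, 22, 34, 40, 55]
pd_faith = [0.535, 0.625, 0.914, 0.964, 1.280]
psr = [9.706, 7.868, 14.154, 15.829, 23.195]

print(round(r_squared(species_richness, pd_faith), 3))
print(round(r_squared(species_richness, psr), 3))
```

Under that assumption, both results land close to the values reported for S vs. PDFaith and S vs. PSR. With only five communities, such coefficients should be read as descriptive rather than as strong statistical evidence.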
To address the question of how datasets containing different amounts of data affect PD metrics, the three most common metrics (PDFaith, MPD, and MNTD) were compared (
Comparison of three PD metrics (PDFaith, MPD, and MNTD) calculated from varying datasets: rbcL, matK, rbcL + matK, and cpmtnuc for five prairie communities.
Notes: cpmtnuc = concatenation of 76 plastid genes, six mitochondrial genes, and three nuclear repeat regions;
NMP = Nine-Mile Prairie, NVP = Niobrara Valley Preserve, and North, South, and West represent the three sites within NVP
The phylogenetic structure of each community can be revealed by several of the PD metrics (PDSES, NRI, NRIaw, NTI, NTIaw). However, most of the metric values in this study were not statistically significant, and in these cases, the results suggest random assembly. Only one value was statistically significant (NTI for NMP) indicating the species were phylogenetically clustered at this site.
Results of the non-parametric rank-based comparison (ranks not shown) revealed that NMP tended to rank lower in diversity than NVP across the metrics (U1 = 6.84; P = 0.009). In addition, the South community tended to rank lower in diversity than the North or West communities (F2 = 2.03; P = 0.362), although this result was not statistically significant.
SLA was calculated for each species, and average values ranged from 17.5 to 773.9 cm²/g (
Conservation biologists, community ecologists, and other researchers are currently exploring new ways to compare and contrast biodiversity between communities and ecosystems. With the growing popularity of massively parallel DNA sequencing and the ease of estimating or availability of existing phylogenies, these researchers are exploring phylogenetic diversity metrics. However, with the plethora of PD metrics now available, researchers are seeking advice as to which PD metrics should or may be used in various situations (
How do datasets of varying character quantities affect PD metrics? – The three most common PD metrics (PDFaith, MPD, and MNTD) were calculated based on four datasets varying in DNA character (nucleotide) quantity (
We cannot compare the absolute values of these PD metrics from varying datasets because of the differences in how the branch lengths are measured; therefore, to determine if they are characterizing biodiversity differently, we analyzed the change in each metric across the species gradient at the different sites (see regression values in “Results”). The correlations were the same despite the difference in character data used to calculate the PD metrics; however, some correlations were as expected from simulations (
These results suggest that a multi-gene phylogeny may not be necessary to obtain relevant PD metric results; however, one must proceed with caution. First, our results highlight the importance of using comparable datasets (i.e. the same character matrix) when inferring phylogenies to calculate and compare PD metrics between sites because of the incorporation of branch lengths. Supertrees constructed from smaller phylogenies that were likely estimated from different datasets cannot be used to calculate PD metrics. Second, this is the first study to address this question with a large clade of flowering plants, but the sample size is relatively small. Additional studies are needed that make these same calculations with larger datasets across varying communities/ecosystems.
What do the various metrics indicate about biodiversity at these sites? – Scientists from multiple fields of study seek comprehensive biodiversity assessment tools and empirical studies that reveal proper application of the multitude of metrics. Phylogenetic, functional, and species diversity are the main components contributing to biodiversity (
Global conservation organizations select priority regions for preservation based on several factors, but they have all considered S as a basic index for characterizing biodiversity (e.g.
SJ and SS measure site similarities and do not include phylogeny, whereas IST measures site differences and incorporates phylogenetic information; therefore, SJ and SS are expected to be positively correlated, and SJ and IST and SS and IST are expected to be negatively correlated. Our data matched these expectations, providing multiple lines of support for the site comparison metrics. Beyond the traditional diversity measures, conservation organizations may want to select priority regions based on evolutionary history of species but may not have the resources to assemble phylogenetic information. Therefore, it is important to know if and when S can be used as a predictor of phylogenetic diversity.
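For readers less familiar with the two similarity indices, the following sketch computes incidence-based Jaccard and Sørensen values from species lists. The species names and function names are hypothetical, and the study itself obtained these indices from EstimateS rather than from code like this.

```python
def jaccard(site_a, site_b):
    """Shared species divided by all species in the combined sites."""
    a, b = set(site_a), set(site_b)
    return len(a & b) / len(a | b)

def sorensen(site_a, site_b):
    """Twice the shared species divided by the summed richness of both sites."""
    a, b = set(site_a), set(site_b)
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical species lists for two sites.
site_1 = {"sp1", "sp2", "sp3", "sp4"}
site_2 = {"sp3", "sp4", "sp5"}

sj = jaccard(site_1, site_2)
ss = sorensen(site_1, site_2)
print(sj, ss)
```

For incidence data the two indices are linked by SS = 2·SJ / (1 + SJ), so they always rank site pairs in the same order. This is why a positive correlation between them is expected.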
It may seem obvious that a tree with more species will have more branches and a high probability of having greater PDFaith (
Our empirical data resulted in mixed correlations between S, SPD, and the
Mean pairwise distance (MPD) averages the evolutionary differences between all pairwise species in the tree and reveals deep species relatedness. Higher values indicate more species with above-average branch lengths. Mean nearest taxon distance (MNTD) averages the evolutionary distance between each species and its nearest neighbor. Higher values indicate that some taxa have branches that are much longer than average. Net relatedness index (NRI) and nearest taxon index (NTI) are equivalent to MPD and MNTD, respectively, but they compare MPD and MNTD values to null communities, allowing for assessment of statistical significance. As mentioned earlier, in computer simulations, MPD showed no correlation with S and MNTD showed a negative correlation with S. In our data, the relationship between S and MPD was moderately positive, but there was only a weak negative correlation between S and MNTD. Again, this discrepancy may indicate a non-random change in phylogenetic diversity over the S gradient. Communities with high MPD and NRI values indicate species assemblages with ancient speciation events and possibly greater potential for evolutionary change that will allow populations to persist in changing environments. Communities with high MNTD and NTI values indicate species assemblages with more recent speciation events, which may indicate adaptive radiations that have resulted in endemic species, a site characteristic valued by conservation planners.
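The comparison of MPD to null communities can be sketched as a standardized effect size, with NRI taken as its negative so that positive values indicate clustering. The example below assumes a hypothetical six-species pool in two clades and a null model that draws the same number of species at random from the pool. The null model and number of randomizations used in the study are not given in this section, so both are illustrative choices.

```python
import random
from itertools import combinations
from statistics import mean, stdev

# Hypothetical species pool: two clades of three species each.
CLADE = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

def distance(x, y):
    """Assumed distances: 2.0 within a clade, 10.0 between clades."""
    return 2.0 if CLADE[x] == CLADE[y] else 10.0

def mpd(community):
    """Mean pairwise phylogenetic distance among community members."""
    return mean(distance(x, y) for x, y in combinations(community, 2))

def nri(community, pool, n_null=999, seed=1):
    """Net relatedness index: -(observed MPD - null mean) / null SD."""
    rng = random.Random(seed)
    observed = mpd(community)
    # Null communities: same richness, species drawn at random from the pool.
    null = [mpd(rng.sample(pool, len(community))) for _ in range(n_null)]
    return -(observed - mean(null)) / stdev(null)

pool = sorted(CLADE)
print(nri(["a", "b", "c"], pool))  # all species from one clade
print(nri(["a", "b", "d"], pool))  # species drawn from both clades
```

The same pattern applies to NTI by replacing MPD with MNTD. Statistical significance would be judged from where the observed value falls in the null distribution, which this sketch does not report.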
Abundance-weighted (AW) metrics can add value to biodiversity comparisons because they give an indication of the impact of evolutionary history on community assembly. When AW metric values are greater than the incidence metric values relative to a comparable community, this is an indication there are some species that may be dominant at a site. From the correlations reported in our results, the relationships between the species incidence metrics and the AW metrics confound diversity comparisons because the relative values at each site are not consistent such that sites with high abundance of some species may be identified. Our results may not lead to strong conclusions because most of the values are not statistically significant; however, this project represents the possibilities for calculating multiple PD metrics once a phylogeny is estimated. The value in calculating and comparing all of these metrics is to identify when empirical results do not match predictions. These situations will draw attention to notable discrepancies such as the PD metric variations between South and NMP (above), which have equivalent S values in our study or the correlations that do not match computer simulations. Additionally, comparing multiple metrics can provide supporting evidence about community assembly.
PDSES, NRI, and NTI (and their AW counterparts) reveal patterns of phylogenetic structure or community assembly (i.e. phylogenetic clustering or phylogenetic overdispersion/evenness) when values are statistically significant. Otherwise they indicate random assembly. All three metrics should result in the same characterization about species relatedness (
Calculating the phylogenetic signal of functional plant traits can also give an indication about a community through assembly of the traits in question. To test this component of biodiversity at our sites, we mapped specific leaf area (SLA) onto the phylogenetic tree and calculated the K statistic. Only one value was statistically different from Brownian motion – the K statistic for the South community was greater than one, indicating this trait is conserved across the tree and the species resemble each other more than expected by chance (low diversity). In the other communities, the values were not statistically significant and, therefore, indicate random trait assembly. Ideally for a study of trait evolution and indication of functional diversity at a site, more than one functional trait should be included and the relationship between the K statistic, S, and PD should be analyzed.
Since each metric characterizes biodiversity differently, it is important to choose the correct metric for the application as described above. No single metric considers all aspects of diversity, so metrics should be chosen based on the question of interest (
As one of the few empirical studies to calculate the 17 most common PD metrics from massively parallel sequencing data, this study provides baseline data for future comparisons of biodiversity metrics. From this study, we drew five primary conclusions: 1) traditional indices do a fairly good job of quantifying overall diversity at a site, but to characterize the source of biodiversity such as ancient vs. recent speciation events, phylogenetic relationships must be incorporated; 2) S may be a good indicator for some PD metrics but not for others; 3) multiple diversity indices (both traditional and phylogenetic) should be calculated for a comprehensive biodiversity analysis; 4) inclusion of large species numbers (i.e. > 80 species) may be needed to obtain statistically significant results and to detect phylogenetic diversity beyond S; and 5) comparisons of PD metrics must be based on phylogenies estimated from equivalent character datasets. Future investigations are needed that 1) include larger numbers of taxa; 2) compare metrics between differing geographical sites; 3) include multiple traits for a comprehensive analysis of FD; and 4) compare PD metrics calculated from phylogenies estimated from various gene datasets (from three to many genes) to determine the effective number of genes necessary to calculate informative PD metrics. Our results, as well as future results, will contribute to the growing database of empirical PD metric data that will aid community ecologists and conservation biologists in future investigations of biodiversity and selection of priority regions for preservation.
The authors wish to thank a very thorough BDJ peer-reviewer for many valuable recommendations, M.W. Cadotte (UT-Scarborough) for informative conversations, D. Sutherland (UNO) for plant identification assistance, C. Kellar and A. Jones for field assistance, and A. Swift (UNO) for guidance with statistical analyses. We also thank the MU Core Sequencing facility, the UNMC Core Sequencing facility, the managing institutions (The Nature Conservancy and University of Nebraska Foundation) for access to the two grassland sites, and the following granting institutions: NSF Nebraska EPSCoR First Award (Prime Award: EPS1004094; Subaward: 95-3101-0040-217) and the NASA Nebraska Space Grant. SKA also thanks the following for conference travel and research grants: UNO-GRACA, ASPT, MOBOT Delzie Demaree Travel Award, the UNO Graduate Department, and the UNO Biology Department.
All species included in the study, herbarium voucher numbers, and average specific leaf area (SLA) calculated for each species
GenBank accession numbers for each gene/region by organelle.
Note: "-" indicates a missing gene
Maximum likelihood (ML) tree (-ln L=10645.92) inferred from rbcL only (Suppl. material 7); matching 1 of 68 most parsimonious (MP) trees except where a dagger (†) is shown. Tree includes 62 asterid species and 3 outgroups (Comandra umbellata, Silene vulgaris, and Silene antirrhina). Numbers above branches indicate branch lengths used to calculate various Phylogenetic Diversity (PD) metrics. Numbers below the branches indicate MP/ML bootstrap support values resulting from 1000 replicates each. Low branch support (<50) is indicated by an asterisk (*). Missing bootstrap values are denoted by a dash (-).
Maximum likelihood (ML) tree (-ln L=19796.78) inferred from matK only (Suppl. material 8); matching 1 of 52 most parsimonious (MP) trees except where a dagger (†) is shown. Tree includes 62 asterid species and 3 outgroups (Comandra umbellata, Silene vulgaris, and Silene antirrhina). Numbers above branches indicate branch lengths used to calculate various Phylogenetic Diversity (PD) metrics. Numbers below the branches indicate MP/ML bootstrap support values resulting from 1000 replicates each. Low branch support (<50) is indicated by an asterisk (*). Missing bootstrap values are denoted by a dash (-).
Maximum likelihood (ML) tree (-ln L=30809.97) inferred from the concatenation of rbcL + matK (Suppl. material 9); matching one most parsimonious (MP) tree except where a dagger (†) is shown. Tree includes 62 asterid species and 3 outgroups (Comandra umbellata, Silene vulgaris, and Silene antirrhina). Numbers above branches indicate branch lengths used to calculate various Phylogenetic Diversity (PD) metrics. Numbers below the branches indicate MP/ML bootstrap support values resulting from 1000 replicates each. Low branch support (<50) is indicated by an asterisk (*). Missing bootstrap values are denoted by a dash (-).
Nexus alignment file.
Nexus alignment file.
Nexus alignment file.
Nexus alignment file.