Biodiversity Data Journal :
Research Article
Corresponding author: Steven B. Janssens (steven.janssens@plantentuinmeise.be)
Academic editor: Stephen Boatwright
Received: 03 Sep 2019 | Accepted: 12 Dec 2019 | Published: 21 Jan 2020
© 2020 Steven Janssens, Thomas L.P. Couvreur, Arne Mertens, Gilles Dauby, Leo-Paul Dagallier, Samuel Vanden Abeele, Filip Vandelook, Maurizio Mascarello, Hans Beeckman, Marc Sosef, Vincent Droissart, Michelle van der Bank, Olivier Maurin, William Hawthorne, Cicely Marshall, Maxime Réjou-Méchain, Denis Beina, Fidele Baya, Vincent Merckx, Brecht Verstraete, Olivier Hardy
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Janssens SB, Couvreur TL.P, Mertens A, Dauby G, Dagallier L-PMJ, Vanden Abeele S, Vandelook F, Mascarello M, Beeckman H, Sosef M, Droissart V, van der Bank M, Maurin O, Hawthorne W, Marshall C, Réjou-Méchain M, Beina D, Baya F, Merckx V, Verstraete B, Hardy O (2020) A large-scale species level dated angiosperm phylogeny for evolutionary and ecological analyses. Biodiversity Data Journal 8: e39677. https://doi.org/10.3897/BDJ.8.e39677
Phylogenies are a central and indispensable tool for evolutionary and ecological research. Even though most angiosperm families are well investigated from a phylogenetic point of view, there are far fewer possibilities to carry out large-scale meta-analyses at order level or higher. Here, we reconstructed a large-scale dated phylogeny including nearly one-eighth of all angiosperm species, based on two plastid barcoding genes, matK (incl. trnK) and rbcL. Novel sequences were generated for several species, while the rest of the data were mined from GenBank. The resulting tree was dated using 56 angiosperm fossils as calibration points. The resulting megaphylogeny is one of the largest dated phylogenetic trees of angiosperms yet, consisting of 36,101 sampled species, representing 8,399 genera, 426 families and all orders. This novel framework will be useful for investigating a range of broad-scale research questions in ecological and evolutionary biology.
phylogeny, angiosperms, large-scale dating analyses, evolution, ecology
During the past two decades, awareness has grown that ecological and evolutionary studies benefit from incorporating phylogenetic information (
There is currently an ongoing quest to optimise the methodology for constructing large-scale mega-phylogenies that can be used for further ecological and evolutionary studies. This is done by either mining and analysing publicly available DNA sequences (
In 2009, the Consortium for the Barcode of Life working group (CBOL) advised sequencing the two plastid markers matK (incl. trnK) and rbcL for identifying plant species, resulting in a massive amount of data becoming available on GenBank. rbcL is a conservative locus with a low level of variation across flowering plants and is therefore useful for reconstructing higher-level divergences. In contrast, matK contains rapidly evolving regions that are useful for studying interspecific divergence (
We extracted angiosperm sequence data of rbcL and matK (incl. trnK) from GenBank (15 February 2015) using the ‘NCBI Nucleotide extraction’ tool in Geneious v11.0 (Auckland, New Zealand). Five gymnosperm genera were chosen as outgroup (Suppl. material
A modified CTAB protocol was used for total genomic DNA isolation (
Amplification reactions of matK (incl. trnK) and rbcL were carried out in a 25 µl reaction mix containing 1 µl DNA, 2 × 1 µl oligonucleotide primer (100 ng/µl), 2.5 µl of 10 mM dNTPs, 2.5 µl Taq buffer, 0.2 µl KAPA Taq DNA polymerase and 16.8 µl MilliQ water. Reactions started with 3 minutes of heating at 95°C, followed by 30 cycles of denaturation at 95°C for 30 s, primer annealing for 60 s and extension at 72°C for 60 s, and ended with a 3-minute incubation at 72°C. Annealing temperatures for matK (incl. trnK) and rbcL were set at 50°C and 55°C, respectively. Primers designed by Kim J. (unpublished) were used to sequence matK (incl. trnK), whereas rbcL primers were adopted from
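As a quick consistency check on the reaction mix and cycling programme described above, the following minimal Python sketch sums the listed component volumes and estimates the programmed hold time. The dictionary keys, the function names and the 10% pipetting overage are illustrative assumptions and are not taken from the protocol; ramping time between temperature steps is ignored.

```python
# Component volumes (µl) for one 25 µl reaction, as listed in the text.
# The two primers are labelled generically; the text only says "2 x 1 µl".
REACTION_MIX_UL = {
    "template DNA": 1.0,
    "primer 1 (100 ng/µl)": 1.0,
    "primer 2 (100 ng/µl)": 1.0,
    "dNTPs (10 mM)": 2.5,
    "Taq buffer": 2.5,
    "KAPA Taq DNA polymerase": 0.2,
    "MilliQ water": 16.8,
}

# Annealing temperatures (°C) per marker, as stated in the text.
ANNEALING_C = {"matK (incl. trnK)": 50, "rbcL": 55}


def total_volume(mix):
    """Return the summed volume of all components in µl."""
    return round(sum(mix.values()), 2)


def master_mix(mix, n_reactions, overage=0.10):
    """Scale every component except the template for n reactions.

    The overage (extra fraction to cover pipetting loss) is a hypothetical
    convenience parameter, not part of the published protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in mix.items() if name != "template DNA"}


def programmed_seconds(cycles=30):
    """Hold time only: 3 min start, 30 s + 60 s + 60 s per cycle, 3 min end."""
    return 180 + cycles * (30 + 60 + 60) + 180


if __name__ == "__main__":
    print("Total per reaction (µl):", total_volume(REACTION_MIX_UL))
    print("Programmed time (min):", programmed_seconds() / 60)
    print("Master mix for 10 reactions:", master_mix(REACTION_MIX_UL, 10))
```

The listed volumes add up to the stated 25 µl, and the programme amounts to 81 minutes of hold time per run.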
We are aware that the publicly available database, GenBank, contains a large amount of erroneous data (
For protein-coding sequence fragments, the amino acid (AA) sequences derived from the associated triplet codons were compared between taxa. Taxa showing a sudden shift in AA sequence or a frame shift were discarded from the dataset.
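The idea behind this screening step can be illustrated with a small, self-contained Python sketch. It translates a coding sequence with the standard codon assignments and flags sequences that contain an internal stop codon, which is a common symptom of a frame shift. This is a simplified heuristic under stated assumptions, not the comparison between taxa that was actually performed: the function names and the `max_internal_stops` threshold are hypothetical, and the sequence is assumed to start in reading frame 1.

```python
from itertools import product

# Standard codon assignments, in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate a nucleotide sequence in frame 1.

    Alignment gaps are removed first; codons containing ambiguous
    bases are translated as 'X'; a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("-", "")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


def looks_frame_shifted(seq, max_internal_stops=0):
    """Flag a sequence whose translation has internal stop codons.

    A stop at the final codon is expected for a complete gene and
    is therefore not counted.
    """
    internal = translate(seq)[:-1]
    return internal.count("*") > max_internal_stops


if __name__ == "__main__":
    intact = "ATGCTGACCGGT"
    shifted = "ATGTGACCGGT"  # same sequence with one base deleted
    print(translate(intact), looks_frame_shifted(intact))
    print(translate(shifted), looks_frame_shifted(shifted))
```

A single-base deletion does not always introduce a stop codon, so in practice such a check complements, rather than replaces, inspection of the aligned AA sequences.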
Alignment was carried out in multiple stages. Because of the size of the angiosperm-wide dataset, an initial alignment (automatic and manual) was made separately for each order included in the dataset. Subsequently, the per-order alignments were combined using the Profile alignment algorithm (Geneious v11.0, Auckland, New Zealand). The initial automatic alignment was conducted with MAFFT (
The best-fit nucleotide substitution model for both rbcL and matK (incl. trnK) was selected using jModelTest 2.1.4. (
Support values for the large angiosperm dataset were obtained via the rapid bootstrapping algorithm as implemented in RAxML 7.4.2 (
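For readers unfamiliar with bootstrap support, the sketch below shows the resampling step of the classical non-parametric bootstrap: alignment columns are drawn with replacement to build a pseudo-replicate of the same length. This is only an illustration of the general principle. It is not the rapid bootstrapping algorithm implemented in RAxML, and the function name and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by resampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGAAC", "taxon_C": "TCGAAT"}
    replicate = bootstrap_alignment(toy, random.Random(42))
    for taxon, seq in replicate.items():
        print(taxon, seq)
```

A tree is then inferred from each pseudo-replicate, and the support for a branch is the percentage of replicate trees in which that branch is recovered.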
Evaluation of fossil calibration points was carried out following the specimen-based approach for assessing paleontological data by
List of fossils used as calibration points, including their oldest stratigraphic occurrence, minimum and maximum ages, the calibrated clades and used references. cr.=crown, st.=stem.
Clade | Fossil | Reference | Period | Locality/Formation/Group | Min. age (Ma) | Max. age (Ma) | cr. / st.
Ebenaceae | Austrodiospyros cryptostoma Basinger et Christophel | | Late Eocene | Anglesea formation (Victoria, Australia) | 37.8 | 54.62 | cr.
Apocynaceae | Apocynophyllum helveticum Heer | | Middle Eocene | Messel formation (Darmstadt, Germany) | 47.8 | 64.62 | cr.
Cornaceae | Hironoia fusiformis Takahashi, Crane et Manchester | | Early Coniacian | Ashizawa formation, Futaba group (North-eastern Honshu, Japan) | 89.8 | 106.6 | cr.
Dipelta | Dipelta europaea Reid et Chandler | | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | st.
Oleaceae | Fraxinus wilcoxiana (Berry) Call et Dilcher | | Middle Eocene | Claiborne formation (Tennessee, USA) | 47.8 | 64.62 | st.
Diervilla | Diervilla echinata Piel | | Oligocene | Fraser River system (British Columbia, Canada) | 27.8 | 44.62 | st.
Solanaceae (Physalinae) | Physalis infinemundi Wilf, Carvalho, Gandolfo et Cuneo | | Early Eocene | Laguna del Hunco (Chubut, Patagonia, Argentina) | 52.0 | 68.82 | st.
Valeriana | Valeriana sp. | | Late Miocene | Europe | 11.6 | 28.42 | st.
Emmenopterys | Emmenopterys Oliv. | | Middle Eocene | Middle Eocene Republic Flora (Washington, USA) | 47.8 | 64.62 | st.
Pelliciera | Pelliciera rhizophorae Planch. et Triana | | Middle Eocene | Gatuncillo formation (Panama) | 47.8 | 64.62 | st.
Araliaceae | Acanthopanax gigantocarpus Knobloch et Mai | | Maastrichtian | Eisleben formation (Germany) | 72.1 | 88.92 | st.
Ilex | Ilex hercynica Mai | | Early Paleocene | Gonna formation (Sangerhausen, Germany) | 66.0 | 82.82 | st.
Actinidiaceae | Saurauia antiqua Knobloch et Mai | | Late Santonian | Klikov-Schichtenfolge (Germany) | 85.8 | 102.6 | st.
Nymphaeales | unnamed Nymphaeales | | Late Aptian-Early Albian | Vale de Agua (Portugal) | 112.0 | 128.8 | cr.
Canellales | Walkeripollis gabonensis Doyle, Hotton et Ward | | Late Barremian-Early Aptian | Cocobeach (Gabon) | 125.0 | 141.8 | st.
Magnoliaceae | Archaeanthus linnenbergeri Dilcher et Crane | | Early Cenomanian | Dakota formation (Kansas, USA) | 100.5 | 117.3 | cr.
Magnoliales | Endressinia brasiliana Mohr et Bernardes-de-Oliveira | | Aptian-Albian | Crato formation (Brazil) | 112.0 | 128.8 | cr.
Lauraceae | Potomacanthus lobatus Crane, Friis et Pedersen | | Early and Middle Albian | Puddledock locality (Virginia, USA) | 119.0 | 135.8 | cr.
Arecaceae | unnamed palms | | Coniacian-Santonian | Magothy formation (Maryland) | 89.8 | 106.6 | cr.
Musella-Ensete | Ensete oregonense Manchester et Kress | | Middle Eocene | Clarno formation (Oregon, USA) | 43.0 | 59.82 | st.
Zingiberaceae | Zingiberopsis attenuata Hickey et Peterson | | Middle to late Paleocene | Paskapoo formation (Alberta, Canada) | 61.6 | 78.42 | cr.
Zingiberales | Spirematospermum chandlerae Friis | | Santonian-Campanian | Neuse River formation (North Carolina, USA) | 83.6 | 100.4 | cr.
Araceae | Mayoa portugallica Friis, Pedersen et Crane | | Barremian-Aptian | Almargem formation (Torres Vedras, Portugal) | 125.0 | 141.8 | cr.
Restionaceae | unnamed Restionaceae | | Maastrichtian | Morgan Creek (Saskatchewan, Canada) | 72.1 | 88.92 | st.
Poaceae | unnamed grasses | | Maastrichtian | Senegal-Ivory Coast | 72.1 | 88.92 | cr.
Berberidaceae | Mahonia Nutt. | | Middle Eocene | Green River formation (Colorado-Utah, USA) | 47.8 | 64.62 | cr.
Platanaceae | Platanocarpus brookensis Crane, Pedersen, Friis et Drinnan | | Early and Middle Albian | Patapsco formation (Virginia, USA) | 112.0 | 128.8 | st.
Sabiales | Insitiocarpus moravicus Knobloch et Mai | | Early Cenomanian | Peruc-schichten (Czech Republic) | 98.0 | 114.8 | cr.
Iteaceae | Divisestylus brevistamineus | | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr.
Altingiaceae | Microaltingia apocarpela | | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr.
Tilia | Tilia vescipites Nichols et Ott | | Middle Paleocene | Wind River basin (Wyoming, USA) | 61.6 | 78.42 | cr.
Polygonaceae | Persicaria (L.) Mill. | | Paleocene | Europe | 66.0 | 82.82 | cr.
Clausena | Clausena Burm.f. | | Late Oligocene | Guang River Flora (Ethiopia) | 27.36 | 44.18 | cr.
Malpighiales | Paleoclusia chevalieri Crepet et Nixon | | Turonian | Raritan formation (New Jersey) | 93.5 | 110.3 | cr.
Fagales | Normapolles | | Late Cenomanian | Europe and USA | 94.7 | 111.5 | cr.
Phytolaccaceae | Coahuilacarpon phytolaccoides Cevallos-Ferriz, Estrada-Ruiz et Perez-Hernandez | | Late Campanian | Cerro del Pueblo formation (Mexico) | 72.5 | 89.32 | st.
Juglandaceae | Cyclocarya brownii Manchester et Dilcher | | Late Paleocene | Almont and Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr.
Rosales | unnamed Rosidae | | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr.
Betulaceae | Endressianthus miraensis Friis, Pedersen et Schoenenberger | | Campanian-Maastrichtian | Mira (Portugal) | 72.1 | 88.92 | cr.
Fagaceae | Antiquacupula sulcata Sims, Herendeen et Crane | | Late Santonian | Gaillard formation (Georgia, USA) | 85.8 | 102.6 | cr.
Salicaceae | Pseudosalix handleyi Boucher, Manchester et Judd | | Middle Eocene | Green River formation (Colorado-Utah, USA) | 53.5 | 70.32 | cr.
Ranunculales | Leefructus mirus Sun, Dilcher, Wang et Chen | | Barremian-Aptian | Yixian formation (China) | 125.0 | 141.8 | cr.
Fabaceae | Fabaceae sp. | | Early Eocene | Buchanan clay pit (Tennessee, USA) | 56.0 | 72.82 | cr.
Styracaceae | Rehderodendron stonei Vaudois-Mieja | | Early Eocene | Sabals d'Anjou (France) | 56.0 | 72.82 | cr.
Dipterocarpaceae | Shorea maomingensis Feng, Kodrul et Jin | | Late Eocene | Huangniuling formation (Maoming Basin, China) | 37.8 | 54.62 | cr.
Lamiaceae | Ajuginucula smithii Reid et Chandler | | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | cr.
Theaceae s.l. | Pentapetalum trifasciculandricus Martinez-Millan, Crepet et Nixon | | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr.
Myrsinaceae | unnamed Myrsinaceae | | Middle Miocene | Foulden Hills Diatomite (New Zealand) | 15.9 | 32.72 | cr.
Myrtaceae | Tristaniandra alleyi Wilson et Basinger | | Middle Eocene | Golden Grove - East Yatala Sand Pit (South Australia) | 47.8 | 64.62 | cr.
Lythraceae | Decodon tiffneyi Estrada-Ruiz, Calvillo-Canadell et Cevallos-Ferriz | | Late Campanian | Cerro del Pueblo formation (Mexico) | 72.5 | 89.32 | cr.
Ampelocissus s.l. | Ampelocissus parvisemina Chen et Manchester | | Late Paleocene | Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr.
Vitaceae | Indovitis chitaleyae Manchester, Kapgate et Wen | | Maastrichtian | Mahurzari (India) | 72.1 | 88.92 | cr.
Rosa | Rosa germerensis Edelman | | Early Eocene | Germer Basin Flora (Idaho, USA) | 56.0 | 72.82 | cr.
Prunus | Prunus wutuensis Li, Smith, Liu, Awasthi, Yang et Li | | Early Eocene | Wutu (China) | 56.0 | 72.82 | cr.
Myristicaceae | Myristicacarpum chandlerae Manchester, Doyle et Sauquet | | Early Eocene | London Clay (UK) | 56.0 | 72.82 | cr.
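Calibration tables of this kind are easy to mistype, so it can help to hold them in a small data structure and validate each entry before it is passed to a dating program. The Python sketch below does this for four rows copied from the table. The class, field and function names are hypothetical, and ages are assumed to be in millions of years (Ma).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    clade: str
    min_age: float  # minimum age in Ma
    max_age: float  # maximum age in Ma
    node: str       # "crown" or "stem" (cr. / st. in the table)

    def validate(self):
        """Raise ValueError if the entry is internally inconsistent."""
        if not 0 < self.min_age < self.max_age:
            raise ValueError(f"{self.clade}: need 0 < min_age < max_age")
        if self.node not in ("crown", "stem"):
            raise ValueError(f"{self.clade}: node must be 'crown' or 'stem'")


# Four example rows copied from the table above.
CALIBRATIONS = [
    Calibration("Ebenaceae", 37.8, 54.62, "crown"),
    Calibration("Cornaceae", 89.8, 106.6, "crown"),
    Calibration("Dipelta", 33.9, 50.72, "stem"),
    Calibration("Nymphaeales", 112.0, 128.8, "crown"),
]


def window_widths(calibrations):
    """Return the width (max - min, in Myr) of each age window."""
    return {c.clade: round(c.max_age - c.min_age, 2) for c in calibrations}


if __name__ == "__main__":
    for cal in CALIBRATIONS:
        cal.validate()
    print(window_widths(CALIBRATIONS))
```

For the rows shown, each age window is about 16.8 Myr wide, so an entry whose width deviates strongly from that value would be worth rechecking against the table.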
The molecular clock hypothesis was tested using a chi2 likelihood ratio test (
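For reference, a likelihood ratio test of the molecular clock is usually formulated as follows. The symbols are introduced here for illustration and do not come from the original text: $\ln L_{\text{clock}}$ and $\ln L_{\text{no clock}}$ are the maximised log-likelihoods of the same topology with and without the clock constraint, and $n$ is the number of taxa.

```latex
\[
  \Lambda \;=\; 2\left(\ln L_{\text{no clock}} - \ln L_{\text{clock}}\right),
  \qquad
  \Lambda \;\sim\; \chi^{2}_{\,n-2} \quad \text{under the clock hypothesis.}
\]
```

The $n-2$ degrees of freedom are the difference between the $2n-3$ free branch lengths of an unrooted tree and the $n-1$ node heights of a clock-constrained tree. A value of $\Lambda$ above the chosen critical value of that distribution leads to rejection of a strict clock.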
The final aligned data matrix consists of 36,101 angiosperm species. matK (incl. trnK) sequences were mined for 31,391 species (87%), whereas rbcL sequences were obtained for 26,811 (74%) species (Suppl. material
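The sampling percentages follow directly from the reported counts, as this short Python sketch shows. The estimate of species sequenced for both markers is an inference by inclusion-exclusion, under the assumption that every species in the matrix has at least one of the two markers. It is not a figure reported in the text, and the variable and function names are illustrative.

```python
# Counts reported in the text.
TOTAL_SPECIES = 36_101
MATK_SPECIES = 31_391
RBCL_SPECIES = 26_811


def percent(part, whole):
    """Percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|, assuming every
# species has at least one marker, so that |A or B| equals the total.
both_markers = MATK_SPECIES + RBCL_SPECIES - TOTAL_SPECIES

# Fraction of the species-by-marker matrix that is filled.
matrix_fill = (MATK_SPECIES + RBCL_SPECIES) / (2 * TOTAL_SPECIES)

if __name__ == "__main__":
    print("matK coverage (%):", percent(MATK_SPECIES, TOTAL_SPECIES))
    print("rbcL coverage (%):", percent(RBCL_SPECIES, TOTAL_SPECIES))
    print("Species with both markers (inferred):", both_markers)
    print("Matrix fill (%):", round(100 * matrix_fill, 1))
```

Under that assumption, 22,101 species (about 61%) would be represented by both markers, and roughly 80.6% of the species-by-marker matrix would be filled.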
Age estimation of the large-scale angiosperm tree resulted in a dated phylogeny (Fig.
Recently,
This study is part of the HERBAXYLAREDD project (BR/143/A3/HERBAXYLAREDD), funded by the Belgian Belspo-BRAIN program axis 4. This project is supported by Plant.ID, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement N° 765000. This study is also supported by the BRAIN.be BELSPO research program AFRIFORD and by the French Foundation for Research on Biodiversity (FRB) and the Provence-Alpes-Côte d’Azur (PACA) region through the Centre for Synthesis and Analysis of Biodiversity data (CESAB) programme, as part of the RAINBIO research project (http://rainbio.cesab.org). The authors thank Kenneth Oberlander and an anonymous reviewer for improving the manuscript.
Table S1. Accession numbers of rbcL and matK (incl. trnK) sequences of the species included in the angiosperm phylogeny (including information on genera, family and order). Newly obtained accessions are indicated with an asterisk.
Constraint input topology for RAxML analyses of all angiosperms analysed in this study (incl. outgroup taxa).
Proportion of smoothing parameters calculated for each of the 500 tree replicates
Maximum Likelihood bootstrap consensus tree. Values above the branches indicate bootstrap support. Note that the support values above order level are all artificially set at 100 because of the use of a constraint backbone.
Maximum Likelihood phylogram of 36,101 angiosperm species (nexus file). Outgroup included. Blue bars indicate 95% confidence intervals.