Corresponding author: Steven B. Janssens (
Academic editor: Stephen Boatwright
Phylogenies are a central and indispensable tool for evolutionary and ecological research. Although most angiosperm families have been well investigated phylogenetically, there are far fewer opportunities to carry out large-scale meta-analyses at the order level or above. Here, we reconstructed a large-scale dated phylogeny including nearly one-eighth of all angiosperm species, based on two plastid barcoding genes,
During the past two decades, awareness has grown that ecological and evolutionary studies benefit from incorporating phylogenetic information (
There is an ongoing effort to optimise the methodology for constructing large-scale mega-phylogenies that can be used in further ecological and evolutionary studies. This is done either by mining and analysing publicly available DNA sequences (
In 2009, the Consortium for the Barcode of Life (CBOL) Plant Working Group recommended sequencing the two plastid markers
We extracted angiosperm sequence data of
A modified CTAB protocol was used for total genomic DNA isolation (
Amplification reactions of
We are aware that the public database GenBank contains a considerable amount of erroneous data (
Sequence fragments from protein-coding regions were translated into amino acid (AA) sequences using their codon triplets, and these AA sequences were compared across taxa. Taxa showing an abrupt shift in AA sequence or a frame shift were discarded from the dataset.
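The screening step above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each fragment is already trimmed so that `frame` marks the first full codon, uses the standard genetic code (the plastid code, NCBI table 11, assigns the same amino acids to all codons), and treats a length that is not a multiple of three, an internal stop codon, or low identity to a hypothetical reference translation as warning signs. The function names and the `min_identity` threshold are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
# NCBI table 11 (plastid code) maps every codon to the same amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str, frame: int = 0) -> str:
    """Translate a nucleotide string; gaps are removed, ambiguous codons become 'X'."""
    seq = nt.upper().replace("-", "")[frame:]
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def reading_frame_problems(nt, frame=0, reference_aa=None, min_identity=0.8):
    """Return reasons (possibly none) to suspect a frame shift or abrupt AA change."""
    seq = nt.upper().replace("-", "")[frame:]
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    protein = translate(seq)
    if "*" in protein[:-1]:  # a stop codon is tolerated only at the very end
        problems.append("internal stop codon")
    if reference_aa:
        # Position-by-position identity; assumes both start at the same codon.
        n = min(len(protein), len(reference_aa))
        identity = sum(a == b for a, b in zip(protein, reference_aa)) / n if n else 0.0
        if identity < min_identity:
            problems.append("low amino-acid identity to reference")
    return problems


print(reading_frame_problems("ATGTAAGCC"))
```

A taxon would be dropped whenever the returned list is non-empty; in practice the reference translation would come from a trusted sequence of the same gene in a related taxon.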
Because of the size of the angiosperm-wide dataset, alignment was carried out in multiple stages. First, an initial alignment (automatic, then manual) was made for each order in the dataset. These order-level alignments were subsequently combined using the Profile alignment algorithm (Geneious v11.0, Auckland, New Zealand). The initial automatic alignment was conducted with MAFFT (
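The idea behind merging order-level alignments can be illustrated with a toy profile-profile alignment. Geneious does not document its Profile alignment internals here, so the sketch below substitutes a plain Needleman-Wunsch alignment over columns, scored by mean sum-of-pairs with a linear gap penalty; the scoring values are arbitrary assumptions. The key property it shares with any profile method is that columns within each input alignment are never rearranged; only whole gap columns are inserted.

```python
from itertools import product

GAP = "-"


def column_score(col_a, col_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Mean sum-of-pairs score between one column of each alignment."""
    total = 0.0
    for x, y in product(col_a, col_b):
        if x == GAP and y == GAP:
            continue  # gap-gap pairs are neutral
        if x == GAP or y == GAP:
            total += gap
        else:
            total += match if x == y else mismatch
    return total / (len(col_a) * len(col_b))


def profile_align(aln_a, aln_b, gap=-2.0):
    """Merge two alignments (lists of equal-length strings) column by column."""
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    n, m = len(cols_a), len(cols_b)
    # score[i][j]: best score of the first i columns of A against the first j of B
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1], gap=gap),
                score[i - 1][j] + gap,  # column of A against a gap column in B
                score[i][j - 1] + gap,  # column of B against a gap column in A
            )
    # Traceback from the bottom-right cell, rebuilding merged columns in reverse
    gap_a, gap_b = (GAP,) * len(aln_a), (GAP,) * len(aln_b)
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(
            cols_a[i - 1], cols_b[j - 1], gap=gap
        ):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


print(profile_align(["ACGT", "ACGT"], ["ACT"]))
```

The cost grows with the product of both alignment lengths and both taxon counts, so this is only a conceptual toy; real tools use position-specific profiles and affine gaps.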
The best-fit nucleotide substitution model for both
Support values for the large angiosperm dataset were obtained via the rapid bootstrapping algorithm as implemented in RAxML 7.4.2 (
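RAxML's rapid bootstrapping speeds up the tree search performed on each replicate, but the replicates themselves follow the usual nonparametric bootstrap: alignment columns are resampled with replacement, and a clade's support is the percentage of replicate trees that contain it. The sketch below shows only those two steps. It assumes clades have already been extracted from each replicate tree as `frozenset`s of taxon names (tree parsing is out of scope), and the function names are hypothetical. It also illustrates why clades enforced by a constraint backbone receive 100% support: they occur in every replicate by construction.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def clade_support(replicate_clades):
    """Percentage of replicate trees containing each clade.

    replicate_clades: one set per replicate tree, each clade a frozenset of taxa.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    n = len(replicate_clades)
    return {clade: 100.0 * c / n for clade, c in counts.items()}


rng = random.Random(42)
print(bootstrap_alignment(["ACGT", "ACGA"], rng))
```

Because whole columns are drawn, every resampled column is one of the original columns, so site patterns are preserved while their frequencies vary between replicates.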
Evaluation of fossil calibration points was carried out following the specimen-based approach for assessing paleontological data by
The molecular clock hypothesis was tested using a χ² likelihood ratio test (
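In its usual formulation, this test compares the log-likelihood of the tree without a clock with that of the same tree constrained to a strict clock. Twice the difference is approximately χ²-distributed with n − 2 degrees of freedom for n taxa, because an unrooted tree has 2n − 3 free branch lengths and a clock tree has n − 1 node ages. The sketch below assumes that formulation; the function name is hypothetical, and the p-value uses the Wilson-Hilferty normal approximation (accurate for the very large degrees of freedom of a tree with thousands of tips) so that only the standard library is needed. `scipy.stats.chi2.sf(stat, df)` gives the exact tail probability if SciPy is available.

```python
import math


def clock_lrt(lnl_clock: float, lnl_free: float, n_taxa: int):
    """Likelihood ratio test of a strict molecular clock.

    Returns (statistic, degrees of freedom, approximate p-value).
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # Clamp tiny negative values that can arise from imperfect optimisation
    stat = max(0.0, 2.0 * (lnl_free - lnl_clock))
    df = n_taxa - 2  # (2n - 3) branch lengths minus (n - 1) node ages
    # Wilson-Hilferty: (X / df)^(1/3) is approximately normal for chi-square X
    mu = 1.0 - 2.0 / (9.0 * df)
    sigma = math.sqrt(2.0 / (9.0 * df))
    z = ((stat / df) ** (1.0 / 3.0) - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return stat, df, p


print(clock_lrt(lnl_clock=-1000.0, lnl_free=-900.0, n_taxa=10))
```

A small p-value rejects the strict clock, which is the usual justification for relaxed-clock or rate-smoothing dating methods.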
The final aligned data matrix consists of 36,101 angiosperm species.
Age estimation of the large-scale angiosperm tree resulted in a dated phylogeny (Fig.
Recently,
This study is part of the HERBAXYLAREDD project (BR/143/A3/HERBAXYLAREDD), funded by the Belgian Belspo-BRAIN program axis 4. This project is supported by Plant.ID, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement N° 765000. This study is also supported by the BRAIN.be BELSPO research program AFRIFORD and by the French Foundation for Research on Biodiversity (FRB) and the Provence-Alpes-Côte d’Azur (PACA) region through the Centre for Synthesis and Analysis of Biodiversity data (CESAB) programme, as part of the RAINBIO research project (
Maximum Likelihood angiosperm phylogram based on the combined
List of fossils used as calibration points, including their oldest stratigraphic occurrence, minimum and maximum ages, the calibrated clades and the references used. cr. = crown, st. = stem.
| Fossil | Oldest stratigraphic occurrence | Locality | Min. age (Ma) | Max. age (Ma) | Calibrated clade | Reference |
| | Late Eocene | Anglesea Formation (Victoria, Australia) | 37.8 | 54.62 | cr. | |
| | Middle Eocene | Messel Formation (Darmstadt, Germany) | 47.8 | 64.62 | cr. | |
| | Early Coniacian | Ashizawa Formation, Futaba Group (northeastern Honshu, Japan) | 89.8 | 106.6 | cr. | |
| | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | st. | |
| | Middle Eocene | Claiborne Formation (Tennessee, USA) | 47.8 | 64.62 | st. | |
| | Oligocene | Fraser River system (British Columbia, Canada) | 27.8 | 44.62 | st. | |
| | Early Eocene | Laguna del Hunco (Chubut, Patagonia, Argentina) | 52.0 | 68.82 | st. | |
| | Late Miocene | Europe | 11.6 | 28.42 | st. | |
| | Middle Eocene | Republic Flora (Washington, USA) | 47.8 | 64.62 | st. | |
| | Middle Eocene | Gatuncillo Formation (Panama) | 47.8 | 64.62 | st. | |
| | Maastrichtian | Eisleben Formation (Germany) | 72.1 | 88.92 | st. | |
| | Early Paleocene | Gonna Formation (Sangerhausen, Germany) | 66.0 | 82.82 | st. | |
| | Late Santonian | Klikov-Schichtenfolge (Germany) | 85.8 | 102.6 | st. | |
| | Late Aptian-Early Albian | Vale de Agua (Portugal) | 112.0 | 128.8 | cr. | |
| | Late Barremian-Early Aptian | Cocobeach (Gabon) | 125.0 | 141.8 | st. | |
| | Early Cenomanian | Dakota Formation (Kansas, USA) | 100.5 | 117.3 | cr. | |
| | Aptian-Albian | Crato Formation (Brazil) | 112.0 | 128.8 | cr. | |
| | Early and Middle Albian | Puddledock locality (Virginia, USA) | 119.0 | 135.8 | cr. | |
| unnamed palms | Coniacian-Santonian | Magothy Formation (Maryland, USA) | 89.8 | 106.6 | cr. | |
| Musella- | Middle Eocene | Clarno Formation (Oregon, USA) | 43.0 | 59.82 | st. | |
| | Middle to Late Paleocene | Paskapoo Formation (Alberta, Canada) | 61.6 | 78.42 | cr. | |
| | Santonian-Campanian | Neuse River Formation (North Carolina, USA) | 83.6 | 100.4 | cr. | |
| | Barremian-Aptian | Almargem Formation (Torres Vedras, Portugal) | 125.0 | 141.8 | cr. | |
| unnamed | Maastrichtian | Morgan Creek (Saskatchewan, Canada) | 72.1 | 88.92 | st. | |
| unnamed grasses | Maastrichtian | Senegal-Ivory Coast | 72.1 | 88.92 | cr. | |
| | Middle Eocene | Green River Formation (Colorado-Utah, USA) | 47.8 | 64.62 | cr. | |
| | Early and Middle Albian | Patapsco Formation (Virginia, USA) | 112.0 | 128.8 | st. | |
| | Early Cenomanian | Peruc-Schichten (Czech Republic) | 98.0 | 114.8 | cr. | |
| | Turonian | Raritan Formation (New Jersey, USA) | 93.9 | 110.7 | cr. | |
| | Turonian | Raritan Formation (New Jersey, USA) | 93.9 | 110.7 | cr. | |
| | Middle Paleocene | Wind River Basin (Wyoming, USA) | 61.6 | 78.42 | cr. | |
| | Paleocene | Europe | 66.0 | 82.82 | cr. | |
| | Late Oligocene | Guang River Flora (Ethiopia) | 27.36 | 44.18 | cr. | |
| | Turonian | Raritan Formation (New Jersey, USA) | 93.5 | 110.3 | cr. | |
| | Late Cenomanian | Europe and USA | 94.7 | 111.5 | cr. | |
| | Late Campanian | Cerro del Pueblo Formation (Mexico) | 72.5 | 89.32 | st. | |
| | Late Paleocene | Almont and Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr. | |
| | Turonian | Raritan Formation (New Jersey, USA) | 93.9 | 110.7 | cr. | |
| | Campanian-Maastrichtian | Mira (Portugal) | 72.1 | 88.92 | cr. | |
| | Late Santonian | Gaillard Formation (Georgia, USA) | 85.8 | 102.6 | cr. | |
| | Middle Eocene | Green River Formation (Colorado-Utah, USA) | 53.5 | 70.32 | cr. | |
| | Barremian-Aptian | Yixian Formation (China) | 125.0 | 141.8 | cr. | |
| | Early Eocene | Buchanan clay pit (Tennessee, USA) | 56.0 | 72.82 | cr. | |
| | Early Eocene | Sabals d'Anjou (France) | 56.0 | 72.82 | cr. | |
| | Late Eocene | Huangniuling Formation (Maoming Basin, China) | 37.8 | 54.62 | cr. | |
| | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | cr. | |
| | Turonian | Raritan Formation (New Jersey, USA) | 93.9 | 110.7 | cr. | |
| | Middle Miocene | Foulden Hills Diatomite (New Zealand) | 15.9 | 32.72 | cr. | |
| | Middle Eocene | Golden Grove - East Yatala Sand Pit (South Australia) | 47.8 | 64.62 | cr. | |
| | Late Campanian | Cerro del Pueblo Formation (Mexico) | 72.5 | 89.32 | cr. | |
| | Late Paleocene | Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr. | |
| | Maastrichtian | Mahurzari (India) | 72.1 | 88.92 | cr. | |
| | Early Eocene | Germer Basin Flora (Idaho, USA) | 56.0 | 72.82 | cr. | |
| | Early Eocene | Wutu (China) | 56.0 | 72.82 | cr. | |
| | Early Eocene | London Clay (UK) | 56.0 | 72.82 | cr. | |
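The text explaining how maximum ages were set is not part of this section, but every row of the calibration table is consistent with a constant offset of 16.82 Myr between minimum and maximum age (maxima above 100 Ma appear rounded to one decimal). The sketch below, whose `OFFSET_MYR` constant is inferred from the table rather than stated by the authors, recomputes the maxima and flags rows that deviate; the helper names are hypothetical.

```python
# Offset inferred from the table values (assumption, not a stated rule).
OFFSET_MYR = 16.82

# (locality, minimum age in Ma, maximum age in Ma) for a few table rows
ROWS = [
    ("Anglesea Formation", 37.8, 54.62),
    ("Ashizawa Formation", 89.8, 106.6),
    ("Guang River Flora", 27.36, 44.18),
    ("Cocobeach", 125.0, 141.8),
]


def expected_max(min_age: float, offset: float = OFFSET_MYR) -> float:
    """Maximum age implied by a constant offset above the minimum age."""
    return min_age + offset


def inconsistent_rows(rows, tol=0.05):
    """Localities whose maximum deviates from min + offset by more than tol Myr."""
    return [name for name, lo, hi in rows if abs(hi - expected_max(lo)) > tol]


print(inconsistent_rows(ROWS))  # []
```

The tolerance of 0.05 Myr absorbs the one-decimal rounding of the older calibrations.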
Supplementary Table
Species list
Table S1. Accession numbers of
File: oo_329737.xlsx
Constraint input topology
Constraint topology
Constraint input topology for RAxML analyses of all angiosperms analysed in this study (incl. outgroup taxa).
File: oo_362680.tre
Proportion of smoothing parameters
graph
Proportion of smoothing parameters calculated for each of the 500 tree replicates
File: oo_363086.pdf
Angiosperm phylogeny - ML bootstrap values
phylogeny
Maximum Likelihood bootstrap consensus tree. Values above the branches indicate bootstrap support. Note that all support values above the order level are artificially set at 100 because a constraint backbone was used.
File: oo_329452.tre
Dated angiosperm phylogram
phylogeny
Maximum Likelihood phylogram of 36,101 angiosperm species (NEXUS file), outgroup included. Blue bars indicate 95% confidence intervals.
File: oo_330891.tre