Biodiversity Data Journal :
Research Article
Corresponding author: Mai-Phuong Pham (maiphuong.vrtc@gmail.com), Duy Dinh Vu (duydinhvu87@gmail.com)
Academic editor: Quentin Groom
Received: 05 Mar 2024 | Accepted: 26 Apr 2024 | Published: 16 May 2024
© 2024 Mai-Phuong Pham, Duy Dinh Vu, Thanh Tuan Nguyen, Van Sinh Nguyen
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Pham M-P, Vu DD, Nguyen TT, Nguyen VS (2024) Predictive ecological niche model for Cinnamomum parthenoxylon (Jack) Meisn. (Lauraceae) from Last Glacial Maximum to future in Vietnam. Biodiversity Data Journal 12: e122325. https://doi.org/10.3897/BDJ.12.e122325
Cinnamomum parthenoxylon (Jack) Meisn. is a tree in the genus Cinnamomum that has been facing global threats due to forest degradation and habitat fragmentation. Many recent studies aim to describe habitats and assess population and species genetic diversity for species conservation by expanding afforestation models for this species. Understanding its current and future potential distribution plays a major role in guiding conservation efforts. We used five modern machine-learning algorithms available on Google Earth Engine to evaluate suitable habitats for the species. The results revealed that Random Forest (RF) had the highest accuracy in the model comparison, outperforming Support Vector Machine (SVM), Classification and Regression Trees (CART), Gradient Boosting Decision Tree (GBDT) and Maximum Entropy (MaxEnt). The results also showed that the extremely suitable ecological areas for the species are mostly distributed in northern Vietnam, followed by the North Central Coast and the Central Highlands. Elevation, Temperature Annual Range and Mean Diurnal Range were the three most important parameters affecting the potential distribution of C. parthenoxylon. Evaluation of the impact of climate on its distribution under different climate scenarios in the past (Last Glacial Maximum and Mid-Holocene), in the present (WorldClim) and in the future (using four climate change scenarios: ACCESS, MIROC6, EC-Earth3-Veg and MRI-ESM2-0) revealed that the suitable habitat of C. parthenoxylon would likely expand to the northeast, while a large area of central Vietnam will gradually lose its suitability by 2100.
Cinnamomum parthenoxylon, GEE, habitat, machine learning, niche model
Global climate change and substantial illegal harvesting have been highly intricate and unpredictable phenomena, posing potential risks to human life, flora, fauna and the environment. The degradation of forest ecosystems has jeopardised the existence of various species in nature (
C. parthenoxylon was first scientifically described by Karl Friedrich Meisner (Meisn.) in 1864. This species belongs to the genus Cinnamomum and is naturally distributed in Cambodia; China (Guizhou, Hainan, Yunnan, Hunan, Fujian, Jiangxi, Guangdong, Sichuan, Guangxi); Indonesia (Sumatera, Kalimantan, Jawa); Lao People's Democratic Republic; Malaysia; Myanmar; Thailand; and Vietnam (IUCN). In Vietnam, it is found in northern, north-central and some southern provinces, at elevations from 50 to 1500 m, across various types of forests, including planted forests, production forests, natural forests and even shifting cultivation areas. The species exhibits strong regenerative capabilities (
Trees reach maturity at 20 to 25 years of age, with a diameter at breast height of 30 to 35 cm and a height of 20 to 25 m (
One of the imperatives for ecologists is to identify conservation solutions for species. In the realm of biodiversity conservation, Ecological Niche Models (ENMs) have emerged as a primary method for modelling the distribution of species within a geographic area (
Amongst the primary types of ENMs, correlation models remain the most widely utilised in ecological and evolutionary population characteristic studies, as well as in predicting the future climate adaptation range of species populations. The correlation model assesses the potential relationships between environmental predictor factors (such as: climate data (
The potential applications of ENM techniques have spurred researchers to implement these methods across various platforms. For instance, the R package (
In this study, five different machine-learning algorithms (MaxEnt, Random Forest - RF, Gradient Boosting Decision Tree - GBDT, Support Vector Machine - SVM and Classification and Regression Tree - CART) were used to model the potential distribution of C. parthenoxylon based on data collected from 117 species occurrence points coupled with 20 environmental variables. Ultimately, we aimed to identify the most suitable machine-learning algorithms for constructing an ENM for this species through the habitat suitability index (HSI) (
Understanding the characteristics of forestry ecological zones is important to the development of sustainable management strategies. This ensures that forestry exploitation is conducted in a balanced manner, minimising significant environmental and natural resource losses (
Geographical distribution of sampling points of C. parthenoxylon in Vietnam. Study area (a); adult plant (b). (Zone 1: Red River Delta; Zone 2: North East; Zone 3: North West; Zone 4: North Central Coast; Zone 5: South Central Coast; Zone 6: Central Highlands; Zone 7: South-East; Zone 8: Mekong River Delta.) This map does not show offshore islands.
We selected the environmental parameters based on their frequent applicability and ecological significance in ENM for species conservation. Finally, three sets of environmental parameters have been chosen for utilisation to predict the ENM of C. parthenoxylon at present, including:
To forecast the future ENM of C. parthenoxylon, we utilised four climate change scenarios, including:
These scenarios were applied to models covering the periods 2061-2080 and 2081-2100. Four emission scenarios corresponding to shared socioeconomic pathways (SSP126, SSP245, SSP370 and SSP585) were considered, as provided by CMIP6 with net radiative forcing values of 2.6, 4.5, 7.0 and 8.5 W/m² (
To predict the historical ENM of C. parthenoxylon, we employed two paleoclimate datasets downloaded from paleoclim.org, version 1.4:
We applied five distinct machine-learning algorithms in Google Earth Engine (GEE): Random Forest - RF (
The Random Forest algorithm serves as an ensemble machine-learning approach applicable to both classification and regression tasks. It operates by assembling multiple decision trees during the training phase and generates outcomes in the form of mode (for classification) or average prediction (for regression) based on the individual trees (
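The aggregation step described above (mode for classification, average for regression) can be illustrated with a minimal sketch in plain Python. The per-tree outputs are made-up values and the helper names `aggregate_classification` and `aggregate_regression` are hypothetical; this is not the Google Earth Engine implementation used in the study.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the mode (most common class label) of the per-tree votes."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the average of the per-tree numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for one map pixel
# (1 = presence, 0 = absence; scores are per-tree suitability estimates).
votes = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.8, 0.7, 0.4]

forest_class = aggregate_classification(votes)  # majority label
forest_score = aggregate_regression(scores)     # mean suitability
```

In this toy case three of the five trees vote for presence, so the forest labels the pixel as presence, while the averaged score gives a continuous value that could serve as a suitability estimate.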
SVM is a supervised machine-learning algorithm introduced by
The research employed the Gradient Boosting Decision Tree (GBDT) machine-learning algorithm, a recursive decision-tree method consisting of multiple decision trees (
The CART algorithm divides the n-dimensional space into rectangles that do not overlap each other by recursion (
\( \mathrm{GINI}(D) = 1 - \sum_{i=1}^{k} p_i^{2} \) (1),
in which k denotes the number of distinct sample types (classes) and \(p_i\) is the probability that a sample belongs to type i. A lower GINI value indicates a purer sample set and therefore a more effective split. The decision tree comprises multiple levels of nodes and leaves. The term "maximum nodes" refers to the maximum number of leaves allowed per tree, while the "minimum leaf population" is the minimum number of training samples a node must contain in order to be created. To construct an appropriate tree, enough nodes and branches must be generated. The maximum node value has no upper limit unless explicitly specified.
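As a worked illustration of the Gini index in Equation (1), the following sketch computes it for two small, made-up sets of labels. The function name `gini_impurity` and the sample labels are assumptions for illustration only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a sample set D: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical node contents: one pure node and one evenly mixed node.
pure_node = ["presence"] * 4
mixed_node = ["presence", "presence", "absence", "absence"]

gini_pure = gini_impurity(pure_node)    # single class, so no impurity
gini_mixed = gini_impurity(mixed_node)  # two equally likely classes
```

A node holding a single class has a Gini value of 0, whereas an even two-class mix reaches the two-class maximum of 0.5, which is why splits that lower the Gini value are preferred.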
Species Distribution Models (SDMs) are currently implemented in various popular approaches, including bioclimatic modelling, environmental envelopes, climate change experiments, genetic algorithms for rule-set production (GARP) and maximum entropy modelling (MaxEnt). Amongst SDMs, the MaxEnt model is prioritised due to its outstanding advantages, such as requiring only species presence data as input. It constructs maps of environmentally suitable areas for the species and assesses the importance of environmental variables in species distribution. The MaxEnt model can simultaneously incorporate both continuous and discrete variables as input data. This model has been widely used in habitat zoning for the conservation of various plant species worldwide (
To understand the representation of the realised distribution of the species by p, we should examine the following sampling approach. An observer randomly selects a site, denoted as x, from the set X comprising sites within the study area. The observer records 1 if the species is present at x and 0 if it is absent. If we designate the response variable (presence or absence) as y, then p(x) represents the conditional probability P(x|y = 1), indicating the likelihood of the observer being at x given that the species is present. Applying Bayes' rule (2):
\( P(y=1 \mid x) = \dfrac{P(x \mid y=1)\,P(y=1)}{P(x)} = p(x)\,P(y=1)\,|X| \) (2)
According to our sampling strategy, P(x) = 1/|X| for all x. In this context, P(y = 1) represents the overall prevalence of the species in the study area. The quantity P(y = 1|x) is the probability of the species being present at the location x, taking values of 0 or 1 for plants, but potentially ranging from 0 to 1 for vagile organisms (
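The relationship in Equation (2) can be checked numerically on a toy study area. The four sites and their presence values below are invented for illustration, and p(x) is taken to be uniform over occupied sites, which follows from the uniform site sampling P(x) = 1/|X| described above.

```python
# Toy study area X with four sites; 1 = species present, 0 = absent (made-up data).
presence = {"x1": 1, "x2": 0, "x3": 1, "x4": 0}

n_sites = len(presence)                 # |X|
n_occupied = sum(presence.values())     # number of sites with y = 1
prevalence = n_occupied / n_sites       # P(y = 1)

# p(x) = P(x | y = 1): probability of being at x given that the species is present.
p = {site: y / n_occupied for site, y in presence.items()}

# Equation (2): P(y = 1 | x) = p(x) * P(y = 1) * |X|
posterior = {site: p[site] * prevalence * n_sites for site in presence}
```

Here the posterior equals 1 at the two occupied sites and 0 elsewhere, matching the statement that P(y = 1|x) takes values of 0 or 1 for plants.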
Model accuracy was assessed on the validation set of each model iteration. The metric used was the threshold-independent area under the receiver operating characteristic curve (AUC-ROC). AUC-ROC ranges from 0 to 1, where 1 signifies perfect discrimination between presence and absence instances and 0.5 corresponds to random prediction. Similar evaluations using AUC-ROC have been extensively detailed in the study of
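To make the AUC-ROC metric concrete, the sketch below computes it through its rank interpretation: the probability that a randomly chosen presence record receives a higher score than a randomly chosen absence record, with ties counted as half. The labels and scores are hypothetical, and the function `auc_roc` is an illustrative helper rather than the routine used in the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the share of (presence, absence) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one presence and one absence record")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation records: 1 = presence, 0 = absence, with model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.1]

auc = auc_roc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

Because the calculation depends only on the ordering of scores, no suitability threshold has to be chosen, which is what makes the metric threshold-independent.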
The Habitat Suitability Index (HSI) represents the ENM as a digital map. Each model run for each period produces an HSI map, which is the output of the final step in GEE. The map is exported from GEE to Google Drive and then imported into QGIS 3.22 for classification into five habitat suitability categories for C. parthenoxylon: extremely suitable (HSI > 0.8), highly suitable (HSI: 0.7–0.8), moderately suitable (HSI: 0.6–0.7), moderate-low suitability (HSI: 0.4–0.6) and low or unsuitable (HSI < 0.4).
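The five-category classification can be sketched as a simple threshold function. The handling of values that fall exactly on a class boundary is an assumption here (a boundary value other than 0.8 is assigned to the higher class), since the text does not specify it, and the function name `classify_hsi` is hypothetical. As an illustration only, the function is applied to the zone-average HSI values reported in the Results, although the study itself classifies individual map cells in QGIS.

```python
def classify_hsi(hsi):
    """Map an HSI value in [0, 1] to one of the five suitability categories."""
    if not 0.0 <= hsi <= 1.0:
        raise ValueError("HSI must lie between 0 and 1")
    if hsi > 0.8:
        return "extremely suitable"
    if hsi >= 0.7:  # assumption: boundary values go to the higher class
        return "highly suitable"
    if hsi >= 0.6:
        return "moderately suitable"
    if hsi >= 0.4:
        return "moderate-low suitability"
    return "low or unsuitable"


# Zone-average HSI values as reported in the Results (zone number -> mean HSI).
zone_mean_hsi = {3: 0.72, 2: 0.61, 4: 0.39, 6: 0.36, 1: 0.35, 5: 0.24, 7: 0.12, 8: 0.12}

zone_classes = {zone: classify_hsi(value) for zone, value in zone_mean_hsi.items()}
```

Zone averages hide local variation, so a zone labelled low on average can still contain extremely suitable cells.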
The evaluation of the species distribution models generated by the five machine-learning algorithms on the validation dataset indicates that the Random Forest (RF) algorithm achieved the highest accuracy, with an AUC-ROC value of 0.88. The remaining algorithms (GBDT, CART, MaxEnt and SVM) achieved lower accuracies, with reported values of 0.86, 0.82 and 0.68. Consistent with these findings, the RF algorithm also outperformed eight other machine-learning algorithms (SVM, GARP, DT, RIPPER, KNN, Logistic, ANN and Naive Bayes) when constructing distribution models for many plant species in the Latin American region, achieving AUC values ranging from 0.82 to 0.96 (
Random forest (RF) has emerged as a valuable methodology in the academic realm for modelling plant and animal habitats, as well as for monitoring alterations in land use, encompassing shifts in forest cover, land degradation and urban expansion. Moreover, its utility extends to the domain of natural disaster forecasting (
The results obtained showed that C. parthenoxylon was naturally distributed in Vietnam, primarily in the northern regions, specifically Zones 2 and 3, North Central Vietnam (Zones 4 and 5) and the Central Highlands (Zone 6). According to
The simulation results using the Random Forest algorithm revealed areas classified as extremely high suitability, high suitability, medium suitability, medium-low suitability and low or unsuitable for C. parthenoxylon, covering 48,371.48 km², 54,225.77 km², 38,838.62 km², 42,950.08 km² and 137,384.93 km², respectively. These areas correspond to 15%, 16.8%, 12%, 13.3% and 42.7% of the total area of Vietnam. Amongst the zones, suitability is highest in Zone 3, followed by Zones 2, 4, 6, 1, 5, 7 and 8, with average Habitat Suitability Index (HSI) values of 0.72, 0.61, 0.39, 0.36, 0.35, 0.24, 0.12 and 0.12, respectively (Fig.
The ecologically suitable areas for species have exhibited significant fluctuations from the Last Glacial Maximum (LGM) period to the present, particularly demonstrating erratic changes in the most extremely suitable regions, notably in northern Vietnam. In this geographical area, there was a considerable loss of suitable habitat from the LGM to the Mid-Holocene (MH) period, with slight expansion from the MH period to the present (Fig.
The model assesses the species distribution in two time periods, 2061–2080 and 2081–2100, under the lowest emission scenario SSP126 and the highest emission scenario SSP585. Fig. 5 demonstrates that the ACCESS and EC scenarios distinctly depict a diminishing trend in ecologically suitable areas, particularly pronounced in the Central Highlands (Zone 6) and gradually declining towards the northern regions (Zones 1, 2 and 3). Generally, Zones 3, 4, 5 and 6 are the most affected by climate change. In contrast, the changes in suitable area observed with the MIROC6 and MRI scenarios are insignificant (Fig.
Changes in the distribution of territories, based on predictive HSI across future periods in Vietnam. Three types of data were used to prepare the 35 HSI maps: two datasets simulating past climate (LGM and MH), four datasets of future climate scenarios (ACCESS, MIROC6, EC and MRI) corresponding to four emission sets (SSP 126, 245, 370 and 585) for 2080 and 2100 and one current climate dataset. The chart summarises the suitable area for the species through 35 horizontal bars, each representing the proportion of suitable area in the total area of Vietnam. Each bar is divided into five suitability levels: extremely suitable zone (green), highly suitable zone (light blue), high-moderate zone (yellow), moderate (orange), low or unsuitable (grey).
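The count of 35 HSI maps follows from the combination of datasets listed in the caption: 2 past climates, 1 current climate and 4 climate models × 4 emission pathways × 2 future periods. The short sketch below enumerates these combinations; the map naming scheme (for example `ACCESS_SSP126_2061-2080`) is hypothetical and used only to make the arithmetic explicit.

```python
from itertools import product

past = ["LGM", "MH"]                      # paleoclimate datasets
current = ["WorldClim"]                   # present-day climate dataset
climate_models = ["ACCESS", "MIROC6", "EC-Earth3-Veg", "MRI-ESM2-0"]
pathways = ["SSP126", "SSP245", "SSP370", "SSP585"]
periods = ["2061-2080", "2081-2100"]

# One future map per (climate model, emission pathway, period) combination.
future = [f"{model}_{ssp}_{period}"
          for model, ssp, period in product(climate_models, pathways, periods)]

all_maps = past + current + future
n_future = len(future)   # 4 * 4 * 2 combinations
n_maps = len(all_maps)   # past + current + future
```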
Projected changes in extremely suitable areas of C. parthenoxylon in Vietnam during the Mid Holocene (MH) compared with the Last Glacial Maximum (LGM) (a); in the present compared with the Mid Holocene (MH) (b); in the future (ACCESS scenario, SSP-126, for 2100) compared with the present (c).
A pattern of declining suitable habitats was observed for C. parthenoxylon from LGM to the MH period. The total suitable area had lost nearly 50% compared to the expanded suitable area (Fig.
A comparison of the four climate change scenarios showed that the ACCESS scenario depicted the most pronounced future decline in the distribution area of C. parthenoxylon. Consequently, this section focuses on the separate evaluation of the ACCESS scenario for eight ecoregions in Vietnam. The findings revealed a concentration of lost suitable areas in central Vietnam across Zones 3–6, with a minor increase in suitable areas noted in Zones 2 and 3 (Fig.
In the forthcoming period, elevated precipitation and temperature will likely lead to an expansion of the species' suitable habitat towards the northeast, which was also the most suitable habitat during the LGM period. This underscores the species' high sensitivity to various extreme climatic factors. An analysis of the determinants of species distribution shows that elevation was the most important parameter influencing the distribution of C. parthenoxylon in Vietnam, followed by bio07 (annual temperature range) and bio02 (mean diurnal range), the pivotal climatic variables governing whether suitable zones expand or contract in the future (Fig.
The proportion of significant parameters influencing the distribution of C. parthenoxylon in Vietnam. (bio07: annual temperature range; bio02: mean diurnal range; bio15: precipitation seasonality; bio12: annual precipitation; bio14: precipitation of driest month; bio19: precipitation of coldest quarter; bio09: mean temperature of driest quarter; bio13: precipitation of wettest month; bio18: precipitation of warmest quarter; bio03: isothermality).
Our results align well with several studies worldwide and in Vietnam, demonstrating the potential of Google Earth Engine (GEE) to provide timely and high-performance species distribution models. Moreover, the models can integrate multiple parameters available on publicly accessible cloud-based data (
A primary concern in evolutionary and ecological studies involves the factors influencing and sustaining the geographic distribution of a species. This study revealed that elevation significantly influences species distribution. Additionally, variables that may explain species' climatic requirements are two temperature-related variables, namely annual temperature range bio7 (contributing significantly at 12.56%) and mean diurnal range bio2 (9.08%). Temperature fluctuations over the year (bio7) and month (bio2) typically represent highland and temperate climate characteristics. Previous literature has demonstrated that low temperatures negatively impact the emergence and mortality of seedlings within the genus Cinnamomum (
Various factors can influence the size of an ecological niche, such as recent human activities, geographical barriers and biological interactions (parasites, predators or competitors), which may be overlooked when predicting potential geographic distributions (
A limitation of this study is that model accuracy was assessed using AUC-ROC alone. AUC-ROC is considered suitable for extensive research areas with abundant species data and it often provides high accuracy, even when dealing with a small and restricted sample size (
Our main goal was to examine how to build Ecological Niche Models (ENMs) using common techniques in the Google Earth Engine (GEE) platform. We used five different machine-learning algorithms: Random Forest (RF), Support Vector Machine (SVM), Classification and Regression Trees (CART), Gradient Boosting Decision Tree (GBDT) and Maximum Entropy (MaxEnt). The outcomes revealed that RF exhibited superior predictive accuracy in comparison with the other algorithms. In addition, our study examined four different climate change scenarios: ACCESS, MIROC6, EC-Earth3-Veg and MRI-ESM2-0. These scenarios covered different levels of emissions, ranging from the most optimistic (SSP-126) to the most pessimistic (SSP-585). Our findings showed that the ACCESS scenario projected a clear decline in potentially suitable habitat within Vietnam. Notwithstanding this reduction, pockets of highly suitable area persisted and even expanded towards the north-eastern regions of the country in future projections.
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 106.06-2021.02. The first author was funded by the Ph.D. Scholarship Programme of Vingroup Innovation (VINIF), code VINIF.2023.TS.088.