Biodiversity Data Journal :
Data Paper (Biosciences)
Corresponding authors: Morgane Claudel (claudelmorgane@orange.fr), Cécile Brun (cecile.brun@univ-tlse2.fr), Sylvie Guillerme (sylvie.guillerme@univ-tlse2.fr)
Academic editor: Enrico Vito Perrino
Received: 08 Oct 2021 | Accepted: 10 Feb 2022 | Published: 28 Mar 2022
© 2022 Morgane Claudel, Emilie Lerigoleur, Cécile Brun, Sylvie Guillerme
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Claudel M, Lerigoleur E, Brun C, Guillerme S (2022) Geohistorical dataset of ten plant species introduced into Occitania (France). Biodiversity Data Journal 10: e76283. https://doi.org/10.3897/BDJ.10.e76283
The original dataset presented here is the result of the first near-exhaustive analysis performed on historical data concerning ten plant species introduced in and around Occitania (south-western France) since 1651. Research was carried out on the following species: Alnus incana, Buddleja davidii, Castanea sativa, Helianthus tuberosus, Impatiens glandulifera, Prunus cerasifera, Prunus laurocerasus, Reynoutria japonica, Robinia pseudoacacia and Spiraea japonica.
The data file contains 199 occurrence records based exclusively on historical observations and records made between 1651 and 2004, retrieved from 111 of the 640 literary sources consulted. Each record is associated with a year and 61% of them have spatial coordinates. Initially, the EI2P-VALEEBEE research project focused on the introduction of these species into Occitania (95 occurrences, 47.7%), but mentions of introductions beyond this territory, mainly in metropolitan France, are also reported.
The creation of this dataset involved five stages: (1) selection of species, (2) consultation of historical sources, (3) recording of occurrences in the dataset, (4) dataset standardisation/enrichment and Darwin Core mapping and (5) data publication. Quality controls were conducted at each step.
The dataset is available on the platform of the Global Biodiversity Information Facility (GBIF) at https://doi.org/10.15468/3kvaeh. It respects the internationally recognised FAIR Data Principles (Findable, Accessible, Interoperable and Reusable).
The dataset will be progressively enriched by new data during the EI2P-VALEEBEE research project and future projects on invasive plant species conducted by the team.
The introduction of alien species into a given region may be intentional, for ornamental, horticultural or agricultural purposes, but more often, it is involuntary (
The data presented here are derived from the EI2P-VALEEBEE project, co-funded by the Région Occitanie and the Maison des Sciences de l'Homme de Toulouse (cf. glossary of acronyms in Suppl. material
The issue of alien plant invasions is complex and multifactorial. It includes an important geographical dimension, since the distribution of invasive alien plants is conditioned by the variations of an environment (
The EI2P-VALEEBEE project aims to deepen our knowledge of the links between alien plant invasions and changes in ecosystem services, which are still poorly understood today (
At the scale of Occitania and, more broadly, of France as a whole, the novelty of our dataset lies in its consideration of the temporal and geohistorical dimensions of the invasion phenomenon: it records both observations and historical and literary mentions of the species studied.
The data file is the result of a geohistorical study conducted on the introduction and distribution of invasive plant species. Ten plant species were selected, all of which can be observed in the Pique and Oussouet Valleys. Some of them are considered invasive alien species. The study includes research on the introduction dates of the species studied, the locations of their introduction, their interest and past uses, the different human perceptions of them over time, activities that have impacted their local distribution, comments from authors and observers on their abundance and elements of the historical context of their introduction. Historical sources were consulted during 2020 in order to find the oldest elements concerning the ten species.
Interest and use of the dataset
Without a historical analysis, it is difficult to understand the current local distribution dynamics of invasive plant species, especially since some of them were introduced into Metropolitan French territory several centuries ago (
The interest of the dataset is directly in line with the values of the EI2P-VALEEBEE project itself, the objective of which is to contribute to a better understanding of plant invasion processes in a transversal way (
Geohistorical dataset of ten plant species introduced in Occitania (France)
Conceptualisation, M.C. and E.L.; methodology, M.C., E.L., C.B. and S.G.; investigation, M.C.; data validation, M.C. and E.L.; writing, review and editing, M.C., E.L., C.B. and S.G.; supervision, C.B. and S.G.; project administration, S.G.; funding acquisition, S.G.
The Oussouet Valley (Pyrenean foothills, Hautes-Pyrénées) and the Pique Valley (Haute-Garonne) in Occitania (South of France).
The creation of this dataset involved five stages: (1) selection of species, (2) consultation of historical sources, (3) recording of occurrences in the dataset, (4) dataset standardisation/enrichment and Darwin Core mapping and (5) data publication.
Step 1: Species selection.
Current field observations were made particularly in the two valleys selected in the south-west of France: the Pique Valley and the Oussouet Valley. These two territories present a high rate of plant invasions. Four exotic plant species were initially observed to provide some spatial coverage in these valleys: Buddleja davidii Franch., 1887; Impatiens glandulifera Royle, 1833; Reynoutria japonica Houtt., 1777 and Spiraea japonica L.f., 1782. In accordance with the scientific needs of the EI2P-VALEEBEE project, six species (Alnus incana (L.) Moench, 1794; Castanea sativa Mill., 1798; Helianthus tuberosus L., 1753; Prunus cerasifera Ehrh., 1784; Prunus laurocerasus L., 1753 and Robinia pseudoacacia L., 1753) were added to the selection on the basis of the main relevant criteria (see glossary for acronyms in Suppl. material
Step 2: Consultation of historical sources.
For the consultation of historical sources, a funnel method was applied:
Throughout the research process, key informants, having good knowledge of the study areas and their backgrounds, contributed information and advice on valuable literary sources, allowing the research to best fit the study areas and to be as complete as possible. It should also be noted that, in order to consult sources of all kinds, as soon as we felt that a literary source could potentially contribute elements on one of the ten species, we consulted it, even if, at first sight, it had no connection with botany (e.g. the recipe book of
As a result, 640 literary sources were consulted during this step. Amongst these, 111 (17.3%) provided information on the introduction and colonisation of the ten species over time (Suppl. material
Step 3: Occurrences recorded in the data file.
Each time a species was mentioned in the historical literature consulted, the mention was recorded in an occurrence data file created in LibreOffice Calc (a spreadsheet program) and saved in OpenDocument Spreadsheet format (.ods). When recording an occurrence, attention was paid to the vernacular and scientific synonyms used in the historical literature. We recorded occurrences whose historical name is currently identified as a synonym in the Catalogue of Life, INPN and ISSG, and also took into account how many elements in the bibliographic source allowed the taxon to be identified as such: a photograph, image or plate of the species, a precise description, or mention of other known scientific and vernacular names.
As many of the elements mentioned by the author as possible were recorded in the data file: the synonym cited; the reference code from the French taxonomic referential TAXREF (https://inpn.mnhn.fr/programme/referentiel-taxonomique-taxref?lg=en); the bibliographic reference in which the mention of the species was found; the date of observation of the species or, failing that, the date of its mention; the names of the observers and authors; the type of source; the description of the location whenever it was mentioned; the species' spatial coverage and abundance; its minimum and maximum altitudes; any comments by the author about the species or about the location where it was observed or mentioned; the nearest town/village; and the latitude and longitude coordinates.
In addition, the details of each literature source were recorded in the same software (LibreOffice Calc): bibliographic reference number (identifier), name(s) of the author(s), title, year of publication, collection and publisher if they were mentioned, as well as the city of publication, the name of the journal (if the source was an article), the URL if accessible or the reference of the archive document with the location of the archive institution, the call number and the series.
Step 4: Dataset standardisation/enrichment and Darwin Core mapping.
The Darwin Core Standard (
Each column header of the occurrence spreadsheet was searched for an equivalent term in the Darwin Core quick reference guide (https://dwc.tdwg.org/terms/). We also chose the Identification History extension (https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Identification) to manage synonyms of taxon names as they were cited in the literature consulted.
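The header-by-header mapping described above can be pictured as a simple lookup table. The following Python sketch is illustrative only: the spreadsheet header names on the left are hypothetical (the original .ods headers are not listed in this paper), whereas the Darwin Core terms on the right are those documented in the column tables of this paper. It is not the procedure the authors actually ran.

```python
# Hypothetical spreadsheet headers (left) mapped to Darwin Core terms (right).
# The left-hand names are assumptions made for illustration.
HEADER_TO_DWC = {
    "observers": "recordedBy",
    "year": "year",
    "nearest_town": "municipality",
    "location_description": "verbatimLocality",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "author_comments": "occurrenceRemarks",
}


def map_row(row):
    """Rename the keys of one spreadsheet row to Darwin Core terms.

    Returns the mapped row and the list of headers that have no
    Darwin Core equivalent in HEADER_TO_DWC, so they can be reviewed by hand.
    """
    mapped, unmapped = {}, []
    for header, value in row.items():
        term = HEADER_TO_DWC.get(header)
        if term is None:
            unmapped.append(header)
        else:
            mapped[term] = value
    return mapped, unmapped
```

Returning the unmapped headers separately, rather than dropping them silently, mirrors the manual search for an equivalent term: any header without a match still needs a decision.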
The geographic data (longitude and latitude, WGS84 datum) were structured in two ways: (1) geographic coordinates with an accuracy of 500 metres for occurrences whose locality was precisely identified in the literary source and (2) geographic coordinates with an accuracy of 10,000 metres for occurrences whose literary source mentioned only the name of the municipality (coordinates from https://www.geonames.org/). For each occurrence, the Darwin Core terms “country”, “province”, “county”, “municipality” and “locality” were assigned as far as possible from the elements of the literary sources.
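The two-level georeferencing rule can be expressed as a small function. This is a minimal sketch, assuming that each record carries two hypothetical flags indicating what the literary source provides; the 500 m and 10,000 m values come from the text, and the result would typically feed the Darwin Core term coordinateUncertaintyInMeters.

```python
# Values in metres, as stated in the text.
PRECISE_LOCALITY_M = 500
MUNICIPALITY_ONLY_M = 10_000


def coordinate_uncertainty(has_precise_locality, has_municipality):
    """Return the uncertainty in metres, or None if the record cannot be georeferenced.

    Both arguments are hypothetical boolean flags describing the source text.
    """
    if has_precise_locality:
        return PRECISE_LOCALITY_M
    if has_municipality:
        return MUNICIPALITY_ONLY_M
    return None
```

A record whose source names neither a locality nor a municipality returns None, which corresponds to the occurrences published without coordinates.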
Step 5: Data publication.
For publication on the GBIF platform, the GBIF Integrated Publishing Toolkit (IPT) was used to fill out the metadata and to generate the Darwin Core Archive. The dataset is available on the GBIF platform at https://doi.org/10.15468/3kvaeh (
FAIR principles | FAIRness assessment criteria used |
---|---|
FINDABLE |
|
ACCESSIBLE |
|
INTEROPERABLE |
|
REUSABLE |
|
Several quality controls were implemented. First, data cleaning and corrections were performed with proofreading by a third party and the use of pivot tables to check data integrity. Harmonisation and standardisation of content were also necessary to allow and facilitate the mapping with Darwin Core terms. The latter made it possible to identify new additional information such as nomenclatural code, coordinate uncertainty, geodetic datum, georeference source, licence etc. As many standards as possible were chosen to describe country codes (ISO 3166-1-alpha-2), municipality names and their geographic coordinates (geonames.org), taxon scientific names (TAXREF v.13.0 - 2019-12-06) and taxon ID from several sources (GBIF, IUCN, Catalogue of Life, IPNI, INPN, Tela Botanica BDTFX). Finally, we used a GIS tool (QGIS 3.10 LTR) to check the geographic coordinates of occurrences.
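Some of these controls lend themselves to automation. The sketch below is not the authors' workflow (which relied on proofreading, pivot tables and QGIS); it is an illustrative Python check, assuming each record is a dictionary keyed by Darwin Core terms with numeric coordinates and year. The legal coordinate ranges and the 1651 to 2004 period are taken from this paper.

```python
def check_record(record):
    """Return a list of problems found in one occurrence record (a dict)."""
    problems = []

    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    # Coordinates are optional, but must come as a pair.
    if (lat is None) != (lon is None):
        problems.append("latitude and longitude must be given together")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("decimalLatitude out of range")
    if lon is not None and not -180 <= lon <= 180:
        problems.append("decimalLongitude out of range")

    # ISO 3166-1-alpha-2 codes are two upper-case letters (format check only).
    code = record.get("countryCode", "")
    if not (len(code) == 2 and code.isalpha() and code.isupper()):
        problems.append("countryCode is not a two-letter upper-case code")

    # Every record in the dataset is associated with a year in this period.
    year = record.get("year")
    if year is None or not 1651 <= year <= 2004:
        problems.append("year missing or outside 1651-2004")

    return problems
```

The country-code test only verifies the format, not that the code exists in ISO 3166-1; a full check would compare against the published code list.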
Within the framework of this geohistorical study, we consulted archival sources for information on the introduction and distribution of the target species in the Oussouet Valley (Pyrenean foothills, Hautes-Pyrénées) and the Pique Valley (Haute-Garonne), in Occitania (95 occurrences, 47.7%). When the documents consulted contained information concerning other French or European territories, namely other parts of France (50.8%), Belgium (1%) or the UK (0.5%), it was also recorded. This explains the European geographical coverage. Of the 199 occurrences, only 122 (61%) are precisely geolocated (municipality or 500 metre buffer). Fig.
Bounding box: 42°5'52.8''N to 59°15'57.6''N latitude; 8°36'46.8''W to 8°26'16.8''E longitude.
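Because the occurrence table stores coordinates in decimal degrees (decimalLatitude, decimalLongitude), it can be useful to express this bounding box in the same unit. The following Python sketch applies the standard conversion, degrees + minutes/60 + seconds/3600, with a negative sign for the southern and western hemispheres; the function and variable names are illustrative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Corners of the bounding box given in the text.
south = dms_to_decimal(42, 5, 52.8, "N")   # approximately 42.098
north = dms_to_decimal(59, 15, 57.6, "N")  # approximately 59.266
west = dms_to_decimal(8, 36, 46.8, "W")    # approximately -8.613
east = dms_to_decimal(8, 26, 16.8, "E")    # approximately 8.438
```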
Ten plant species were studied: Alnus incana (L.) Moench, 1794; Buddleja davidii Franch., 1887; Castanea sativa Mill., 1798; Helianthus tuberosus L., 1753; Impatiens glandulifera Royle, 1833; Prunus cerasifera Ehrh., 1784; Prunus laurocerasus L., 1753; Reynoutria japonica Houtt., 1777; Robinia pseudoacacia L., 1753 and Spiraea japonica L.f., 1782. The scientific names of the ten species comply with the national taxonomic and nomenclatural reference system for fauna, flora and fungi in metropolitan France and overseas: TAXREF v.13 (
Phylogenetic classification of the studied taxa ordered by order, family, genus and species, with their number of occurrences and their hyperlink to the subsample by species on gbif.org.
Order | Family | Genus | Species | Common name | Number of occurrences (%) |
---|---|---|---|---|---|
Asterales | Asteraceae | Helianthus | Helianthus tuberosus | Jerusalem artichoke | 17 (8.5) |
Caryophyllales | Polygonaceae | Reynoutria | Reynoutria japonica | Japanese knotweed | 21 (10.6) |
Ericales | Balsaminaceae | Impatiens | Impatiens glandulifera | Indian balsam | 53 (26.6) |
Fabales | Fabaceae | Robinia | Robinia pseudoacacia | False-acacia | 26 (13.1) |
Fagales | Betulaceae | Alnus | Alnus incana | Grey alder | 7 (3.5) |
Fagales | Fagaceae | Castanea | Castanea sativa | Sweet chestnut | 7 (3.5) |
Lamiales | Scrophulariaceae | Buddleja | Buddleja davidii | Butterfly-bush | 17 (8.5) |
Rosales | Rosaceae | Prunus | Prunus cerasifera | Cherry plum | 20 (10.1) |
Rosales | Rosaceae | Prunus | Prunus laurocerasus | Cherry laurel | 20 (10.1) |
Rosales | Rosaceae | Spiraea | Spiraea japonica | Japanese spiraea | 11 (5.5) |
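The percentages in the table follow directly from the per-species counts and the total of 199 occurrences. As a quick consistency check, the short Python sketch below recomputes them; the counts are copied from the table and the variable names are illustrative.

```python
# Occurrence counts per species, copied from the table above.
counts = {
    "Helianthus tuberosus": 17,
    "Reynoutria japonica": 21,
    "Impatiens glandulifera": 53,
    "Robinia pseudoacacia": 26,
    "Alnus incana": 7,
    "Castanea sativa": 7,
    "Buddleja davidii": 17,
    "Prunus cerasifera": 20,
    "Prunus laurocerasus": 20,
    "Spiraea japonica": 11,
}

total = sum(counts.values())  # 199 occurrences in the dataset

# Share of each species, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```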
For the purposes of the historical study, all the Latin synonyms, identified and validated by the national taxonomic and nomenclatural reference frame TAXREF v.13 related to the ten taxa studied, were considered. During the analysis of historical documents, we also collected all vernacular synonyms as soon as they were associated with a validated Latin synonym (Suppl. material
The geohistorical database includes observation and record data for the ten species from 1651 until 2004, collected in 2020. The objective was to cover the different periods of introduction of these ten species in Occitania. We consulted literature dating from the 17th century, a period during which some of the species studied seem to have been introduced in France (Helianthus tuberosus, Prunus cerasifera, Prunus laurocerasus and Robinia pseudoacacia).
The number of collected occurrences increased from around 1800 until 1950. This can be explained by the introduction of four of the exotic species into Metropolitan France in the 19th century (Buddleja davidii, Impatiens glandulifera, Reynoutria japonica and Spiraea japonica) and also by an increase in the number of historical sources relating to botany and horticulture, which facilitated the identification of mentions or observations of the species studied. As a result, 75% of the historical data collected dates from about 1800 to 1950 (Fig.
The Darwin Core Standard (DwC) was used to offer a "stable, straightforward and flexible framework for compiling biodiversity data from varied and variable sources" (https://www.gbif.org/en/darwin-core). All column labels and descriptions are from https://dwc.tdwg.org/terms/.
Column label | Column description |
---|---|
id | Same as occurrenceID: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. This is the primary key of this table. |
type | The nature or genre of the resource. |
modified | The most recent date-time on which the resource was changed. |
language | A language of the resource. |
license | A legal document giving official permission to do something with the resource. |
rightsHolder | A person or organisation owning or managing rights over the resource. |
institutionID | An identifier for the institution having custody of the object(s) or information referred to in the record. |
institutionCode | The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record. |
datasetName | The name identifying the dataset from which the record was derived. |
ownerInstitutionCode | The name (or acronym) in use by the institution having ownership of the object(s) or information referred to in the record. |
basisOfRecord | The specific nature of the data record. Recommended best practice is to use the standard label of one of the Darwin Core classes. |
occurrenceID | An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. |
occurrenceRemarks | Comments or notes about the Occurrence. |
recordedBy | A list (concatenated and separated) of names of people, groups or organisations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first. |
occurrenceStatus | A statement about the presence or absence of a Taxon at a Location. Recommended best practice is to use this controlled vocabulary: http://rs.gbif.org/vocabulary/gbif/occurrence_status.xml |
associatedReferences | A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the Occurrence. |
eventDate | The date-time or interval when the event was recorded. Not suitable for a time in a geological context. Recommended best practice is to use an encoding scheme, such as ISO 8601:2004(E). |
year | The four-digit year in which the Event occurred, according to the Common Era Calendar. |
eventRemarks | Comments or notes about the Event. |
locationID | An identifier for the set of location information (data associated with dcterms:Location). May be a global unique identifier or an identifier specific to the dataset. |
continent | The name of the continent in which the Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. |
countryCode | The standard code for the country in which the Location occurs. Recommended best practice is to use ISO 3166-1-alpha-2 country codes: http://rs.gbif.org/vocabulary/iso/3166-1_alpha2.xml |
stateProvince | The name of the next smaller administrative region than country (state, province, canton, department, region etc.) in which the Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. |
county | The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department etc.) in which the Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. |
municipality | The full, unabbreviated name of the next smaller administrative region than county (city, municipality etc.) in which the Location occurs. Do not use this term for a nearby named place that does not contain the actual location. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. |
locality | The specific description of the place. Less specific geographic information can be provided in other geographic terms (higherGeography, continent, country, stateProvince, county, municipality, waterBody, island, islandGroup). This term may contain information modified from the original to correct perceived errors or to standardise the description. |
verbatimLocality | The original textual description of the place. |
minimumElevationInMeters | The lower limit of the range of elevation (altitude, usually above sea level), in metres. |
maximumElevationInMeters | The upper limit of the range of elevation (altitude, usually above sea level), in metres. |
locationAccordingTo | Information about the source of this Location information. Could be a publication (gazetteer), institution or team of individuals. |
decimalLatitude | The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive. |
decimalLongitude | The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive. |
geodeticDatum | The ellipsoid, geodetic datum or spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. Recommended best practice is to use the EPSG code as a controlled vocabulary to provide an SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value "unknown". |
georeferenceSources | A list (concatenated and separated) of maps, gazetteers or other resources used to georeference the Location, described specifically enough to allow anyone in the future to use the same resources. |
georeferenceRemarks | Notes or comments about the spatial description determination, explaining assumptions made in addition or opposition to those formalised in the method referred to in georeferenceProtocol. |
identifiedBy | A list (concatenated and separated) of names of people, groups or organisations who assigned the Taxon to the subject. Recommended best practice is to separate the values in a list with space vertical bar space (|) . |
dateIdentified | The date on which the subject was determined as representing the Taxon. Recommended best practice is to use a date that conforms to ISO 8601-1:2019. |
taxonID | An identifier for the set of taxon information (data associated with the Taxon class). May be a global unique identifier or an identifier specific to the dataset. |
scientificNameID | An identifier for the nomenclatural (not taxonomic) details of a scientific name. |
scientificName | The full scientific name, with authorship and date information, if known. When forming part of an Identification, this should be the name in the lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the identificationQualifier term. |
nameAccordingTo | The reference to the source in which the specific taxon concept circumscription is defined or implied - traditionally signified by the Latin "sensu" or "sec." (from secundum, meaning "according to"). For taxa that result from identifications, a reference to the keys, monographs, experts and other sources should be given. |
kingdom | The full scientific name of the kingdom in which the taxon is classified. |
phylum | The full scientific name of the phylum or division in which the taxon is classified. |
class | The full scientific name of the class in which the taxon is classified. |
order | The full scientific name of the order in which the taxon is classified. |
family | The full scientific name of the family in which the taxon is classified. |
genus | The full scientific name of the genus in which the taxon is classified. |
taxonRank | The taxonomic rank of the most specific name in the scientificName. |
vernacularName | A common or vernacular name. |
taxonRemarks | Comments or notes about the taxon or name. |
The Darwin Core Identification History is an extension allowing multiple identification/determinations of species occurrences, particularly name spellings found in each original text. All identifications including the current one are listed, while the current should also be repeated in the occurrence core for simple access (Source: https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Identification). All column labels and descriptions are from https://dwc.tdwg.org/terms/.
Column label | Column description |
---|---|
id | An identifier for the Occurrence linked to the occurrence.txt file (same as occurrenceID). It can be repeated as a foreign key here. |
identificationID | A unique identifier corresponding to the name spelling reported as found in the original text. This is the primary key of this table. |
dateIdentified | The date on which the subject was determined as representing the Taxon. Recommended best practice is to use a date that conforms to ISO 8601-1:2019. The date format here is YYYY (e.g. 1694). |
scientificName | The full scientific name, with authorship and date information, if known. When forming part of an Identification, this should be the name in the lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the identificationQualifier term. |
nameAccordingTo | The reference to the source in which the specific taxon concept circumscription is defined or implied - traditionally signified by the Latin "sensu" or "sec." (from secundum, meaning "according to"). For taxa that result from identifications, a reference to the keys, monographs, experts and other sources should be given. |
vernacularName | A common or vernacular name. |
taxonRemarks | Comments or notes about the taxon or name. |
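Since the id column of the extension acts as a foreign key to the occurrence core, the historical name spellings of an occurrence can be retrieved with a simple join. The following Python sketch illustrates this on made-up, tab-separated rows: the identifiers, the year and the name strings are placeholders, not records from the dataset, and the extension file name identification.txt is an assumption.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for occurrence.txt and (assumed name) identification.txt.
OCCURRENCE_TXT = (
    "id\tscientificName\tyear\n"
    "occ-1\tRobinia pseudoacacia L., 1753\t1850\n"
)
IDENTIFICATION_TXT = (
    "id\tidentificationID\tscientificName\n"
    "occ-1\tident-1\tname as spelled in source A\n"
    "occ-1\tident-2\tname as spelled in source B\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_identifications(occurrences, identifications):
    """Attach to each occurrence the extension rows that share its id."""
    by_id = defaultdict(list)
    for ident in identifications:
        by_id[ident["id"]].append(ident)
    return [
        dict(occ, identifications=by_id.get(occ["id"], []))
        for occ in occurrences
    ]


joined = attach_identifications(
    read_tsv(OCCURRENCE_TXT), read_tsv(IDENTIFICATION_TXT)
)
```

With real data, the two text constants would be replaced by the contents of the files extracted from the Darwin Core Archive downloaded from GBIF.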
Maintenance and future work
All data will be maintained by their creators. They will be progressively enriched by new data during the current EI2P-VALEEBEE research project and also during the projects that the team will continue to conduct on invasive plant species thereafter.
The dataset is already archived and published through GBIF:
https://www.gbif.org/dataset/345820cc-a0a8-4d76-b7eb-fba85b21ad08
It will be regularly updated and versioned through GBIF.
Project title: EI2P - Espèces invasives et pollinisateurs, entre contraintes et potentiels | VALEEBEE - VALorisation des Espèces exotiques Envahissantes et Abeilles
Funding: Région Occitanie - Appel à projets Recherche et Société(s) 2019 | Maison des Sciences de l'Homme et de la Société de Toulouse (MSH-T) APEX 2020
This work is endorsed by the CNRS/INEE Zone Atelier Pyrénées Garonne (ZA PYGAR). The Zones Ateliers network (RZA) is recognized by ALLENVI, as an ESFRI eLTER (European Long-Term Ecological Research) “Integrated European Long-Term Ecosystem, Critical Zone & Socio-Ecological System Research Infrastructure”.
We appreciated the support and precious help of Sophie Pamerlon from GBIF-France who gave her time to enlighten us on Darwin Core mapping and the use of the Integrated Publishing Toolkit (https://www.gbif.org/ipt). We also warmly thank the technical evaluator of the data for his valuable advice.